  1. Sep 2024
    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      This manuscript examines the individual and dual effects of CHIP and LOY in MI employing a cohort of ~460 individuals. CHIP is assessed by NGS and LOY is assessed by PCR. The threshold for CHIP is set at 2% (an arbitrary cutoff that is often used) and LOY at 9% (according to the Discussion text - this reviewer may have missed the section that describes why this threshold was employed). The investigation assessed whether LOY could modulate inflammation, atherosclerotic burden, or MI risk associated with CHIP. Neither CHIP nor LOY independently affected hsCRP, atherosclerotic burden, or MI incidence, nor did LOY presence diminish these outcomes in CHIP+ male subjects.

      This study represents the first dual analysis of CHIP and LOY on CVD outcomes. The results are largely negative, contradictory to other studies (many with much larger sample sizes). I would attribute the limitation of sample size as a major contributor to the negative data. While the negative data are suspect, the "positive" finding that LOY abolishes the prognostic significance of CHIP on MI is of interest (and consistent with what is understood from mechanistic studies).

      Overall, I enjoyed reading the paper, and it is of interest to the research community.

      However, I disagree with some of the authors' interpretations of the data.

      Generally, many conclusions on CHIP interpretation are based on the comparison of findings from very large datasets that have been evaluated by shallow NGS DNA sequencing. These studies lack sensitivity and accuracy, but this is counterbalanced by their very large sample sizes. Thus, they draw conclusions from the sickest individuals (ICD codes) with the largest clones (explaining the 10% VAF threshold). Here, the study has a well-phenotyped cohort, but as far as this reviewer can tell, the DNA sequencing is "shallow" NGS. Typically, to assess smaller datasets, investigators employ an error-correction method (DNA barcodes, duplex sequencing, etc.) for the sensitivity and accuracy of calling variants. Thus, the current study appears to suffer from this limitation (small sample sizes combined with NGS).

      We thank the reviewer for his/her positive and open comment. We acknowledge that we did not use an error-corrected sequencing method for our study. However, we do not fully agree with the statement that our NGS sequencing technique is “shallow”.

      Considering our entire sequencing panel, we achieve a sequencing depth of ≥100X and ≥300X for 100% [99%;100%] and 99% [99%;100%] of the targeted regions respectively. This corresponds to a median depth of 2111X [1578;2574] for all regions sequenced. When considering “CHIP genes”, the median depth is 2694X [1875;3785] for patients from the CHAth study and 3455X [2266;4885] for patients from the 3C study. More specifically, for the DNMT3A and TET2 genes, the median sequencing depths are 2531X [1818;3313] and 3710X [2444;4901] for patients from the CHAth and 3C studies respectively. These values are far higher than the 300X recommended by the French National Cancer Institute for capture-based NGS. Coupling this high sequencing depth with our bioinformatic pipeline, which uses 3 different variant callers, manual curation of all variants by trained hematobiologists and a bioinformatic tool to estimate the background noise, allows us to detect somatic mutations with a VAF of 1% with high accuracy. Notably, our accuracy in detecting mutations in leukemia-associated genes is tested twice a year as part of the quality control program organized by the French Group of Molecular Biologists in Hematology (GBMHM). We added the information about the sequencing depth in the Supplementary Methods section.

      While the "negative" data from this study are inconclusive, the positive data (i.e. CHIP being prognostic for MI in the absence but not presence of LOY) is of interest. Thus, the investigators may want to consider a shorter report that largely focuses on this finding.

      We thank the reviewer for his/her interest in this result. We also agree that it would be interesting to focus specifically on demonstrating the impact of mLOY in countering the cardiovascular risk associated with CHIP. We performed additional analyses to demonstrate that this effect was independent of age and cardiovascular risk factors and included this information in the Results section.

      However, we believe that it is also of interest to report negative results that, although probably due to the limited sample size, suggest that the cardiovascular risk associated with CHIP is not as strong or as clinically pertinent as initially suggested. Of note, if CHIP really increased the risk of myocardial infarction significantly, it would be detected more frequently in subjects who suffered a MI than in those who did not, which was not observed in our cohort. Moreover, we were able to determine that if CHIP increases the risk of MI, it does so to a much lesser extent (HR = 1.03) than other established cardiovascular risk factors such as hypercholesterolemia or tobacco use (HR = 1.47 and HR = 1.86 respectively in our cohort), which questions the pertinence of considering CHIP in the management of patients with atherothrombosis. These data have been added in the Results and Discussion sections.

      We also believe that our study has the merit of directly assessing the impact of CHIP on atheroma burden, which has been done in only a limited number of studies in the context of coronary artery disease. This would not be possible if we analyzed only the male subjects in our cohort, because it would further decrease the statistical power of our analyses.

      Reviewer #2 (Public Review):

      Summary: 

      The preprint by Fawaz et al. presents the findings of a study that aimed to assess the relationship between somatic mutations associated with clonal hematopoiesis (CHIP) and the prevalence of myocardial infarction (MI). The authors conducted targeted DNA sequencing analyses on samples from 149 MI patients and 297 non-MI controls from a separate cohort. Additionally, they investigated the impact of the loss of the Y chromosome (LOY), another somatic mutation frequently observed in clonally expanded blood cells. The results of the study primarily demonstrate no significant associations, as neither CHIP nor LOY were found to be correlated with an increased prevalence of MI. Of note, the null findings regarding CHIP are in conflict with several larger studies in the literature.

      Strengths:

      Overall, this is a useful research work on an emerging risk factor for cardiovascular disease (CVD). The use of a targeted sequencing approach is a strength, as it offers higher sensitivity than the whole exome sequencing approaches used in many previous studies.

      Weaknesses:

      Reporting null findings is definitely relevant in an emerging field such as the role of somatic mutations in cardiovascular disease. Nevertheless, the study suffers from severe limitations, which casts doubts on the authors' conclusions, as detailed below:

      (1) The small sample size of the study population is a critical limitation, particularly when reporting null findings that conflict (partly) with positive findings in much larger studies, totaling hundreds of thousands of individuals (e.g. Zekavat et al, Nature CVR 2023, Vlasschaert et al, Circulation 2023; Zhao et al, JAMA Cardio 2024). The authors claim that they have 90% power to detect an effect size of CHIP on MI comparable to that in a previous report (Jaiswal et al, NEJM 2017). However, the methodology used to estimate statistical power is not described.

      We thank the reviewer for his/her pertinent and constructive comments. We fully agree that our study has a substantially smaller sample size than the studies of Zekavat et al, Vlasschaert et al or Zhao et al.

      The CHAth study was designed as a prospective study (which is not frequent in CHIP reports) to demonstrate that, if CHIP increases the risk of MI, it would be detected more frequently in patients who suffered a MI than in those who did not. To achieve this, we defined eligibility criteria to obtain a rather high prevalence of CHIP and optimize the statistical power of a study based on a limited number of patients. We thus enrolled patients who suffered a first MI after the age of 75 years. These patients were to be compared with subjects from the Three-City study who were 65 years or older at inclusion and had not presented any cardiovascular event before inclusion.

      To determine the number of patients necessary to achieve our objective, we considered a CHIP prevalence of 20% in the general population after the age of 75 years, as estimated when we set up our study (Genovese et al, NEJM 2014, Jaiswal et al, NEJM 2014, Jaiswal et al, NEJM 2017). At that time the relative risk of MI associated with CHIP was reported to be 1.7, leading to an expected prevalence of CHIP of 37% in subjects who presented a MI. Based on these hypotheses, the recruitment of 112 patients in the CHAth study would have been sufficient to detect a significantly higher prevalence of CHIP in MI(+) patients compared to MI(-) subjects with a power of 0.90 at a type I error rate of 5%. These calculations were performed by the Research Methodology Support Unit of the University Hospital of Bordeaux. These data were added in the Supplementary Methods section to present more clearly the design and objectives of the CHAth study.
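
      The power calculation described above can be sketched as a standard two-proportion sample-size computation. This is only an illustration under stated assumptions: the response does not say which test or case-to-control ratio the Research Methodology Support Unit used, so the normal approximation without continuity correction, the function name `cases_needed` and the 2:1 control-to-case ratio below are all assumptions, not the authors' method.

```python
from math import ceil, sqrt
from statistics import NormalDist


def cases_needed(p_cases, p_controls, controls_per_case=1.0,
                 alpha=0.05, power=0.90):
    """Cases needed for a two-sided comparison of two proportions
    (normal approximation, no continuity correction)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    k = controls_per_case
    # Pooled proportion under the null hypothesis, weighted by group size.
    p_bar = (p_cases + k * p_controls) / (1 + k)
    null_term = z_alpha * sqrt((1 + 1 / k) * p_bar * (1 - p_bar))
    alt_term = z_beta * sqrt(p_cases * (1 - p_cases)
                             + p_controls * (1 - p_controls) / k)
    return ceil(((null_term + alt_term) / (p_cases - p_controls)) ** 2)


# Proportions quoted in the response: 37% CHIP in MI(+), 20% in MI(-).
n_equal = cases_needed(0.37, 0.20)  # one control per case
# Two controls per case is an assumed ratio, not one stated by the authors.
n_two_to_one = cases_needed(0.37, 0.20, controls_per_case=2.0)
print(n_equal, n_two_to_one)
```

      Under these assumptions the required number of cases is of the same order as the 112 quoted by the authors. The exact figure depends on the allocation ratio and on the test used, neither of which is specified in the response.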

      In the end, we recruited 149 patients in the CHAth study and compared them to 297 control subjects. Although we recruited more patients than initially needed, we observed a similar prevalence of CHIP in our 2 cohorts, suggesting that the cardiovascular risk associated with CHIP is lower than the 1.7-fold increased risk claimed in most publications on CHIP in the cardiovascular field. We should note that our study was not designed to demonstrate the impact of CHIP on the occurrence of MI during follow-up, which could explain our negative results, given the limited number of patients, as stated by the reviewers. This statement has been added in the Supplementary Methods section. However, performing such an analysis allowed us to confirm that the risk of MI associated with CHIP was lower than 1.7 and lower than the risk associated with hypercholesterolemia or smoking.

      We would also like to note that the eligibility criteria for both the CHAth and the Three-City studies may have led to a selection bias, possibly contributing to the contradiction between our results and those of other studies. As stated before, in the CHAth study, only patients who experienced a first MI after the age of 75 were enrolled. In the Three-City study, all subjects were 65 years or older at inclusion. In contrast, most of the cohorts showing an association between CHIP and cardiovascular events were composed of younger subjects:

      - Bioimage: median age 70 years (55-80 years)

      - MDC: median age 60 years

      - ATVB: subjects with a MI before 45 years

      - PROMIS: subjects between 30 and 80 years

      - UK Biobank: between 40 and 70 years at inclusion, median age of 58 years in the study of Vlasschaert et al.

      - Zhao et al: median age of 53.83 years (45.35-62.39 years).

      This last information was added in the Discussion section (lines 452-454).

      Furthermore, the work by Jaiswal et al (NEJM 2017) showed a hazard ratio of approx. 2.0, but more recent work in much larger populations suggests that the overall effect of CHIP on atherosclerotic CVD is smaller, most likely due to the heterogeneity of effects of different mutated genes (e.g. Zekavat et al, Nature CVR 2023, Vlasschaert et al, Circulation 2023; Zhao et al, JAMA Cardio 2024).

      We thank the reviewer for emphasizing that the initial HR of 2.0 observed by Jaiswal et al was shown to be smaller in more recent studies. This corresponds to what we wrote in the introduction (lines 103-109) and discussion (lines 365-370, 465-471).

      In addition, several analyses in the current manuscript are conducted separately in MI(+) (n= 149) and MI(-) (N=297) individuals, further limiting statistical power. Power is still lower in the investigation of the effects of LOY and its interaction with CHIP, as only men are included in these analyses. Overall, I believe the study is severely underpowered, which calls into question the validity of the reported null findings.

      We agree with the reviewer that the statistical power of our study is lower than that of other studies, in particular those based on several hundred thousand patients. Whenever possible, we analyzed our data by combining MI(+) and MI(-) subjects. However, for some aspects such as atherosclerosis, we did not have the same parameters available for these 2 groups and had to analyze them separately, leading to more limited statistical power. We also have to acknowledge that our study was not designed to demonstrate an effect of CHIP on incident MI (as stated before), limiting our statistical power to demonstrate an effect of CHIP +/- mLOY on the incident risk of coronary artery disease.

      However, when designing our prospective study (CHAth study), we aimed to address the limitations of a small cohort and obtain rapid, significant results regarding the impact of CHIP. We hypothesized that if CHIP really increases the risk of myocardial infarction (MI), it would be detected more frequently in patients who have experienced a MI than in those who have not. This study design would demonstrate the importance of CHIP in MI pathophysiology without requiring thousands of patients. However, we did not observe such an association, which questions the relevance of detecting CHIP for the management of patients in the field of cardiology. This was confirmed by the fact that in our cohort, the cardiovascular risk associated with CHIP appears to be low (HR = 1.03 [0.657;1.625] after adjustment for sex, age and cardiovascular risk factors) compared to hypercholesterolemia (HR = 1.474 [0.758;2.866]) or smoking (HR = 1.865 [0.943;3.690]). These data have been added in the Results and Discussion sections.

      In addition, we would like to mention that despite the limited number of subjects studied, our results are not exclusively negative. When studying only male subjects, we were able to show that CHIP accelerates the occurrence of MI, particularly in the absence of mLOY (Figure 2D). This effect was independent of age and cardiovascular risk factors (diabetes, cholesterol and high blood pressure). We added this last information in the Results section of the manuscript, although we acknowledge that it has to be confirmed in future work.

      (2) Related to the above, it is widely accepted that the effects of CHIP on CVD are highly heterogeneous, as some mutated genes appear to have a strong impact on atherosclerosis, whereas the effect of others is negligible (e.g. Zekavat et al, Nature CVR 2023, Vlasschaert et al, Circulation 2023, among others). TET2 mutations are frequently considered a "positive control", given the multiple lines of evidence suggesting that these mutations confer a higher risk of atherosclerotic disease.

      However, no association with MI or related variables was found for TET2 mutations in the current work. Reporting the statistical power specifically for assessing the effect of TET2 mutations would enhance the interpretation of these results.

      We thank the reviewer for this pertinent remark. It has indeed been shown that depending on the somatic mutation, the impact of CHIP on inflammation, atherosclerosis and cardiovascular risk is different. The studies cited by the reviewer suggest that DNMT3A mutations have a low impact on atherosclerosis/atherothrombosis while other “non-DNMT3A” mutations, including TET2 mutations, have a greater impact. In particular, Zekavat et al suggested that TP53, PPM1D, ASXL1 and spliceosome mutations have a similar impact on atherosclerosis/atherothrombosis to TET2.

      To answer the reviewer: in our cohort, we did not find a clear association between the detection of TET2 mutations with a VAF≥2% and:

      - A history of MI at inclusion (p=0.5339)

      - Inflammation (p=0.440)

      - Atherosclerosis burden:

        - In the CHAth study:

          - p=0.031 for stenosis≥50%

          - p=0.442 for multitruncular lesions

          - p=0.241 for atheroma volume

        - In the 3C study:

          - p=0.792 for the presence of atheroma

          - p=0.3966 for the number of plaques

          - p=0.876 for intima-media thickness

      - Incidence of MI (p=0.5993)

      Similarly, we did not find any association between the detection of TET2 mutations with a VAF≥1% and:

      - A history of MI at inclusion (p=0.5339)

      - Inflammation (p=0.802)

      - Atherosclerosis burden:

        - In the CHAth study:

          - p=0.104 for stenosis≥50%

          - p=0.617 for multitruncular lesions

          - p=0.391 for atheroma volume

        - In the 3C study:

          - p=0.3291 for the presence of atheroma

          - p=0.2060 for the number of plaques

          - p=0.2300 for intima-media thickness

      - Incidence of MI (p=0.195)

      However, analyzing the specific effect of TET2 mutations reduces the cohort of CHIP(+) subjects to 61 individuals. In these conditions, considering a prevalence of “TET2-CHIP” of 13.5% (in our cohort) and a hazard ratio of 1.3 (Vlasschaert et al), the statistical power to show an increased risk of MI is only 16%.
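
      One common way to approximate power for a hazard ratio of this size is Schoenfeld's formula for a binary covariate in a Cox model. The sketch below is an assumption about the type of calculation involved, not the authors' actual method, which is not described. The hazard ratio (1.3) and carrier fraction (13.5%) come from the response, whereas the function name `cox_power` and the event counts are hypothetical placeholders, since the number of incident MIs is not given here.

```python
from math import log, sqrt
from statistics import NormalDist


def cox_power(hazard_ratio, carrier_fraction, n_events, alpha=0.05):
    """Approximate power of a two-sided test of a binary covariate in a
    Cox model (Schoenfeld's formula, ignoring the opposite tail)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    signal = abs(log(hazard_ratio)) * sqrt(
        n_events * carrier_fraction * (1 - carrier_fraction))
    return NormalDist().cdf(signal - z_alpha)


# HR and carrier fraction are quoted in the response; the event counts are
# hypothetical placeholders, not numbers taken from the study.
for n_events in (50, 100, 200):
    print(n_events, round(cox_power(1.3, 0.135, n_events), 2))
```

      With a hazard ratio this close to 1 and a carrier fraction this small, power stays low unless the number of events is large, which is consistent with the low power the authors report.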

      (3) One of the most essential features of CHIP is the tight correlation with age. In this study, the effect of age on CHIP (Supplementary Tables S5, S6) seems substantially milder than in previous studies. Given the relatively weak association with age here, it is not surprising that no association with MI or atherosclerotic disease was found, considering that this association would have a much smaller effect size.

      We thank the reviewer for highlighting this point. Although the difference in median age between subjects with and without CHIP is not very large in our cohort, we did observe a significant association of CHIP with age:

      - The differences in age were statistically significant in both the CHAth and 3C studies (Supplementary Tables S5 and S6)

      - We observed a significant association between age and CHIP prevalence (p<0.001 for the total cohort, p=0.0197 for the CHAth study, and p=0.0394 for the 3C cohort after adjustment for sex). This association was already shown in Figure 1. We added the significant association between age and CHIP prevalence in the Results section (line 279).

      As stated before, we should point out that we enrolled only subjects aged ≥75 years and ≥65 years in the CHAth and 3C studies respectively. This led to a median age in our cohort that was substantially higher than in other cohorts (in particular the UK Biobank and the different cohorts studied by Jaiswal et al). This could have contributed to an apparently milder effect of age on CHIP, even if this association was still observed.

      In addition, there are previous reports of sex-related differences in the prevalence of CHIP, is there an association between CHIP and age after adjusting for sex? 

      The reviewer correctly pointed out that sex has been associated with various aspects of CHIP. While Zekavat et al reported that CHIP carriers were more frequently males, Kar et al (Nature Genetics 2022) and Kamphuis et al (Hemasphere 2023) did not observe a difference in the prevalence of CHIP between males and females, but rather a difference in the mutational spectrum. Males more frequently presented SRSF2, ASXL1, SF3B1, U2AF1, JAK2, TP53 and PPM1D mutations, while females more frequently had DNMT3A, CBL and GNB1 mutations.

      In our study, the association between CHIP prevalence and age was indeed significant even after adjustment for sex (p<0.001 for the total cohort, p=0.0197 for the CHAth study and p=0.0394 for the 3C study).

      (4) The mutated genes included in the definition of "CHIP" here are markedly different than those in most previous studies, particularly when considering specifically the studies that demonstrated an association between CHIP and atherosclerotic CVD. For instance, the definition of CHIP in this manuscript includes genes such as ANKRD26, CALR, CCND2, and DDX41... that are not prototypical CHIP genes. This is unlikely to have a major impact on the main results, as the vast majority of mutations detected are indeed in bona fide CHIP genes, but it should be at least acknowledged.

      We agree with the reviewer that our gene panel includes genes that are not considered prototypical CHIP genes. This acknowledgment has been added in the Supplementary Methods section. To perform this study, we did not design a specific targeted sequencing panel. We used the one that is used for the diagnosis of myeloid malignancies at the University Hospital of Bordeaux. ANKRD26 and DDX41 are genes that, when mutated, predispose to the development of hematological malignancies. CALR mutations are frequently detected in myeloproliferative neoplasms, while CCND2 mutations can be detected in acute myeloid leukemia among other diseases. As is usual in our routine practice, we analyzed all the genes in the panel. However, as stated by the reviewer, most of the mutations we detected involved bona fide CHIP genes.

      Furthermore, the strategy used here for the CHIP variant calling and curation seems substantially different than that used in previous studies, which precludes a direct comparison. This is important because such differences in the definition of CHIP and the curation of variants are the basis of most conflicting findings in the literature regarding the effects of this condition. Ideally, the authors should conduct sensitivity analyses restricted to prototypical CHIP genes, using the criteria that have been previously established in the field (e.g. Vlasschaert et al, Blood 2023).

      We agree with the reviewer that our strategy for CHIP variant calling and curation was substantially different from what has been used in other studies. We decided to apply the criteria we used in previous studies for the analysis of somatic mutations in myeloid malignancies. Because CHIP is defined by the detection of “somatic mutations in leukemia driver genes”, this appeared to follow the definition of CHIP.

      We also acknowledge that this discrepancy with the criteria defined by Vlasschaert et al could contribute to our findings differing from those of other studies. We thus checked whether the variants detected were in accordance with the criteria defined by Vlasschaert et al. Pooling the 2 cohorts, we detected 439 variants, 381 of which were in accordance with the criteria established by Vlasschaert et al, representing a concordance rate of 86.8%. Moreover, the variants “wrongly” retained according to these criteria affected the conclusion on the detection of CHIP in only 15 patients (because in the remaining patients these variants were associated with a mutation in a bona fide CHIP gene and/or had a VAF below 2%). Thus, our strategy for CHIP variant calling and curation had only a limited impact on our results. This has been added in the discussion (lines 455-459).

      However, we would like to discuss the criteria defined by Vlasschaert et al, which are probably too restrictive. For some genes, such as ZRSR2, in addition to frameshift and nonsense mutations that are expected to be associated with a loss of function, only some single nucleotide variations were retained (probably those detected by this group). In our patient 20785, we detected a c.524A>G, p.(Tyr175Cys) mutation that was not reported in the list published by Vlasschaert et al. However, this variant presents a VAF suggestive of a somatic origin (3%), affects the Zn finger domain of the protein and is observed in a male subject. Thus, it meets several criteria for being considered as associated with a loss of function. Similarly, the CBL variant c.1139T>C, p.(Leu380Pro) observed in our patient 21536, although not affecting residues 381-421 of the protein (the criterion defined by Vlasschaert et al), has been reported in 29 cases of hematological malignancies. It is thus likely to have a significant impact on the behavior of hematopoietic cells. Moreover, in the same patient, a TET2 c.4534G>A, p.(Ala1512Thr) variant was detected. Although not directly affecting the CD1 domain, it has been reported in a case of AML with a VAF suggestive of a somatic origin (Papaemmanuil et al, NEJM 2016). The SH2B3 gene is not considered by Vlasschaert et al as a bona fide CHIP gene, contrary to other genes involved in cell signaling such as JAK2, GNAS, GNB1 and CBL. However, inactivating mutations in SH2B3 can be detected in myeloid malignancies and were recently shown to drive the phenotype in some patients with a MPN (Zhang et al, American Journal of Hematology 2024). We could thus expect that this also happens in our patients 22591 and 21998, who harbor mutations of SH2B3 (a SNV in the PH domain and a frameshift mutation respectively).

      Regarding the BCOR, STAG2, SMC3 and RAD21 genes, although frameshift mutations are the most prevalent, there are several reports of SNVs in the context of hematological malignancies (COSMIC, Blood (2021) 138 (24): 2455–2468, Blood Cancer Journal (2023)13:18 ; https://doi.org/10.1038/s41408-023-00790-1).

      We can also add that although Vlasschaert et al did not consider CSF3R and CALR as CHIP genes, Kessler et al did. Because CHIP is an emerging field, the concepts that define it are expected to evolve, as demonstrated by the recent study from Jyoti Nangalia’s group (Bernstein et al, Nature Genetics 2024), which showed that 17 additional genes (including SH2B3) should be considered as drivers of clonal hematopoiesis.

      (5) An important limitation of the current study is the cross-sectional design of most of the analyses. For instance, it is not surprising that no association is found between CHIP and prevalent atherosclerosis burden by ultrasound imaging, considering that many individuals may have developed atherosclerosis years or decades before the expansion of the mutant clones, limiting the possible effect of CHIP on atherosclerosis burden. Similarly, the analysis of the relationship between CHIP and a history of MI may be confounded by the potential effects of MI on the expansion of mutant clones. In this context, it is noteworthy that the only positive results here are found in the analysis of the relationship between CHIP at baseline and incident MI development over follow-up. Increasing the sample size for these longitudinal analyses would provide deeper insights into the relationship between CHIP and MI. 

      We agree with the reviewer that increasing the sample size for longitudinal analyses would provide deeper insights into the relationship between CHIP and MI. Unfortunately, for the moment, we do not have access to additional samples of the 3C study and are not able to perform these additional analyses.

      (6) The description of some analyses lacks detail, but it seems that statistical analyses were exclusively adjusted for age or age and sex. The lack of adjustment for conventional cardiovascular risk factors in statistical analyses may confound results, particularly given the marked differences in several variables observed between groups.

      The reviewer is right that we adjusted our analyses for age and/or sex. This was done because, as stated before, our results showed few significant differences. However, we reanalyzed our data, further adjusting the tests for conventional cardiovascular risk factors, and observed similar results. These data have been added in the Results section (lines 286-287, 303, 319, 331-332, 341).

      (7) The variant allele fraction (VAF) threshold for identifying clinically relevant clonal hematopoiesis is still a subject of debate. The authors state that subjects without any detectable mutation or with mutations with a VAF below 2% were considered non-CHIP carriers. While this approach is frequent in the field, it likely misses many impactful mutations with lower VAFs. Such false negatives could contribute to the null findings reported here. Ideally, the authors should determine the lower detection limit of their sequencing approach (either computationally or through serial dilution experiments) and identify the threshold of VAF that can be detected reliably with their sequencing assay. The association between CHIP and MI should then be evaluated considering all mutations above this VAF threshold, in addition to sensitivity analyses with other thresholds frequent in the literature, such as 1% VAF, 2% VAF, and 10% VAF.

      We agree with the reviewer that the VAF threshold for identifying clinically relevant CH is still debated. As stated in the manuscript and by the reviewer, we used the conventional threshold of 2%. Considering that different studies have shown that the cardiovascular risk is increased to a greater extent for CHIP with a high VAF (Jaiswal et al, NEJM 2017, Kessler et al Nature 2022, Vlasschaert et al, Circulation 2023), it is not certain that considering variants with a very low VAF (below 2%) would help us find an impact of CHIP on inflammation, atherosclerosis or atherothrombotic risk.

      However, as mentioned by the reviewer, variants with a low VAF could have a clinical impact, as recently reported by Zhao et al. In France, the use of a biological analysis for medical purposes requires demonstrating that all its aspects, including its performance, are mastered. In that context, we determined that our NGS strategy allowed us to reliably detect mutations with a VAF down to 1% (data not shown). As stated in the discussion, we also analyzed our results considering variants with a VAF of 1% and found similar results (lines 394-395). The sensitivity analyses were already mentioned in the manuscript, as we also searched for an effect of CHIP with a high VAF (≥5%) and found no effect either. We did not have a sufficient number of subjects carrying variants with a VAF≥10% to perform an analysis with this threshold.

      (8) The authors should justify the use of 3D vascular ultrasound imaging exclusively in the supra-aortic trunk. I am not familiar with this technique, but it seems to be most typically used to evaluate atherosclerosis burden in superficial vascular beds such as carotids or femorals. I am concerned about the potential impact of tissue depth on the accurate quantification of atherosclerosis burden in the current study (e.g. https://doi.org/10.1016/j.atherosclerosis.2016.03.002). It is unclear whether the carotids or femorals were imaged in the study population. 

      We apologize for the lack of precision in the Methods section. As stated by the reviewer, we evaluated the atherosclerosis burden in superficial vascular beds. We measured atheroma volume at the site of the common carotid (as described by B Lopez-Melgar, in Atherosclerosis, 2016). We did not analyze femoral arteries in this study. The sentence is now corrected in the Methods (lines 176-179).

      (9) The specific criteria used to define LOY need to be justified. LOY is stated to be defined based on a "A cut off of 9% of cells with mLOY defined the detection of a mLOY based on the study of 30 men of less than 40 years who had a normal karyotype as assessed by conventional cytogenetic study." As acknowledged by the authors, this definition of LOY is substantially different than that used in recent studies employing the same technique to detect LOY (Mas-Peiro et al, EHJ 2023). In addition, it seems essential to provide more detailed information on the ddPCR assay used to determine LOY, including the operating range and, more importantly, the lower limit of detection (%LOY) of the assay. A dilution series of a control DNA with no LOY would be helpful in this context. 

      We apologize if the definition of the threshold for detecting mLOY was unclear. To test the performance of our ddPCR technique, we first determined the background noise by testing DNA obtained from total leukocytes in 30 men of ≤40 years who presented a normal karyotype as assessed by conventional cytogenetic techniques. In this control population, presumed not to carry mLOY, we detected a proportion of cells with mLOY of 2.34 ± 1.98% (see Author response image 1, panel A). We thus considered a threshold above 9% as being different from background noise (mean + 3 times the standard deviation).
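
      The threshold arithmetic described above can be checked with a short Python sketch. The function name is hypothetical, and the only inputs are the mean and standard deviation quoted in the response. Under these assumptions, mean + 3 SD comes to roughly 8.3%, which sits just below the 9% cut-off the authors adopted.

```python
def noise_threshold(mean_pct: float, sd_pct: float, k: float = 3.0) -> float:
    """Return the background mean plus k standard deviations (in % of cells)."""
    return mean_pct + k * sd_pct


# Values quoted in the response: 2.34 +/- 1.98 % of cells with apparent mLOY
# in 30 control men with a normal karyotype.
threshold = noise_threshold(2.34, 1.98)
print(f"mean + 3 SD = {threshold:.2f}% of cells with mLOY")
```

      Changing `k` shows how sensitive the cut-off is to the chosen multiple of the standard deviation.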

      We then compared the proportion of cells with mLOY measured by ddPCR and by conventional karyotype and observed a rather good correlation between the two techniques (R² = 0.6430, p = 0.0053, see Author response image 1, panel B). Finally, we tested the reliability of our ddPCR assay in detecting different levels of mLOY using a dilution series of control DNA (from an equivalent of 2% of cells with mLOY to 98% of cells with mLOY). We observed a very good correlation between the theoretical and measured proportions of cells with mLOY (R² = 0.9989, p<0.001, see Author response image 1, panel C). Of note, the proportions of mLOY measured for values ≤10% were concordant with theoretical values. However, considering the background noise determined with control DNA, we were unable to confirm that this “signal” was different from the background noise. Therefore, we set a threshold of 9% to define the detection of mLOY by ddPCR. It is also noteworthy that the 10% cell population with mLOY was consistently detected by the ddPCR technique. This has been added in the Methods section (lines 228-235).

      Author response image 1.

      (10) Our understanding of the relationship between CHIP and CVD is evolving fast, and the manuscript should be considered in the context of recent literature in the field. For instance, the recent work by Zhao et al (JAMA Cardio 2024, doi:10.1001/jamacardio.2023.5095) should be considered, as it used a similar targeted DNA sequencing approach as the one used here, but found a clear association between CHIP and coronary heart disease (in a population of 6181 individuals). 

      We thank the reviewer for this pertinent reference. We did not include it in the first version of our manuscript because it was not published yet when we submitted our work. We included this reference in the discussion (lines 451, 455, 464). We also included the recent study of Heimlich et al (Circ Gen Pre Med 2024, lines 464-468) who studied the association of CHIP with atherosclerosis burden.

      (11) The use of subjective terms like "comprehensive" or "thorough" in the title of the manuscript does not align with the objective nature of scientific reporting. 

      We removed the terms “comprehensive” and “thorough” from the title and the text.

      Recommendations for the authors:

      Reviewing Editor:

      The Editors believe that in light of the small study the word Comprehensive has to be removed (including from the title and abstract).

      We agree and removed the term comprehensive from the title and the text.

      Reviewer #1 (Recommendations For The Authors):

      Other comments:

      It has long been recognized that hsCRP does not adequately address the inflammation associated with CHIP. For example, see Bick et al Nature 2020; 586:763. Through an assessment of a large dataset, the regulation of multiple inflammatory mediators was associated with CHIP but not with CRP. 

      We agree that hsCRP is probably not the most sensitive marker for the inflammatory state associated with CHIP. However, it is the one most commonly used in medical practice. Moreover, as indicated in the discussion (lines 418-420), we did not observe any association between CHIP and the plasma levels of different cytokines (IL1ß, IL6, IL18 and TNFα) in patients enrolled in the CHAth study.

      Many of the citations lack journal names, volumes, page numbers, etc. 

      We apologize for this and corrected the citations.

      Please provide more details on the methodology (i.e. is CHIP assessed only through NGS with no error correction?). Specify the rationale for why the 9% LOY threshold was employed. Provide this information in the Methods section.

      We added more details on the methodology as requested in the results section (lines 212-214 and 228-235).

      Supplementary Table S3 lacks headings. What are the designations for columns 6-8? 

      We apologize for this and corrected the Table. Columns 6-8 correspond to the VAF, coverage of the variants and depth of sequencing, as for Table S4.

    1. eLife assessment

      This important study describes the discovery of a mechanism by which multiple species of bacteria synthesize and localize polar flagella via a novel protein, FipA, which interacts with FlhF. The authors use appropriate methodological approaches (biochemistry, molecular microbiology, quantitative microscopy, and bacterial genetics) to obtain and present convincing results and interpretations. This work will particularly interest those studying bacterial motility and bacterial cell biologists.

    2. Reviewer #1 (Public review):

      Summary:

      Bacteria exhibit species-specific numbers and localization patterns of flagella. How specificity in number and pattern is achieved is poorly understood but often depends on a soluble GTPase called FlhF. Here the authors take an unbiased protein-pulldown approach to identify a protein FipA in V. parahaemolyticus that interacts with FlhF. They show that FipA co-occurs with FlhF in the genomes of bacteria with polarly-localized flagella and study the role of FipA in three different bacteria: V. parahaemolyticus, S. putrefaciens, and P. putida. In each case, they show that FipA contributes to FlhF polar localization, flagellar assembly, flagellar patterning, and motility to different species-specific extents.

      Strengths:

      The authors perform a comprehensive analysis of FipA, including phenotyping of mutants, protein localization, localization dependence, and domains of FipA necessary for each. Moreover, they perform a time-series analysis indicating that FipA localizes to the cell pole likely prior to, or at least coincident with, flagellar assembly. They also show that the role of FipA appears to differ between organisms in detail but the overarching idea that it is a flagellar assembly/localization factor remains convincing.

      Weaknesses:

      For me, the comparative analysis in the different organisms was, on balance, a weakness. By mixing the data for each of the organisms together, I found it difficult to read, and take away key points from the results. In its current form, the individual details seem to crowd out the model.

    3. Reviewer #2 (Public review):

      Summary:

      The authors identify a novel protein, FipA, which facilitates recruitment of FlhF to the membrane at the cell pole together with the known recruitment factor HubP. This finding is key to understanding the mechanism of polar localization. By comparing the role of FipA in polar flagellum assembly in three different species from Vibrio, Shewanella and Pseudomonas, they discover that, while FipA is required in all three systems, evolution has brought different nuances that open avenues for further discoveries.

      Strengths:

      The discovery of a novel factor for polar flagellum development. A significant contribution to our understanding of flagellar evolution. The solid nature and flow of the experimental work.

      Weaknesses:

      All my concerns have been addressed. I find no weaknesses. A nice, solid piece of work.

    4. Reviewer #3 (Public review):

      Summary:

      The authors investigate how polar flagellation is achieved in gamma-proteobacteria. By probing for proteins that interact with the known flagellar placement factor FlhF, they uncover a new regulator (FipA) for flagellar assembly and polar positioning in three flagellated gamma-proteobacteria. They convincingly demonstrate that FipA interacts genetically and biochemically with previously known spatial regulators HubP and FlhF. FipA is a membrane protein with a cytoplasmic DUF2802 and it co-localizes to the flagellated pole with HubP and FlhF. The DUF2802 mediates the interaction between FipA and FlhF and this interaction is required for FipA function. FipA localization depends on HubP and FlhF.

      Strengths:

      The work is thoroughly executed, relying on bacterial genetics, cell biology and protein interaction studies. The analysis is deep, beginning with the discovery of a new and conserved factor, to the molecular dissection of the protein and probing localisation and interaction determinants. Finally, they show that these determinants are important for function and they perform these studies in parallel in three model systems.

      Weaknesses:

      Because some of the phenotypes and localisation dependencies differ somewhat between model systems, the comparison is challenging to the reader because it is sometimes not obvious what these differences mean and why they arise.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important research uses an elegant combination of protein-protein biochemistry, genetics, and microscopy to demonstrate that the novel bacterial protein FipA is required for polar flagella synthesis and binds to FlhF in multiple bacterial species. This manuscript is convincing, providing evidence for the early stages of flagellar synthesis at a cell pole; however, the protein biochemistry is incomplete and would benefit from additional rigorous experiments. This paper could be of significant interest to microbiologists studying bacterial motility, appendages, and cellular biology.

      We are very grateful for the very positive and helpful evaluation.

      Joint Public Review:

      Bacteria exhibit species-specific numbers and localization patterns of flagella. How specificity in number and pattern is achieved in Gamma-proteobacteria needs to be better understood but often depends on a soluble GTPase called FlhF. Here, the authors take an unbiased protein-pulldown approach with FlhF, resulting in identifying the protein FipA in V. parahaemolyticus. They convincingly demonstrate that FipA interacts genetically and biochemically with previously known spatial regulators HubP and FlhF. FipA is a membrane protein with a cytoplasmic DUF2802; it co-localizes to the flagellated pole with HubP and FlhF. The DUF2802 mediates the interaction between FipA and FlhF, and this interaction is required for FipA function. Altogether, the authors show that FipA likely facilitates the recruitment of FlhF to the membrane at the cell pole together with the known recruitment factor HubP. This finding is crucial in understanding the mechanism of polar localization. The authors show that FipA co-occurs with FlhF in the genomes of bacteria with polarly-localized flagella and study the role of FipA in three of these organisms: V. parahaemolyticus, S. putrefaciens, and P. putida. In each case, they show that FipA contributes to FlhF polar localization, flagellar assembly, flagellar patterning, and motility, though the details differ among the species. By comparing the role of FipA in polar flagellum assembly in three different species, they discover that, while FipA is required in all three systems, evolution has brought different nuances that open avenues for further discoveries.

      Strengths:

      The discovery of a novel factor for polar flagellum development. The solid nature and flow of the experimental work.

      The authors perform a comprehensive analysis of FipA, including phenotyping of mutants, protein localization, localization dependence, and domains of FipA necessary for each. Moreover, they perform a time-series analysis indicating that FipA localizes to the cell pole likely before, or at least coincident with, flagellar assembly. They also show that the role of FipA appears to differ between organisms in detail, but the overarching idea that it is a flagellar assembly/localization factor remains convincing.

      The work is well-executed, relying on bacterial genetics, cell biology, and protein interaction studies. The analysis is deep, beginning with discovering a new and conserved factor, then the molecular dissection of the protein, and finally, probing localization and interaction determinants. Finally, the authors show that these determinants are important for function; they perform these studies in parallel in three model systems.

      Weaknesses:

      The comparative analysis in the different organisms was, on balance, a weakness. Mixing the data for the organisms together made the text difficult to read and took away key points from the results. The individual details crowded out the model in its current form. Indeed, because some of the phenotypes and localization dependencies differ between model systems, the comparison is challenging to the reader. The authors could more clearly state what these differences mean, why they arise, and (in the discussion) how they might relate to the organism's lifestyle.

      More experiments would be needed to fully analyze the effects of interacting proteins on individual protein stability; this absence slightly detracted from the conclusions.

      We have tried our best to improve the manuscript according to the insightful suggestions of the reviewers. Please find our answers to the raised issues below.

      Reviewer #1 (Recommendations For The Authors):

      We are very grateful to this reviewer for the very positive evaluation and the great suggestions to improve the manuscript.

      I think there is value to the comparative analysis but how to present it in such a way that the key similarities and differences stand out is the challenge. Perhaps a table that compares the three datasets is sufficient. Or tell the story of V. parahaemolyticus first to establish the model, followed by comparative analysis of the other two organisms highlighting differences and relegating similarities to supplemental?

      We agree that our previous presentation of the comparative analysis made it very hard to follow the major findings and the general role(s) of FipA, and we are very grateful for the suggestions on how to improve this. We have decided to change the presentation as the reviewer recommended. We used V. parahaemolyticus as a “lead model” to describe the role of FipA, and we then compared the major findings to the other two species. We hope that the story is now easier to follow.

      This is not something that needs to be addressed in the text but I wanted to bring the protein SwrB to the authors' attention which may further expand FipA relevance. Bacillus subtilis uses FlhFG to somehow pattern flagella in a peritrichous arrangement and there are a number of striking similarities, in my opinion, between FipA and SwrB. The two proteins have very similar domain architecture/topology, both proteins promote flagellar assembly, and the genetic neighborhood/operon organization is uncannily similar. There are other more minor similarities dependent on the organism in this paper.

      Phillips, Kearns. 2021. Molecular and cell biological analysis of SwrB in Bacillus subtilis. J Bacteriol 203:e0022721

      Phillips, Kearns. 2015. Functional activation of the flagellar type III secretion export apparatus. PLoS Genet 11:e1005443.

      We thank this reviewer for pointing out these intriguing similarities. For this study we have decided to exclusively concentrate on polarly flagellated bacteria. FlhF and FlhG are also present in B. subtilis, where they play a role in organizing flagellation, but we feel that this would be out of scope for this manuscript.

      Reviewer #2 (Recommendations For The Authors):

      We would like to thank this reviewer for the very positive evaluation and for pointing out several issues to strengthen the story.

      Figure 3A data are problematic since everything is too small to visualize. Since these are functional GFP fusions (or mCherry for 2E data), why are they not presented in color?

      Again - why are color figures not used to help the reader in Fig 4A and 5F & 5G to confirm what is asserted?

      Again, it is difficult to see the images presented. It is asserted that FipA is recruited to the cell pole after cell division and before flagellum assembly, but one has to take their word for it.

      We fully agree that in some cases the localization pattern is hard to see on the micrographs presented. We have, therefore, provided enlarged micrographs in the supplemental part, which make the fluorescent foci within the cells easier to see. With respect to presentations in color, we found that this did not improve the visibility of localizations and have therefore decided to use the grayscale images.

      Here, what is missing are turnover assays. Do FipA, FlhF, and HubP all co-localize as complex or is the absence of one leading to the protein turnover of other partners? I think this needs to be sorted out before final conclusions can be made.

      Thanks for raising this important point. We have now provided western analyses which demonstrate that FipA and FlhF are produced and stable in the absence of the other partners (see Supplemental Figure 5). The stability of HubP, a general polar marker that is not only required for flagellation, was not determined.

      Minor comments:

      Line 58: change "around" to "in timing with"

      Line 79: what "signal" is transferred from the C-ring to the MS-ring. Are they not fully connected such that rotation is the entire structure - C-ring-MS-ring-Rod-Hook-Filament. Is it not the change in the relationship to the stator complex where the signal is transferred?

      Line 85: change "counting" to "control of flagellar numbers per cell"

      Line 110: change "is (co-)responsible for recruiting" to "facilitates recruitment of"

      Thanks for pointing this out. We have adjusted the wording according to the reviewer’s suggestions.

      Given that motility phenotypes vary on individual plates (volumes and dryness vary), why in Figure 2C are the motility assays for fipA and flhF mutants of P. putida done on different plates?

      For better visualisation, we have rearranged the spreading halos for the figure. All strain spreading comparisons on soft agar were always conducted on the same plate due to the reasons this reviewer mentioned.

      Reviewer #3 (Recommendations For The Authors):

      We thank this reviewer for the very positive evaluation and the great suggestions.

      One possibility is to describe first all the results relating to FipA in Vibrio and then add the result sections at the end to illustrate the differences between Vibrio and Shewanella, and then Vibrio and Pseudomonas. This may make it easier to follow for the reader.

      We agree that our previous presentation of the comparative analysis made it very hard to follow the major findings and the general role(s) of FipA, and we are very grateful for the suggestions on how to improve this. We have decided to change the presentation as the reviewer recommended. We used V. parahaemolyticus as a “lead model” to describe the role of FipA, and we then compared the major findings to the other two species. We hope that the story is now easier to follow.

      I would have liked to see some TEM analysis of flagella in fipA/hubP double mutants strains and was also wondering if FipA/FlhF/HubP colocalization had been studied in E. coli when all proteins are expressed together, at least with two bearing fluorescent tags.

      Thanks for these great suggestions. In this study, we have concentrated on the localization of FlhF by FipA and HubP. HubP has multiple functions in the cell and may also affect flagellar synthesis to some extent in a species-specific fashion. Therefore, any findings would have to be discussed very carefully, so we have decided to leave that out for the time being.

      With respect to FipA/HubP/FlhF production in a heterologous host such as E. coli, this has been partly done (without FipA) in a second, parallel story (see reference to Dornes et al (2024) in this manuscript). Rebuilding larger parts of the system in a heterologous host is currently being done in an independent study. Therefore, we have decided not to include this here.

      From the Reviewing Editor:

      We are grateful for handling the fair reviewing process, for the positive evaluation and the helpful hints.

      The microscopy was inconsistent (DIC versus phase) for unclear reasons. Did using different microscopes impact the ability to acquire low-intensity fluorescence signals? Please add a sentence in the Methods section to clarify.

      We are sorry for this inconsistency. As the imaging was carried out by different labs (in part before the projects were joined), the corresponding preferred microscopy settings were used. We have added an explanatory sentence to the Methods section.

      Also, some subcellular fluorescence localizations were not visible in the selected images (e.g., Figures 3 and 5). The reader had to rely on the authors' statements and analyses. The conclusions could be more robust with fluorescence measurements across the cell body for a subset of cells. The authors could provide this data analysis in the Supplemental; this measurement would more clearly show an accumulation of fluorescence at the cell pole, particularly in low-intensity images.

      We fully agree that in some cases the localization pattern is hard to see on the micrographs presented. Unfortunately, the signal is often not sufficiently strong to provide proper demographs. We have, therefore, provided enlarged micrographs in the supplemental part, which make the fluorescent foci within the cells easier to see.

    1. Author response:

      We sincerely thank the reviewers for their thoughtful, critical, and constructive comments, which will help us in further exploring the mechanisms by which LDH regulates glycolysis, the tricarboxylic acid cycle, and oxidative phosphorylation in future studies. The following are our responses to the reviewers' comments.

      Reviewer #1 (Public Review):

      Summary:

      Zeng et al. have investigated the impact of inhibiting lactate dehydrogenase (LDH) on glycolysis and the tricarboxylic acid cycle. LDH is the terminal enzyme of aerobic glycolysis or fermentation that converts pyruvate and NADH to lactate and NAD+ and is essential for the fermentation pathway as it recycles NAD+ needed by upstream glyceraldehyde-3-phosphate dehydrogenase. As the authors point out in the introduction, multiple published reports have shown that inhibition of LDH in cancer cells typically leads to a switch from fermentative ATP production to respiratory ATP production (i.e., glucose uptake and lactate secretion are decreased, and oxygen consumption is increased). The presumed logic of this metabolic rearrangement is that when glycolytic ATP production is inhibited due to LDH inhibition, the cell switches to producing more ATP using respiration. This observation is similar to the well-established Crabtree and Pasteur effects, where cells switch between fermentation and respiration due to the availability of glucose and oxygen. Unexpectedly, the authors observed that inhibition of LDH led to inhibition of respiration and not activation as previously observed. The authors perform rigorous measurements of glycolysis and TCA cycle activity, demonstrating that under their experimental conditions, respiration is indeed inhibited. Given the large body of work reporting the opposite result, it is difficult to reconcile the reasons for the discrepancy. In this reviewer's opinion, a reason for the discrepancy may be that the authors performed their measurements 6 hours after inhibiting LDH. Six hours is a very long time for assessing the direct impact of a perturbation on metabolic pathway activity, which is regulated on a timescale of seconds to minutes. The observed effects are likely the result of a combination of many downstream responses that happen within 6 hours of inhibiting LDH that causes a large decrease in ATP production, inhibition of cell proliferation, and likely a range of stress responses, including gene expression changes.

      Strengths:

      The regulation of metabolic pathways is incompletely understood, and more research is needed, such as the one conducted here. The authors performed an impressive set of measurements of metabolite levels in response to inhibition of LDH using a combination of rigorous approaches.

      Weaknesses:

      Glycolysis, TCA cycle, and respiration are regulated on a timescale of seconds to minutes. The main weakness of this study is the long drug treatment time of 6 hours, which was chosen for all the experiments. In this reviewer's opinion, if the goal was to investigate the direct impact of LDH inhibition on glycolysis and the TCA cycle, most of the experiments should have been performed immediately after or within minutes of LDH inhibition. After 6 hours of inhibiting LDH and ATP production, cells undergo a whole range of responses, and most of the observed effects are likely indirect due to the many downstream effects of LDH and ATP production inhibition, such as decreased cell proliferation, decreased energy demand, activation of stress response pathways, etc.

      We appreciate the reviewer’s critical comments. The main argument is whether the inhibition of LDH induces a temporal perturbation in glycolysis, the TCA cycle, and OXPHOS, or if it leads to a shift to a new steady state. We argue that this shift represents a transition between two steady states; specifically, GNE-140 treatment drives metabolism from one steady state to another.

      Before conducting the experiment, we performed a time course experiment, measuring glucose consumption and lactate production in cells treated with GNE-140. Both increased linearly with time, indicating that the glycolytic rate remained constant and thus that glycolysis was at steady state. Given the tight coupling between glycolysis, the TCA cycle, and OXPHOS, we infer that the TCA cycle and OXPHOS were also at steady state. However, this inference requires further confirmation.
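
      The linearity argument above can be illustrated with a minimal least-squares sketch: if cumulative lactate production rises linearly with time, the slope estimates a constant flux and R² is close to 1. The time points and lactate values below are invented purely for illustration and are not the authors' measurements; the function name `linear_fit` is also hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = intercept + slope * x; returns slope, intercept, R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical time course (hours) and cumulative lactate (arbitrary units).
hours = [0, 1, 2, 3, 4, 5, 6]
lactate = [0.0, 1.1, 1.9, 3.0, 4.1, 4.9, 6.0]

slope, intercept, r2 = linear_fit(hours, lactate)
print(f"flux estimate (slope) = {slope:.3f} units/h, R^2 = {r2:.4f}")
```

      A constant slope over the whole window is consistent with a steady state for the measured flux only; as the authors note, it does not by itself establish a steady state for the TCA cycle or OXPHOS.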

      Multiple published reports have shown that LDH inhibition in cancer cells causes a shift from fermentative ATP production to respiratory ATP production. This notion persists because it is often compared to the well-established Crabtree and Pasteur effects, where cells toggle between fermentation and respiration based on glucose and oxygen availability. However, in the Pasteur or Crabtree effects, the deprivation of oxygen—the terminal electron acceptor—drives the switch, which is fundamentally different from LDH inhibition.

      Reviewer #2 (Public Review):

      Summary:

      Zeng et al. investigated the role of LDH in determining the metabolic fate of pyruvate in HeLa and 4T1 cells. To do this, three broad perturbations were applied: knockout of two LDH isoforms (LDH-A and LDH-B), titration with a non-competitive LDH inhibitor (GNE-140), and exposure to either normoxic (21% O2) or hypoxic (1% O2) conditions. They show that knockout of either LDH isoform alone, though reducing both protein level and enzyme activity, has virtually no effect on either the incorporation of a stable 13C-label from a 13C6-glucose into any glycolytic or TCA cycle intermediate, nor on the measured intracellular concentrations of any glycolytic intermediate (Figure 2). The only apparent exception to this was the NADH/NAD+ ratio, measured as the ratio of F420/F480 emitted from a fluorescent tag (SoNar).

      The addition of a chemical inhibitor, on the other hand, did lead to changes in glycolytic flux, the concentrations of glycolytic intermediates, and in the NADH/NAD+ ratio (Figure 3). Notably, this was most evident in the LDH-B-knockout, in agreement with the increased sensitivity of LDH-A to GNE-140 (Figure 2). In the LDH-B-knockout, increasing concentrations of GNE-140 increased the NADH/NAD+ ratio, reduced glucose uptake, and lactate production, and led to an accumulation of glycolytic intermediates immediately upstream of GAPDH (GA3P, DHAP, and FBP) and a decrease in the product of GAPDH (3PG). They continue to show that this effect is even stronger in cells exposed to hypoxic conditions (Figure 4). They propose that a shift to thermodynamic unfavourability, initiated by an increased NADH/NAD+ ratio inhibiting GAPDH explains the cascade, calculating ΔG values that become progressively more endergonic at increasing inhibitor concentrations.

      Then - in two separate experiments - the authors track the incorporation of 13C into the intermediates of the TCA cycle from a 13C6-glucose and a 13C5-glutamine. They use the proportion of labelled intermediates as a proxy for how much pyruvate enters the TCA cycle (Figure 5). They conclude that the inhibition of LDH decreases fermentation, but also the TCA cycle and OXPHOS flux - and hence the flux of pyruvate to all of those pathways. Finally, they characterise the production of ATP from respiratory or fermentative routes, the concentration of a number of cofactors (ATP, ADP, AMP, NAD(P)H, NAD(P)+, and GSH/GSSG), the cell count, and cell viability under four conditions: with and without the highest inhibitor concentration, and at norm- and hypoxia. From this, they conclude that the inhibition of LDH inhibits the glycolysis, the TCA cycle, and OXPHOS simultaneously (Figure 7).

      Strengths:

      The authors present an impressively detailed set of measurements under a variety of conditions. It is clear that a huge effort was made to characterise the steady-state properties (metabolite concentrations, fluxes) as well as the partitioning of pyruvate between fermentation as opposed to the TCA cycle and OXPHOS.

      A couple of intermediary conclusions are well supported, with the hypothesis underlying the next measurement clearly following. For instance, the authors refer to literature reports that LDH activity is highly redundant in cancer cells (lines 108 - 144). They prove this point convincingly in Figure 1, showing that both the A- and B-isoforms of LDH can be knocked out without any noticeable changes in specific glucose consumption or lactate production flux, or, for that matter, in the rate at which any of the pathway intermediates are produced. Pyruvate incorporation into the TCA cycle and the oxygen consumption rate are also shown to be unaffected.

      They checked the specificity of the inhibitor and found good agreement between the inhibitory capacity of GNE-140 on the two isoforms of LDH and the glycolytic flux (lines 229 - 243). The authors also provide a logical interpretation of the first couple of consequences following LDH inhibition: an increased NADH/NAD+ ratio leading to the inhibition of GAPDH, causing upstream accumulations and downstream metabolite decreases (lines 348 - 355).

      Weaknesses:

      Despite the inarguable comprehensiveness of the data set, a number of conceptual shortcomings afflict the manuscript. First and foremost, reasoning is often not pursued to a logical conclusion. For instance, the accumulation of intermediates upstream of GAPDH is proffered as an explanation for the decreased flux through glycolysis. However, in Figure 3C it is clear that there is no accumulation of the intermediates upstream of PFK. It is unclear, therefore, how this traffic jam is propagated back to a decrease in glucose uptake. A possible explanation might lie with hexokinase and the decrease in ATP (and constant ADP) demonstrated in Figure 6B, but this link is not made.

      We appreciate the reviewer's critical comment. In Figure 3C, there is no accumulation of F6P or G6P, which are upstream of PFK1. This is because the PFK1-catalyzed reaction sets a significant thermodynamic barrier. Even with treatment using 30 μM GNE-140, the ∆GPFK1 (Gibbs free energy of the PFK1-catalyzed reaction) remains -9.455 kJ/mol (Figure 3D), indicating that the reaction is still far from thermodynamic equilibrium, thereby preventing the accumulation of F6P and G6P.

      We agree with the reviewer that hexokinase inhibition may play a role; this requires further investigation.

      The obvious link between the NADH/NAD+ ratio and pyruvate dehydrogenase (PDH) is also never addressed, a mechanism that might explain how the pyruvate incorporation into the TCA cycle is impaired by the inhibition of LDH (the observation with which they start their discussion, lines 511 - 514).

      We agree with the reviewer’s comment. In this study, we did not explore how the inhibition of LDH affects pyruvate incorporation into the TCA cycle. As this mechanism was not investigated, we have titled the study: "Elucidating the Kinetic and Thermodynamic Insights into the Regulation of Glycolysis by Lactate Dehydrogenase and Its Impact on the Tricarboxylic Acid Cycle and Oxidative Phosphorylation in Cancer Cells."

      It was furthermore puzzling how the ΔG, calculated with intracellular metabolite concentrations (Figures 3 and 4) could be endergonic (positive) for PGAM at all conditions (also normoxic and without inhibitor). This would mean that under the conditions assayed, glycolysis would never flow completely forward. How any lactate or pyruvate is produced from glucose, is then unexplained.

      This issue also concerned me during the study. However, given the high reproducibility of the data, we consider the result to be genuine, although it requires explanation.

      The PGAM-catalyzed reaction is tightly linked to both upstream and downstream reactions in the glycolytic pathway. In glycolysis, three key reactions catalyzed by HK2, PFK1, and PK are highly exergonic, providing the driving force for the conversion of glucose to pyruvate. The other reactions, including the one catalyzed by PGAM, operate near thermodynamic equilibrium and primarily serve to equilibrate glycolytic intermediates rather than control the overall direction of glycolysis, as previously described by us (J Biol Chem. 2024 Aug 8;300(9):107648).

      The endergonic nature of the PGAM-catalyzed reaction does not prevent it from proceeding in the forward direction. Instead, the directionality of the pathway is dictated by the exergonic reaction of PFK1 upstream, which pushes the flux forward, and by PK downstream, which pulls the flux through the pathway. The combined effects of PFK1 and PK may account for the observed endergonic state of the PGAM reaction.

      However, if the PGAM-catalyzed reaction were isolated from the glycolytic pathway, it would tend toward equilibrium and never surpass it, as there would be no driving force to move the reaction forward.
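      For readers following the thermodynamic exchange above, the following is a minimal sketch of how an actual ΔG is obtained from measured concentrations, using the standard relation ΔG = ΔG°' + RT ln Q. The function name, the standard free energy value, and the concentrations are illustrative assumptions and are not taken from the manuscript; the sketch does not attempt to reproduce the reported values.

```python
import math

R_KJ = 8.314e-3  # gas constant in kJ/(mol*K)


def reaction_delta_g(delta_g0_prime, products, substrates, temp_k=310.15):
    """Return the actual Gibbs free energy change in kJ/mol.

    delta_g0_prime: standard transformed free energy (kJ/mol).
    products, substrates: lists of concentrations in mol/L.
    temp_k: temperature in kelvin (37 degrees C assumed by default).
    """
    # Mass-action ratio Q = product of products / product of substrates
    q = math.prod(products) / math.prod(substrates)
    return delta_g0_prime + R_KJ * temp_k * math.log(q)


# Hypothetical isomerase-type reaction S -> P with an assumed
# standard free energy of +4.4 kJ/mol (illustrative value only).
dg_low_ratio = reaction_delta_g(4.4, products=[1e-4], substrates=[1e-3])
dg_high_ratio = reaction_delta_g(4.4, products=[5e-4], substrates=[1e-3])

print(f"[P]/[S] = 0.1 -> dG = {dg_low_ratio:.2f} kJ/mol")   # about -1.5
print(f"[P]/[S] = 0.5 -> dG = {dg_high_ratio:.2f} kJ/mol")  # about +2.6
```

      In this toy case, a five-fold change in the product-to-substrate ratio is enough to flip the sign of ΔG, which illustrates why the accuracy of the metabolite measurements matters for reactions that operate close to equilibrium.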

      Finally, the interpretation of the label incorporation data is rather unconvincing. The authors observe an increasing labelled fraction of TCA cycle intermediates as a function of increasing inhibitor concentration. Strangely, they conclude that less labelled pyruvate enters the TCA cycle while simultaneously less labelled intermediates exit the TCA cycle pool, leading to increased labelling of this pool. The reasoning that they present for this (decreased m2 fraction as a function of GNE-140 concentration) is by no means a consistent or striking feature of their titration data and comes across as rather unconvincing. Yet they treat this anomaly as resolved in the discussion that follows.

      GNE-140 treatment increased the labeling of TCA cycle intermediates by [13C6]glucose but decreased the OXPHOS rate. We consider these conflicting results an 'anomaly' that warrants further explanation. To address this, we analyzed the labeling pattern of TCA cycle intermediates using both [13C6]glucose and [13C5]glutamine. Tracing the incorporation of glucose- and glutamine-derived carbons into the TCA cycle suggests that LDH inhibition leads to a reduced flux of glucose-derived acetyl-CoA into the TCA cycle, coupled with a decreased flux of glutamine-derived α-KG and a reduction in the efflux of intermediates from the cycle. These results align with theoretical predictions. Under any condition, the reactions that distribute TCA cycle intermediates to other pathways must be balanced by those that replenish them. In the GNE-140 treatment group, the entry of glutamine-derived carbon into the TCA cycle was reduced, implying that glucose-derived carbon (as acetyl-CoA) entering the TCA cycle must also be reduced, or vice versa.

      This step-by-step investigation is detailed under the subheading "The Effect of LDHB KO and GNE-140 on the Contribution of Glucose Carbon to the TCA Cycle and OXPHOS" in the Results section in the manuscript.

      In the Discussion, we emphasize that caution should be exercised when interpreting isotope tracing data. In this study, treatment of cells with GNE-140 led to an increased labeling percentage of TCAC intermediates by [13C6]glucose (Figure 5A-E). However, this does not necessarily imply an increase in glucose carbon flux into TCAC; rather, it indicates a reduction in both the flux of glucose carbon into TCAC and the flux of intermediates leaving TCAC. When interpreting the data, multiple factors must be considered, including the carbon-13 labeling pattern of the intermediates (m1, m2, m3, ...) (Figure 5G-K), replenishment of intermediates by glutamine (Figure 5M-V), and mitochondrial oxygen consumption rate (Figure 5W). All these factors should be taken into account to derive a proper interpretation of the data.
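      The general point that a labeled fraction reflects the ratio of fluxes rather than their absolute size can be illustrated with a deliberately simplified sketch. It assumes a single well-mixed metabolite pool at isotopic steady state, fed by one labeled and one unlabeled influx; the function name and all flux values are hypothetical and are not derived from the manuscript's data. Real TCA cycle labeling depends on carbon atom transitions and multiple pools, which this sketch ignores.

```python
def steady_state_labeled_fraction(labeled_influx, unlabeled_influx):
    """Labeled fraction of a single pool at isotopic steady state.

    With total efflux equal to total influx, the pool's labeled
    fraction equals the labeled share of the total influx.
    Fluxes may be in any consistent unit.
    """
    total_influx = labeled_influx + unlabeled_influx
    if total_influx <= 0:
        raise ValueError("total influx must be positive")
    return labeled_influx / total_influx


# Hypothetical fluxes (arbitrary units), chosen only for illustration.
control = steady_state_labeled_fraction(labeled_influx=10.0, unlabeled_influx=10.0)
treated = steady_state_labeled_fraction(labeled_influx=6.0, unlabeled_influx=3.0)

print(f"control labeled fraction: {control:.2f}")
print(f"treated labeled fraction: {treated:.2f}")
```

      In this toy case, the labeled influx falls from 10 to 6 units, yet the labeled fraction rises because the unlabeled influx falls proportionally more. Whether this applies to the data in question depends on the measured fluxes, which the sketch does not model.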

      Reviewer #3 (Public Review):

      Hu et al. in their manuscript attempt to interrogate the interplay between glycolysis, TCA activity, and OXPHOS using LDHA/B knockouts as well as LDH-specific inhibitors. Before I discuss the specifics, I have a few issues with the overall manuscript. First of all, based on numerous previous studies it is well established that glycolysis inhibition or forcing pyruvate into the TCA cycle (studies with PDK inhibitors) leads to upregulation of TCA cycle activity and OXPHOS, activation of glutaminolysis, etc. (in this work, the authors claim that lowered glycolysis leads to lower levels of TCA activity/OXPHOS). The authors in the current work completely ignore recent studies that suggest that lactate itself is an important signaling metabolite that can modulate metabolism (actual mechanistic insights were recently presented by at least two groups (Thompson, Chouchani labs)). In addition, extensive effort was dedicated to understanding the crosstalk between glycolysis/TCA cycle/OXPHOS using metabolic models (Titov, Rabinowitz labs). I have several comments on how experiments were performed. In the Methods section, it is stated that both HeLa and 4T1 cells were grown in RPMI-1640 medium with regular serum - but under these conditions, pyruvate is certainly present in the medium - this can easily complicate/invalidate some findings presented in this manuscript. In the LDH enzymatic assays performed with cell homogenates, controls were not explained or presented (a lot of enzymes in the homogenate can react with NADH!). One of the major issues I have is that glycolytic intermediates were measured in multiple enzyme-coupled assays. Although one might think it is a good approach to have quantitative numbers for each metabolite, the way it was done is that cell homogenates (potentially with still traces of activity of multiple glycolytic enzymes) were incubated with various combinations of the SAME enzymes and substrates they were supposed to measure as a part of the enzyme-based cycling reaction. I would prefer to see a comparison between numbers obtained in enzyme-based assays with GC-MS/LC-MS experiments (using calibration curves for respective metabolites, of course). Correct measurements of these metabolites are crucial especially when thermodynamic parameters for respective reactions are calculated. Concentrations in multiple graphs (Figure 1g, etc.) are given in "mM"; I do not think that this is correct.

      While the roles of lactate as a signaling metabolite and metabolic models are important areas of research, our work focuses on different aspects.

      It is true that cell homogenates contain many enzymes that use NAD as a hydride acceptor or NADH as a hydride donor. However, in our assay system, the substrates are pyruvate and NADH, meaning only enzymes that catalyze the conversion of pyruvate + NADH to NAD + lactate can utilize NADH. Other enzymes do not interfere with this reaction. Although some enzymes may also catalyze this reaction, their catalytic efficiency is markedly lower than that of LDH, ensuring the validity of this assay.

      Similarly, the assays for glycolytic intermediates are validated by the substrate specificity.

      We have developed an LC-MS methodology for some glycolytic intermediates, but the accuracy of quantification remains unsatisfactory due to inherent limitations of this methodology.

    2. eLife assessment

      This study presents an assessment of the effect of lactate dehydrogenase (LDH) inhibition on the activity of glycolysis and tricarboxylic acid cycle. The data were collected and analyzed using solid and validated methodology. This paper makes a useful contribution to the field as it considers a control analysis of LDH flux.

    3. Reviewer #1 (Public Review):

      Summary:

      Zeng et al. have investigated the impact of inhibiting lactate dehydrogenase (LDH) on glycolysis and the tricarboxylic acid cycle. LDH is the terminal enzyme of aerobic glycolysis or fermentation that converts pyruvate and NADH to lactate and NAD+ and is essential for the fermentation pathway as it recycles NAD+ needed by upstream glyceraldehyde-3-phosphate dehydrogenase. As the authors point out in the introduction, multiple published reports have shown that inhibition of LDH in cancer cells typically leads to a switch from fermentative ATP production to respiratory ATP production (i.e., glucose uptake and lactate secretion are decreased, and oxygen consumption is increased). The presumed logic of this metabolic rearrangement is that when glycolytic ATP production is inhibited due to LDH inhibition, the cell switches to producing more ATP using respiration. This observation is similar to the well-established Crabtree and Pasteur effects, where cells switch between fermentation and respiration due to the availability of glucose and oxygen. Unexpectedly, the authors observed that inhibition of LDH led to inhibition of respiration and not activation as previously observed. The authors perform rigorous measurements of glycolysis and TCA cycle activity, demonstrating that under their experimental conditions, respiration is indeed inhibited. Given the large body of work reporting the opposite result, it is difficult to reconcile the reasons for the discrepancy. In this reviewer's opinion, a reason for the discrepancy may be that the authors performed their measurements 6 hours after inhibiting LDH. Six hours is a very long time for assessing the direct impact of a perturbation on metabolic pathway activity, which is regulated on a timescale of seconds to minutes. The observed effects are likely the result of a combination of many downstream responses that occur within 6 hours of inhibiting LDH, which causes a large decrease in ATP production, inhibition of cell proliferation, and likely a range of stress responses, including gene expression changes.

      Strengths:

      The regulation of metabolic pathways is incompletely understood, and more research is needed, such as the one conducted here. The authors performed an impressive set of measurements of metabolite levels in response to inhibition of LDH using a combination of rigorous approaches.

      Weaknesses:

      Glycolysis, TCA cycle, and respiration are regulated on a timescale of seconds to minutes. The main weakness of this study is the long drug treatment time of 6 hours, which was chosen for all the experiments. In this reviewer's opinion, if the goal was to investigate the direct impact of LDH inhibition on glycolysis and the TCA cycle, most of the experiments should have been performed immediately after or within minutes of LDH inhibition. After 6 hours of inhibiting LDH and ATP production, cells undergo a whole range of responses, and most of the observed effects are likely indirect due to the many downstream effects of LDH and ATP production inhibition, such as decreased cell proliferation, decreased energy demand, activation of stress response pathways, etc.

    4. Reviewer #2 (Public Review):

      Summary:

      Zeng et al. investigated the role of LDH in determining the metabolic fate of pyruvate in HeLa and 4T1 cells. To do this, three broad perturbations were applied: knockout of two LDH isoforms (LDH-A and LDH-B), titration with a non-competitive LDH inhibitor (GNE-140), and exposure to either normoxic (21% O2) or hypoxic (1% O2) conditions. They show that knockout of either LDH isoform alone, though reducing both protein level and enzyme activity, has virtually no effect on either the incorporation of a stable 13C-label from a 13C6-glucose into any glycolytic or TCA cycle intermediate, nor on the measured intracellular concentrations of any glycolytic intermediate (Figure 2). The only apparent exception to this was the NADH/NAD+ ratio, measured as the ratio of F420/F480 emitted from a fluorescent tag (SoNar).

      The addition of a chemical inhibitor, on the other hand, did lead to changes in glycolytic flux, the concentrations of glycolytic intermediates, and in the NADH/NAD+ ratio (Figure 3). Notably, this was most evident in the LDH-B-knockout, in agreement with the increased sensitivity of LDH-A to GNE-140 (Figure 2). In the LDH-B-knockout, increasing concentrations of GNE-140 increased the NADH/NAD+ ratio, reduced glucose uptake, and lactate production, and led to an accumulation of glycolytic intermediates immediately upstream of GAPDH (GA3P, DHAP, and FBP) and a decrease in the product of GAPDH (3PG). They continue to show that this effect is even stronger in cells exposed to hypoxic conditions (Figure 4). They propose that a shift to thermodynamic unfavourability, initiated by an increased NADH/NAD+ ratio inhibiting GAPDH explains the cascade, calculating ΔG values that become progressively more endergonic at increasing inhibitor concentrations.

      Then - in two separate experiments - the authors track the incorporation of 13C into the intermediates of the TCA cycle from a 13C6-glucose and a 13C5-glutamine. They use the proportion of labelled intermediates as a proxy for how much pyruvate enters the TCA cycle (Figure 5). They conclude that the inhibition of LDH decreases fermentation, but also the TCA cycle and OXPHOS flux - and hence the flux of pyruvate to all of those pathways. Finally, they characterise the production of ATP from respiratory or fermentative routes, the concentration of a number of cofactors (ATP, ADP, AMP, NAD(P)H, NAD(P)+, and GSH/GSSG), the cell count, and cell viability under four conditions: with and without the highest inhibitor concentration, and under normoxia and hypoxia. From this, they conclude that the inhibition of LDH inhibits glycolysis, the TCA cycle, and OXPHOS simultaneously (Figure 7).

      Strengths:

      The authors present an impressively detailed set of measurements under a variety of conditions. It is clear that a huge effort was made to characterise the steady-state properties (metabolite concentrations, fluxes) as well as the partitioning of pyruvate between fermentation as opposed to the TCA cycle and OXPHOS.

      A couple of intermediary conclusions are well supported, with the hypothesis underlying the next measurement clearly following. For instance, the authors refer to literature reports that LDH activity is highly redundant in cancer cells (lines 108 - 144). They prove this point convincingly in Figure 1, showing that both the A- and B-isoforms of LDH can be knocked out without any noticeable changes in specific glucose consumption or lactate production flux, or, for that matter, in the rate at which any of the pathway intermediates are produced. Pyruvate incorporation into the TCA cycle and the oxygen consumption rate are also shown to be unaffected.

      They checked the specificity of the inhibitor and found good agreement between the inhibitory capacity of GNE-140 on the two isoforms of LDH and the glycolytic flux (lines 229 - 243). The authors also provide a logical interpretation of the first couple of consequences following LDH inhibition: an increased NADH/NAD+ ratio leading to the inhibition of GAPDH, causing upstream accumulations and downstream metabolite decreases (lines 348 - 355).

      Weaknesses:

      Despite the inarguable comprehensiveness of the data set, a number of conceptual shortcomings afflict the manuscript. First and foremost, reasoning is often not pursued to a logical conclusion. For instance, the accumulation of intermediates upstream of GAPDH is proffered as an explanation for the decreased flux through glycolysis. However, in Figure 3C it is clear that there is no accumulation of the intermediates upstream of PFK. It is unclear, therefore, how this traffic jam is propagated back to a decrease in glucose uptake. A possible explanation might lie with hexokinase and the decrease in ATP (and constant ADP) demonstrated in Figure 6B, but this link is not made.

      The obvious link between the NADH/NAD+ ratio and pyruvate dehydrogenase (PDH) is also never addressed, a mechanism that might explain how the pyruvate incorporation into the TCA cycle is impaired by the inhibition of LDH (the observation with which they start their discussion, lines 511 - 514).

      It was furthermore puzzling how the ΔG, calculated with intracellular metabolite concentrations (Figures 3 and 4) could be endergonic (positive) for PGAM at all conditions (also normoxic and without inhibitor). This would mean that under the conditions assayed, glycolysis would never flow completely forward. How any lactate or pyruvate is produced from glucose, is then unexplained.

      Finally, the interpretation of the label incorporation data is rather unconvincing. The authors observe an increasing labelled fraction of TCA cycle intermediates as a function of increasing inhibitor concentration. Strangely, they conclude that less labelled pyruvate enters the TCA cycle while simultaneously less labelled intermediates exit the TCA cycle pool, leading to increased labelling of this pool. The reasoning that they present for this (decreased m2 fraction as a function of GNE-140 concentration) is by no means a consistent or striking feature of their titration data and comes across as rather unconvincing. Yet they treat this anomaly as resolved in the discussion that follows.

    5. Reviewer #3 (Public Review):

      Hu et al. in their manuscript attempt to interrogate the interplay between glycolysis, TCA activity, and OXPHOS using LDHA/B knockouts as well as LDH-specific inhibitors. Before I discuss the specifics, I have a few issues with the overall manuscript. First of all, based on numerous previous studies it is well established that glycolysis inhibition or forcing pyruvate into the TCA cycle (studies with PDK inhibitors) leads to upregulation of TCA cycle activity and OXPHOS, activation of glutaminolysis, etc. (in this work, the authors claim that lowered glycolysis leads to lower levels of TCA activity/OXPHOS). The authors in the current work completely ignore recent studies that suggest that lactate itself is an important signaling metabolite that can modulate metabolism (actual mechanistic insights were recently presented by at least two groups (Thompson, Chouchani labs)). In addition, extensive effort was dedicated to understanding the crosstalk between glycolysis/TCA cycle/OXPHOS using metabolic models (Titov, Rabinowitz labs). I have several comments on how experiments were performed. In the Methods section, it is stated that both HeLa and 4T1 cells were grown in RPMI-1640 medium with regular serum - but under these conditions, pyruvate is certainly present in the medium - this can easily complicate/invalidate some findings presented in this manuscript. In the LDH enzymatic assays performed with cell homogenates, controls were not explained or presented (a lot of enzymes in the homogenate can react with NADH!). One of the major issues I have is that glycolytic intermediates were measured in multiple enzyme-coupled assays. Although one might think it is a good approach to have quantitative numbers for each metabolite, the way it was done is that cell homogenates (potentially with still traces of activity of multiple glycolytic enzymes) were incubated with various combinations of the SAME enzymes and substrates they were supposed to measure as a part of the enzyme-based cycling reaction. I would prefer to see a comparison between numbers obtained in enzyme-based assays with GC-MS/LC-MS experiments (using calibration curves for respective metabolites, of course). Correct measurements of these metabolites are crucial especially when thermodynamic parameters for respective reactions are calculated. Concentrations in multiple graphs (Figure 1g, etc.) are given in "mM"; I do not think that this is correct.

    1. eLife assessment

      In this valuable work, Lodhiya et al. provide evidence that excessive ATP underlies the killing of the model organism Mycobacterium smegmatis by two mechanistically-distinct antibiotics. Clarification of the role(s) of reactive oxygen species and ADP, as well as discrepancies with existing literature, would strengthen the model proposed. The data are generally solid as the authors deploy multiple, orthogonal readouts and methods for manipulating reactive oxygen species and ATP. The work will be of interest to those studying antibiotic mechanisms of action.

    2. Reviewer #1 (Public review):

      Summary:

      Lodhiya et al. demonstrate that antibiotics with distinct mechanisms of action, norfloxacin, and streptomycin, cause similar metabolic dysfunction in the model organism Mycobacterium smegmatis. This includes enhanced flux through the TCA cycle and respiration as well as a build-up of reactive oxygen species (ROS) and ATP. Genetic and/or pharmacologic depression of ROS or ATP levels protect M. smegmatis from norfloxacin and streptomycin killing. Because ATP depression is protective, but in some cases does not depress ROS, the authors surmise that excessive ATP is the primary mechanism by which norfloxacin and streptomycin kill M. smegmatis. In general, the experiments are carefully executed; alternative hypotheses are discussed and considered; the data are contextualized within the existing literature. Clarification of the effect of 1) ROS depression on ATP levels and 2) ADP vs. ATP on divalent metal chelation would strengthen the paper, as would discussion of points of difference with the existing literature. The authors might also consider removing Figures 9 and 10A-B as they distract from the main point of the paper and appear to be the beginning of a new story rather than the end of the current one. Finally, statistics need some attention.

      Strengths:

      The authors tackle a problem that is both biologically interesting and medically impactful, namely, the mechanism of antibiotic-induced cell death.

      Experiments are carefully executed, for example, numerous dose- and time-dependency studies; multiple, orthogonal readouts for ROS; and several methods for pharmacological and genetic depletion of ATP.

      There has been a lot of excitement and controversy in the field, and the authors do a nice job of situating their work in this larger context.

      Inherent limitations to some of their approaches are acknowledged and discussed e.g., normalizing ATP levels to viable counts of bacteria.

      Weaknesses:

      The authors have shown that treatments that depress ATP do not necessarily repress ROS, and therefore conclude that ATP is the primary cause of norfloxacin and streptomycin lethality for M. smegmatis. Indeed, this is the most impactful claim of the paper. However, GSH and dipyridyl beautifully rescue viability. Do these and other ROS-repressing treatments impact ATP levels? If not, the authors should consider a more nuanced model and revise the title, abstract, and text accordingly.

      Does ADP chelate divalent metal ions to the same extent as ATP? If so, it is difficult to understand how conversion of ADP to ATP by ATP synthase would alter metal sequestration without concomitant burst in ADP levels.

      Some of the results in the paper diverge from what has been previously reported by some of the referenced literature. These discrepancies should be clarified.

    3. Reviewer #2 (Public review):

      Summary:

      The authors are trying to test the hypothesis that ATP bursts are the predominant driver of antibiotic lethality of Mycobacteria.

      Strengths:

      This reviewer has not identified any significant strengths of the paper in its current form.

      Weaknesses:

      A major weakness is that M. smegmatis has a doubling time of three hours and the authors are trying to conclude that their data would reflect the physiology of M. tuberculosis which has a doubling time of 24 hours. Moreover, the authors try to compare OD measurements with CFU counts and thus observe great variabilities.

      If the authors had evidence to support the conclusion that ATP burst is the predominant driver of antibiotic lethality in mycobacteria then this paper would be highly significant. However, with the way the paper is written, it is impossible to make this conclusion.

    4. Author response:

      Reviewer #1 (Public review):

      Summary:

      Lodhiya et al. demonstrate that antibiotics with distinct mechanisms of action, norfloxacin, and streptomycin, cause similar metabolic dysfunction in the model organism Mycobacterium smegmatis. This includes enhanced flux through the TCA cycle and respiration as well as a build-up of reactive oxygen species (ROS) and ATP. Genetic and/or pharmacologic depression of ROS or ATP levels protect M. smegmatis from norfloxacin and streptomycin killing. Because ATP depression is protective, but in some cases does not depress ROS, the authors surmise that excessive ATP is the primary mechanism by which norfloxacin and streptomycin kill M. smegmatis. In general, the experiments are carefully executed; alternative hypotheses are discussed and considered; the data are contextualized within the existing literature. Clarification of the effect of 1) ROS depression on ATP levels and 2) ADP vs. ATP on divalent metal chelation would strengthen the paper, as would discussion of points of difference with the existing literature. The authors might also consider removing Figures 9 and 10A-B as they distract from the main point of the paper and appear to be the beginning of a new story rather than the end of the current one. Finally, statistics need some attention.

      Strengths:

      The authors tackle a problem that is both biologically interesting and medically impactful, namely, the mechanism of antibiotic-induced cell death.

      Experiments are carefully executed, for example, numerous dose- and time-dependency studies; multiple, orthogonal readouts for ROS; and several methods for pharmacological and genetic depletion of ATP.

      There has been a lot of excitement and controversy in the field, and the authors do a nice job of situating their work in this larger context.

      Inherent limitations to some of their approaches are acknowledged and discussed e.g., normalizing ATP levels to viable counts of bacteria.

      We sincerely appreciate the reviewer’s encouraging feedback.

      Weaknesses:

      The authors have shown that treatments that depress ATP do not necessarily repress ROS, and therefore conclude that ATP is the primary cause of norfloxacin and streptomycin lethality for M. smegmatis. Indeed, this is the most impactful claim of the paper. However, GSH and dipyridyl beautifully rescue viability. Do these and other ROS-repressing treatments impact ATP levels? If not, the authors should consider a more nuanced model and revise the title, abstract, and text accordingly.

      We thank the reviewer for asking this question. In the revised version of the manuscript, we will include data on the impact of the antioxidant GSH on ATP levels.

      Does ADP chelate divalent metal ions to the same extent as ATP? If so, it is difficult to understand how conversion of ADP to ATP by ATP synthase would alter metal sequestration without concomitant burst in ADP levels.

      We sincerely thank the reviewer for raising this insightful question. Indeed, ADP and AMP can also form complexes with divalent metal ions; however, these complexes tend to be less stable. According to the existing literature, ATP-metal ion complexes exhibit a higher formation constant compared to ADP or AMP complexes. This has been attributed to the polyphosphate chain of ATP, which acts as an active site, forming a highly stable tridentate structure (Khan et al., 1962; Distefano et al., 1953). An antibiotic-induced increase in ATP levels, irrespective of any changes in ADP levels, could still result in the formation of more stable complexes with metal ions, potentially leading to metal ion depletion. Although recent studies indicate that antibiotic treatment stimulates purine biosynthesis (Lobritz MA et al., 2022; Yang JH et al., 2019), thereby imposing energy demands and enhancing ATP production, the possibility of a corresponding increase in total purine nucleotide levels (ADP+ATP) exists (as mentioned in the Discussion section). However, this hypothesis requires further investigation.

      Khan MMT, Martell AE. Metal Chelates of Adenosine Triphosphate. Journal of Physical Chemistry (US). 1962 Jan 1;66(1):10–5.

      Distefano V, Neuman WF. Calcium complexes of adenosinetriphosphate and adenosinediphosphate and their significance in calcification in vitro. Journal of Biological Chemistry. 1953 Feb 1;200(2):759–63.

      Lobritz MA, Andrews IW, Braff D, Porter CBM, Gutierrez A, Furuta Y, et al. Increased energy demand from anabolic-catabolic processes drives β-lactam antibiotic lethality. Cell Chem Biol [Internet]. 2022 Feb 17.

      Yang JH, Wright SN, Hamblin M, McCloskey D, Alcantar MA, Schrübbers L, et al. A White-Box Machine Learning Approach for Revealing Antibiotic Mechanisms of Action. Cell [Internet]. 2019 May 30

      Some of the results in the paper diverge from what has been previously reported by some of the referenced literature. These discrepancies should be clarified.

      We apologize for any confusion, but we are uncertain about the specific discrepancies the reviewer is referring to. In the discussion section, we have addressed and analysed our results within the broader context of the existing literature, regardless of whether our findings align with or differ from previous studies.

      Reviewer #2 (Public review):

      Summary:

      The authors are trying to test the hypothesis that ATP bursts are the predominant driver of antibiotic lethality of Mycobacteria.

      Strengths:

      This reviewer has not identified any significant strengths of the paper in its current form.

      Weaknesses:

      A major weakness is that M. smegmatis has a doubling time of three hours and the authors are trying to conclude that their data would reflect the physiology of M. tuberculosis which has a doubling time of 24 hours. Moreover, the authors try to compare OD measurements with CFU counts and thus observe great variabilities.

      If the authors had evidence to support the conclusion that ATP burst is the predominant driver of antibiotic lethality in mycobacteria then this paper would be highly significant. However, with the way the paper is written, it is impossible to make this conclusion.

      We have identified this new mechanism of antibiotic action in Mycobacterium smegmatis and have also mentioned that whether, and to what extent, this mechanism holds in other organisms needs to be tested, as argued extensively in the discussion section of the manuscript.

      We have always drawn our inferences from the CFU counts, because OD600nm was never a reliable measure, as reported in all of our experiments.

    1. eLife assessment

      This valuable study discusses a hot topic in post-endoscopic retrograde cholangiopancreatography pancreatitis. The new score for predicting post-ERCP pancreatitis offers an idea about the risk of pancreatitis before the procedure. Although most scores depend on intraprocedural manoeuvres, such as the number of attempts to cannulate the papilla, this is a solid retrospective single-center study in one country. To be validated, this score should be tested in many countries and on large numbers of patients. Nevertheless, this paper should interest gastrointestinal endoscopists.

    2. Joint Public Review:

      Summary:

      This work provides a new general tool for predicting post-ERCP pancreatitis before the procedure based on pancreatic calcification, female sex, intraductal papillary mucinous neoplasm, a native papilla of Vater, or the use of pancreatic duct procedures. Even though it is difficult for the endoscopist to predict before the procedure which case might have post-ERCP pancreatitis, this new model score can help with planning the maneuver and with identifying when the patient is at high risk of pancreatitis (which can sometimes be deadly), so that experienced endoscopists can do the procedure from the start. This paper provides a model for stratifying patients before the ERCP procedure into low, moderate, and high risk for pancreatitis. To be validated, this score should be tested in many countries and on large numbers of patients. Risk factors can also be identified and added to the score to increase rank.

      Strengths:

      (1) One of the severe complications of the endoscopic retrograde cholangiopancreatography procedure is pancreatitis, so investigators are constantly trying to find a score that can predict which patients will probably have pancreatitis after the procedure. Most scores depend on the intraprocedural maneuver. Some studies discuss preprocedural scores that can predict pancreatitis before the procedure. This study discusses a new preprocedural score for post-ERCP pancreatitis.

      (2) Because this score identifies patients at low, moderate, and high risk of post-ERCP pancreatitis, experienced and well-trained endoscopists can do the procedure from the start, or patients can be referred to tertiary hospitals or managed with interventional radiology or endoscopic retrograde cholangiopancreatography.

      (3) The number of patients in this study is sufficient to analyze data correctly.

      Weaknesses:

      (1) It is a single-country, retrospective study.

      (2) Many cases were excluded, so the score cannot be applied to those patients.

      (3) Many other studies discussing the same issue have been published before, e.g., https://link.springer.com/article/10.1007/s00464-021-08491-1, https://pubmed.ncbi.nlm.nih.gov/36344369/, so what is new about this score?

      (4) The discussion section needs reformulation to express the study's aim and results.

      (5) Why did the authors select these items in their scoring system and did not add more variables?

    1. eLife assessment

      This important study combines multiple techniques to investigate how caspase activity regulates non-lethal caspase-dependent processes. Through a combination of various approaches, and the development of new techniques, the authors provide compelling evidence supporting the claim that Fas3G-overexpression promotes non-lethal caspase activation in olfactory receptor neurons.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Muramoto and colleagues have examined a mechanism by which the executioner caspase Drice is activated in a non-lethal context in Drosophila. The authors have comprehensively examined this in the Drosophila olfactory receptor neurons using sophisticated techniques. In particular, they had to engineer a new reporter by which non-lethal caspase activation could be detected. The authors conducted a proximity labeling experiment and identified Fasciclin 3 as a key protein in this context. While the removal of Fasciclin 3 did not block non-lethal caspase activation (likely because of redundant mechanisms), its overexpression was sufficient to induce non-lethal caspase activation.

      Strengths:

      While non-lethal functions of caspases have been reported in several contexts, far less is known about the mechanisms by which caspases are activated in these non-lethal contexts. So, the topic is very timely. The overall detail of this work is impressive and the results for the most part are well-controlled and justified.

      Weaknesses:

      The behavioral results shown in Figure 6 need more explanation and clarification (more details below). As currently shown, the results of Figure 6 seem uninterpretable. Also, overall presentation of the Figures and description in legends can be improved.

    3. Reviewer #2 (Public review):

      In this study, the authors investigate the role of caspases in neuronal modulation through non-lethal activation. They analyze proximal proteins of executioner caspases using a variety of techniques, including TurboID and a newly developed monitoring system based on Gal4 manipulation, called MASCaT. They demonstrate that overexpression of Fas3G promotes the non-lethal activation of caspase Dronc in olfactory receptor neurons. In addition, they investigate the regulatory mechanisms of non-lethal function of caspase by performing a comprehensive analysis of proximal proteins of executioner caspase Drice. It is important to point out that the authors use an array of techniques, from western blot to behavioral experiments, and that they generated several reagents, from fly lines to antibodies.

      This is an interesting work that would appeal to readers of multiple disciplines. As a whole these findings suggest that overexpression of Fas3G enhances a non-lethal caspase activation in ORNs, providing a novel experimental model that will allow for exploration of molecular processes that facilitate caspase activation without leading to cell death.

    1. eLife assessment

      This valuable study combines electrophysiology experiments and modeling to investigate the encoding of dynamic patterns of polarized light by identified neurons of the bumblebee central complex. The scientific question and methodology are compelling. However, the evidence supporting the authors' conclusions is incomplete without more comprehensive statistical analyses.

    2. Reviewer #1 (Public review):

      Summary:

      The authors of this valuable study use linearly polarized UV light rotating at different angular velocities to stimulate photoreceptors in bumblebees and study the response of TL3 neurons to polarized light. Previous work has typically used a single constant rotation velocity of the polarized light, while the authors of this study explore a range of constant rotational velocities spanning from 30deg/s to 1920deg/s. The authors also use linearly polarized UV light rotating at continuously varying velocities following the angular velocity of the head of a flying bumblebee. 

      Strengths:

      The authors investigate the neuronal responses of TL3 neurons to a variety of rotational velocities. This approach has the potential to reveal the neuronal response to dynamically changing stimuli experienced by the animal as it moves around its environment.

      The authors make good use of physiology and modeling to validate their hypotheses and findings. If done right, this line of investigation has the potential to provide a very useful methodology for utilizing more complex stimuli in studies of the visual pathway and central complex than have traditionally been used.

      Weaknesses: 

      The attempt of the authors to use more naturalistic stimuli than previous studies is very important, but the stimulus they use, i.e. linearly polarized UV light projected on the whole dorsal rim of the animal's eyes, is very different from the circular pattern of UV light polarization coming through the sky. In particular, as a bumblebee turns under the sky, the light projected on each ommatidium of the dorsal rim area will not smoothly change like the rotating linearly polarized light used in the experiments. The authors need to discuss this and other limitations of their study. 

      The authors should also comment on the light intensity confound common in polarized light setups as discussed by Reinhard Wolf et al, J. Comp. Physiol. 1980 and in the thesis of Peter Weir, California Institute of Technology, 2013. It is unclear whether the authors performed measurements to quantify the intensity pattern and if they took measures to compensate and make the polarized light intensity uniform. 

      The authors show that the neuronal responses of TL3 neurons depend on the recent history of the polarized light stimulus. As evidence, they use the different neuronal firing rates measured when arriving at the same polarization stimulus by following two different preceding stimulus sequences. It would have been worthwhile to investigate to what extent the difference in neuronal response is due to the history alone and to what extent it is due to spike timing stochasticity inherent in the neurons. According to the raster plots in Figure 2F, there is substantial stochasticity in the timing of the action potential firing events.

      The authors appear to base their delay calculations and analysis on the response of one single neuron (Figures 2 and 3) even though they have recorded the responses of several TL3 neurons. There is no reason for the authors not to use all neuron recordings in their calculations and analysis.

      Another concern is that while the authors make good use of modeling, like any model, the presented models only partially explain the observed phenomena. However, a discussion about the limitations of their model needs to be provided.  Actually, observing the discrepancies between the model's output and the intracellular recordings reveals what the model is missing. That is, careful consideration of the discrepancies would have led the authors to try adding some noise in their model, which would partially resolve the differences observed at the lower rotational speeds (see stars deviating from the fitted line in Figure 2A) and to consider that introducing an asymmetry between the post-stimulus inhibition and excitation time constants could result in a model not deviating as much at the higher rotation velocities during counter-clockwise rotation of the polarized light (see stars deviating from the fitted line in Figure 2A). 

      In the end, the authors use the observation that during saccades, the average activity in their model-with-history increases to claim that when the animal does not turn, it uses less neuronal activity and energy. This is not a convincing line of reasoning. To make a claim about energy efficiency, the authors must instead compare their model with alternatives and show that the neuronal activity of their model during straight flight is indeed lower than those alternative models. Note that such a comparison would be meaningful only if the alternative models compared against capture physiology equally well in all other respects. However, the evident deviations of the presented model from the physiology measurements and the short duration of the test stimulus used would make any such claims difficult to substantiate. 

      Finally, for most experiments, the models are stimulated with a single short yaw sequence lasting a few seconds to measure responses. Given the dependence of the model on history, using such a small sample, we cannot see how generalizable the observations are. The authors need to show that the same effect is produced using multiple different trajectories.

    3. Reviewer #2 (Public review):

      Summary:

      The compass network is a higher-order circuit in insects that integrates sensory cues, like the angle of polarized light, with self-motion information to estimate the animal's angular position in space. This paper by Rother et al. uses sharp electrode recordings to measure intracellular voltage activity from individual compass neurons while polarization patterns are presented to the bee. They present patterns that rotate with variable speed or simulate the sensory experience created by a flight trajectory. The authors discover that at low rotational speeds, TL neuron responses diverge from the tuning expected from a systematic synaptic delay, suggesting that recent experience (history) impacts TL responses. A population model of 180 TL neurons is then used to argue that having cells that are impacted by spiking history could be advantageous for estimating heading. The model activity showed an anticipation of polarization angle for rapid turns that followed prolonged straight flights or turns in the opposite direction. The model also had reduced spiking activity during translational straight flight.

      Strengths:

      One strength of this paper is that it focuses on a question that is underexplored in the field: How does the compass network handle the processing delay caused by multi-synaptic relay from the DRA to the sensory input neurons (TL) to the compass network while the insect is turning rapidly and thus sampling distinct polarization angles in rapid succession? Another strength is the fact that they were able to present neurons with both simulated naturalistic polarization patterns that could occur during flight and synthetic stimuli with a range of rotational velocities. This provides an important data set where these responses can be compared. Another strength is the exploration of how adding a history term to a model of a population of TL neurons can lead the population coding of polarization angle to vary in how delayed it is from changes to the sensory stimuli. They find that angular coding is more anticipatory (shorter delay) following prolonged periods of fixating a single angle, such as what occurs during translational movement, or following turns in the opposite direction of the current turn.

      Weaknesses:

      A challenge for this experimental approach is the relatively low power for data sets in some of the experimental conditions. Low throughput is expected for this experimental approach, as intracellular recordings are a challenging and time-consuming method. A weakness of the manuscript in its current form is that the data from all cells that were able to be recorded are not always presented or quantified. For example, only a single neuron example is used to show the impact of history on preferred polarization and how this tuning varied with rotation velocity. This is also true for the claim that TL3 neurons exhibit post-inhibitory excitation and post-excitatory inhibition. Another concern is that the term "spiking-history" is potentially confusing to readers, who might assume this process is cell intrinsic. The data presented by the authors show evidence of an effect of stimulus history on the responses of the neurons. However, as the authors describe in the discussion, the current data set does not distinguish between an effect that occurs in the recorded neurons (e.g. an effect of intrinsic excitability) vs adaptation elsewhere in the circuit or DRA photoreceptors. A final challenge for this approach, shared with other studies that measure neural responses from an insect fixed in place, is that it assumes that these TL neurons are purely sensory and that their response properties (or those upstream of them) do not change when the bee performs a motor action or maneuver. This caveat should be considered when interpreting these data; however, these data still represent novel information and important progress in exploring this question.

    4. Reviewer #3 (Public review):

      This manuscript reports the temporal history dependence of central complex TL/ring neuron spiking activity in response to polarized light patterns. Using sharp electrode recordings in tethered bumblebees with synthetic and natural visual stimulation, the authors nicely measured responses to rotating polarized UV light, and made the interesting finding that spiking activity depends on not just the current stimulus but also recent activity.

      (1) History dependence has been reported before in ring neurons in Drosophila (Sun et al., Nature Neuroscience, 2017; Shiozaki et al., Nature Neuroscience, 2017). While there are differences in the nature of the visual stimulation used, the basic phenomenology of temporal history dependence bears some resemblance. What are the differences in the physiological properties of ring/TL neurons between different insect species that are relevant to history dependence? What are the structural similarities and differences in the circuits that may help to explain history dependence? These are just a few examples. To gain further insight into this question, the manuscript may benefit from putting the findings here into context.

      (2) Figure 3b serves as critical evidence for history dependence. However, it is unclear from these data whether this is history dependence or another physiological process, such as an OFF response to sensory stimulation or sensory adaptation. One way to test this is to examine whether such an effect can be detected after a delay period. For example, history dependence in fly ring neurons is mediated by delay period activity present for several seconds. This can be easily tested here as well.

      (3) The properties of the history dependence can be better characterized to help understand its nature. What are the statistical characteristics of post-stimulus inhibition to preferred AoP and post-stimulus excitation to anti-preferred AoP? What are the temporal dynamics of such an effect, e.g., how long does it take to return to baseline? Are there differences in these properties across the recorded TL neuron population? Is it possible to categorize these TL neurons based on these properties and morphology? These properties are important for understanding the physiological basis of such an effect. The authors only presented two traces in Figure 3b, beautiful example traces, but without any further population data and statistical analysis.

      (4) A major point of the manuscript is energy efficiency via reduction of firing rate. However, the only evidence comes from simulation, and it seems to be a weak effect of 0.5 APs/s.

      (5) Another major point of the manuscript is "increases sensitivity for course deviations during straight flight". However, this again is supported by simulation only. To validate these claims, empirical support of behavioral experiments is highly desired. Otherwise, it is recommended to minimize emphasizing such behavioral predictions.

      (6) A substantial portion of the text emphasizes the importance of natural stimulation. While natural stimulation is indeed a desirable experimental approach, it is unclear if natural stimulation is exploited to its full in this manuscript. History dependence can be explored with synthetic stimulation.

      (7) A phenomenological model was used to account for the history effect, by assuming a linear integration process and a linear history effect. However, such an assumption is not adequately backed up by rigorous statistical analysis of experiment data or at least proper conceptual discussion.

      (8) Population responses, as in Figure 4, are based on strong assumptions of neuronal properties without clear experimental support, thus seeming to be quite a stretch.

      (9) There are interesting observations in simulation results from Figure 5; it would be nice to experimentally test at least some of these ideas.

      (10) "anticipate future head directions" seems to be quite a stretch to me without mechanistic explanations.

      (11) The visual stimulation design used can be improved and expanded. The synthetic stimulation used in Figure 1c follows a stereotyped order, according to angular velocities. As the focus of the manuscript is to probe the history effect and to test again the findings made with this stimulation, randomized stimulation should ideally be examined.

      (12) State dependence was observed in ring neurons in Drosophila (Sun et al., Nature Neuroscience, 2017) which might be related to ongoing neural activity and history dependence. While I realize that the animal is tethered, I was wondering if there was any signature of neural activity state dependence observed in this study.

    1. eLife assessment

      This study makes an important effort to observe and quantify synaptic integration in a large and active network of cultured neurons, using simultaneous patch-clamp and large-scale extracellular recordings. The authors developed a method to distinguish excitatory and inhibitory contributions and show compelling evidence that the subthreshold activity of these neurons is dominated by few presynaptic neurons. They provide convincing statistics about connectivity and network dynamics.

    2. Reviewer #1 (Public review):

      This is an important study to characterize cultured neuronal network dynamics, down to the combinations of individual excitatory and inhibitory inputs that result in spiking. The authors effectively combine high-density multi-electrode arrays with patch recordings and a convincing analysis to work out the contributions of multiple simultaneously active input neurons to postsynaptic activity.

      In this study the authors develop methods to interrogate cultured neuronal networks to learn about the contributions of multiple simultaneously active input neurons to postsynaptic activity. They then use these methods to ask how excitatory and inhibitory inputs combine to result in postsynaptic neuronal firing in a network context.

      The study uses a compelling combination of high-density multi-electrode array recordings with patch recordings. They make effective use of physiology techniques such as shifting the reversal potential of inhibitory inputs, and identifying inhibitory vs. excitatory neurons through their influence on other neurons, to tease apart the key parameters of synaptic connections. The method appears to work on rather low-density cultures so the size of the networks in the current study is in the low tens, and the number of synaptic inputs coming to each neuron is smaller than what would be encountered in vivo.

      The authors obtain a number of findings on the conditions in which the dynamics of excitatory and inhibitory inputs permit spiking, and the statistics of connectivity that result in this. This is of considerable interest, and clearly one would like to see how these findings map to larger networks, to non-cortical networks, and ideally to networks in-vivo. The suite of approaches discussed here could potentially serve as a basis for such further development.

      One of the challenges in doing such studies in a dish is that the network is simply ticking away without any neural or sensory context to work on, nor any clear idea of what its outputs might mean. Nevertheless, at a single-neuron level one expects that this system might provide a reasonable subset of the kinds of activity an individual cell might have to work on. In their response to earlier comments the authors have made useful comments on features of in-vivo network activity that are seen in culture. This could ideally be incorporated into the discussion.

    3. Reviewer #2 (Public Review):

      The authors had two aims in this study. First, to develop a tool that lets them quantify the synaptic strength and sign of upstream neurons in a large network of cultured neurons. Second, they aimed at disentangling the contributions of excitatory and inhibitory inputs to spike generation.

      For the quantification of synaptic currents, their method allows them to quantify excitatory and inhibitory currents simultaneously, as the sign of the current is determined by the neuron identity in the high-density extracellular recording. They further made sure that their method works for nonstationary firing rates, and they did a simulation to characterize what kind of connections their analysis does not capture. They assume that dendritic integration is linear, which is reasonable for synaptic currents measured using voltage-clamp.

      As suggested in a previous review, they have partitioned the explained variance into frequency bands and are able to account for most of the variance in the 3-200Hz range of expected synaptic activity.

      For the contributions of excitation and inhibition to neuronal spiking, the authors found a clear reduction of inhibitory inputs and increase of excitation associated with spiking when averaging across many spikes. And interestingly, the inhibition shows a reversal right after a spike and the timescale is faster during higher network activity. These findings provide further support that their method is working. In the revised version the authors now also provide an analysis of which synaptic event is associated with postsynaptic spiking. The large datasets from this study are well-suited to examining these points.

      For the first part, the authors achieved their goal in developing a tool to study synaptic inputs driving subthreshold activity at the soma and characterizing such connections. For the second part, they found an effect of EPSCs on firing, and in the revision they have quantified its relevance.

      With the availability of Neuropixels probes, there is certainly use for their tool in in vivo applications, and their statistical analysis provides a reference for future studies.

      The relevance of excitatory and inhibitory currents on spiking has now been examined in the updated version of the manuscript.

      In the following, there is a suggestion on improving Figure 6. Many other suggestions for Fig 6 and 7 have been taken up in the revision and it is OK to consider this as future work:

      Figure 6B is useful, but could be done better: The autocovariance of a shotnoise process is a convolution of the autocovariance of the underlying point process and the autocovariance of the EPSC kernel. So one would want to separate those to obtain a better temporal resolution. But a shotnoise process has well defined peaks, and the time of these local maxima can be estimated quite precisely. Now if I would do a peak triggered average instead of the full convolution, I would do half of the deconvolution and obtain a temporally asymmetric curve of what is expected to happen around an EPSC. Importantly, one could directly see expected excitation after inhibition or expected inhibition after excitation, and this visualization could be much better and more intuitive compared to panel 6E.
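A minimal sketch of the peak-triggered average suggested above, on synthetic traces rather than the authors' data. All names (`shot_noise`, `local_maxima`, `peak_triggered_average`), the kernel shape, the threshold, and the 5-sample lag between excitation and inhibition are assumptions chosen for illustration only. The point is that triggering on local maxima of one trace and averaging the other trace around them yields a temporally asymmetric curve.

```python
def shot_noise(event_times, n_samples, kernel):
    """Sum a copy of `kernel` starting at every event time (a shot-noise trace)."""
    trace = [0.0] * n_samples
    for t in event_times:
        for k, w in enumerate(kernel):
            if t + k < n_samples:
                trace[t + k] += w
    return trace


def local_maxima(trace, threshold):
    """Indices of samples above `threshold` that are local maxima."""
    return [i for i in range(1, len(trace) - 1)
            if trace[i] > threshold and trace[i - 1] < trace[i] >= trace[i + 1]]


def peak_triggered_average(trigger, target, threshold, half_window):
    """Average `target` in a window centred on each peak of `trigger`.

    Returns (average window, number of peaks used). Peaks too close to the
    edges of the recording to fit a full window are skipped.
    """
    peaks = [p for p in local_maxima(trigger, threshold)
             if half_window <= p < len(trigger) - half_window]
    width = 2 * half_window + 1
    if not peaks:
        return [0.0] * width, 0
    total = [0.0] * width
    for p in peaks:
        for k in range(width):
            total[k] += target[p - half_window + k]
    return [v / len(peaks) for v in total], len(peaks)


# Synthetic example (assumed values): inhibition follows excitation by 5 samples.
kernel = [1.0, 0.6, 0.36, 0.2, 0.1]           # fast rise, decaying tail
exc_events = list(range(20, 400, 40))          # one excitatory event every 40 samples
inh_events = [t + 5 for t in exc_events]       # inhibitory events lag by 5 samples
exc = shot_noise(exc_events, 400, kernel)
inh = shot_noise(inh_events, 400, kernel)

half_window = 10
avg, n_peaks = peak_triggered_average(exc, inh, 0.5, half_window)
lag_of_max = avg.index(max(avg)) - half_window  # lag (in samples) of peak inhibition
print(n_peaks, lag_of_max)
```

In this toy case the average of the inhibitory trace is zero before the excitatory peak and rises only afterwards, which is the kind of "inhibition after excitation" asymmetry the full autocovariance would smear out.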

      As a suggestion for further analysis, though I am well aware that this is likely beyond the scope of this manuscript, I'd suggest the following analysis:

      I would split the data into the high and low activity states. Then I would compute the average of E/(E+I) values for spikes. Assuming that spikes tend to happen for local maxima of E/(E+I) I would find local maxima for periods without spikes such that their average is equal to the value for actual spikes. Finally, I would test for a systematic difference in either excitation or inhibition.

      If there is no difference, you can make the claim that synaptic input does not guarantee a spike, and compare it to a global average of E/(E+I).
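The matching step in this suggestion could be sketched as follows. This is only one possible reading: the function names, the exclusion window around spikes, and the tolerance-based matching rule are assumptions, and the toy conductance values are made up. In practice the same function would be run separately on the high-activity and low-activity segments.

```python
def ei_ratio(exc, inh):
    """E/(E+I) per sample; defined as 0 where both conductances are 0."""
    return [e / (e + i) if (e + i) > 0 else 0.0 for e, i in zip(exc, inh)]


def matched_nonspike_maxima(ratio, spike_idx, exclusion, tolerance):
    """Find non-spike local maxima of `ratio` close to the mean ratio at spikes.

    `exclusion` is the number of samples around each spike that are ignored;
    `tolerance` is how far a maximum may deviate from the spike mean
    (an assumed matching rule, not taken from the manuscript).
    """
    target = sum(ratio[s] for s in spike_idx) / len(spike_idx)
    near_spike = {s + d for s in spike_idx
                  for d in range(-exclusion, exclusion + 1)}
    matched = [i for i in range(1, len(ratio) - 1)
               if i not in near_spike
               and ratio[i - 1] < ratio[i] >= ratio[i + 1]
               and abs(ratio[i] - target) <= tolerance]
    return target, matched


def mean_at(values, indices):
    """Mean of `values` at the given sample indices."""
    return sum(values[i] for i in indices) / len(indices)


# Toy conductance traces (arbitrary units) with a single spike at sample 2.
exc = [1, 2, 6, 2, 1, 2, 3, 2, 1, 1]
inh = [1, 2, 2, 2, 1, 2, 1, 2, 1, 1]
spikes = [2]

ratio = ei_ratio(exc, inh)
target, matched = matched_nonspike_maxima(ratio, spikes, exclusion=1, tolerance=0.05)

# Same E/(E+I), but is absolute excitation or inhibition systematically different?
exc_spike, exc_matched = mean_at(exc, spikes), mean_at(exc, matched)
inh_spike, inh_matched = mean_at(inh, spikes), mean_at(inh, matched)
print(target, matched, exc_spike, exc_matched, inh_spike, inh_matched)
```

In the toy data the matched non-spike maximum has the same E/(E+I) as the spike but half the absolute excitation and inhibition. With real recordings, the two sets of values would be compared with an appropriate statistical test.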

    4. Author response:

      The following is the authors’ response to the original reviews.

      We are grateful for the many positive comments. Moreover, we appreciate the recommendations to improve the manuscript; particularly, the important discussion points raised by reviewer 1 and the comments made by reviewer 2 concerning an extended quantification of how near-spike input conductances vary across individual spikes. We have performed several new detailed analyses to address reviewer 2’s comments. In particular, we now provide for all relevant postsynaptic cells the complete distributions of the excitatory and inhibitory input conductance changes that occur right before and after postsynaptic spiking, and we provide corresponding distributions of non-spiking regions as a reference. We performed these analyses separately for different baseline activity levels. Our new results largely support our previous conclusions but provide a much more nuanced picture of the synaptic basis of spiking. To the best of our knowledge, this is the first time that parallel information on input excitation, inhibition and postsynaptic spiking is provided for individual neurons in a biological network. We would argue that our new results further support the fundamental notion that even a reductionist neuronal culture model can give rise to sophisticated network dynamics with spiking – at least partially – triggered by rapid input fluctuations, as predicted by theory. Moreover, it appears that changes in input inhibition are a key mechanism to regulate spiking during spontaneous recurrent network activity. It will be exciting to test whether this holds true for neural circuits in vivo.

      In the following section, we address the reviewers’ comments individually.

      Reviewer 1:

      In this study the authors develop methods to interrogate cultured neuronal networks to learn about the contributions of multiple simultaneously active input neurons to postsynaptic activity. They then use these methods to ask how excitatory and inhibitory inputs combine to result in postsynaptic neuronal firing in a network context.

      The study uses a compelling combination of high-density multi-electrode array recordings with patch recordings. They make ingenious use of physiology tricks such as shifting the reversal potential of inhibitory inputs, and identifying inhibitory vs. excitatory neurons through their influence on other neurons, to tease apart the key parameters of synaptic connections.

      We thank the reviewer for acknowledging our efforts to develop an approach to investigate the synaptic basis of spiking in biological neurons and for appreciating the technical challenges that needed to be overcome.

      The method doesn't have complete coverage of all neurons in the culture, and it appears to work on rather low-density cultures so the size of the networks in the current study is in the low tens.

      (1) It would be valuable to see the caveats associated with the small size of the networks examined here.

      (2) It would be also helpful if there were a section to discuss how this approach might scale up, and how better network coverage might be achieved.

      These are indeed very important points that we should have discussed in more detail. Maximizing the coverage of neurons is critical to our approach, as it determines the number of potential synaptic connections that can be tested. The number of cells that we seeded onto our HD-MEA chip was chosen to achieve monolayer neuronal cultures. As detailed in ‘Materials and Methods -> Electrode selection and long-term extracellular recording of network spiking’, the entire HD-MEA chip (all 26'400 electrodes) was scanned for activity at the beginning of each experiment, and electrodes that recorded spiking activity were subsequently selected. While it is possible that some individual neurons escape detection, since they were not directly adjacent to an electrode, we estimate that a large majority of the active neurons in the culture was covered by our electrode selection method. New generations of CMOS HD-MEAs developed in our laboratory and other groups feature higher electrode densities, larger recording areas, and larger sets of electrodes that can be simultaneously recorded from (e.g., DOI: 10.1109/JSSC.2017.2686580 & 10.1038/s41467-020-18620-4). These features will substantially improve the coverage of the network and also allow for using larger neuronal networks. As suggested by reviewer 1, we added these points to the Discussion section of the revised manuscript.

      The authors obtain a number of findings on the conditions in which the dynamics of excitatory and inhibitory inputs permit spiking, and the statistics of connectivity that result in this. This is of considerable interest, and clearly one would like to see how these findings map to larger networks, to non-cortical networks, and ideally to networks in-vivo. The suite of approaches discussed here could potentially serve as a basis for such further development.

      (3) It would be useful for the authors to suggest such approaches.

      We are confident that our suite of approaches will open important avenues to study the E & I input basis of postsynaptic spiking in other circuits beyond the in vitro cortical networks studied here. In fact, CMOS HD-MEA probes have been successfully combined with patch clamping in vivo (DOI: 10.1101/370080) and, in principle, the strategies and software tools introduced in our study would be equally applicable in an in vivo context. However, currently available in vitro CMOS HD-MEAs still surpass their in vivo counterparts (e.g., Neuropixels probes) in terms of electrode count. Moreover, using in vitro neural networks enables easy access and better network coverage compared to in vivo conditions. These are the main reasons why we chose an in vitro network for our investigation. We added these points to the Discussion section of the revised manuscript.

      (4) The authors report a range of synaptic conductance waveforms in time. Not surprisingly, E and I look broadly different. Could the authors comment on the implications of differences in time-course of conductance profiles even within E (or I) synapses? Is this functional or is it an outcome of analysis uncertainty?

      We are grateful to the reviewer for raising this interesting point. The onsets of the synaptic conductance waveform estimates were strikingly different between E and I synapses (see Fig. 8D). Furthermore, the rise and decay phases of synaptic currents were distinct for E vs. I inputs (Fig. 4C). We think that these differences are not just due to analysis uncertainty because both these observations are consistent with previously described properties of E and I inputs: Synaptic GABAergic I currents are typically slower compared to glutamatergic E currents with respect to both rising and decay phase (DOI: 10.1126/science.abj586). Moreover, the relatively small onset latencies for I inputs that we observed are consistent with the well-known local action of inhibition. This finding was also consistent with smaller PRE-POST distances and general differences in neurite characteristics of E compared to I cells (Fig. S2).

      One of the challenges in doing such studies in a dish is that the network is simply ticking away without any neural or sensory context to work on, nor any clear idea of what its outputs might mean. Nevertheless, at a single-neuron level one expects that this system might provide a reasonable subset of the kinds of activity an individual cell might have to work on.

      (5) Could the authors comment on what subsets of network activity are, and are not, likely to be seen in the culture?

      (6) Could they indicate what this would mean for the conclusions about E-I summation, if the in-vivo activity follows different dynamics?

      We agree that there are natural limitations to a reductionist model, such as a dissociated cell culture. One may argue that neuronal cultures bear some similarities with neural networks formed during early brain development, where network formation is primarily driven by intrinsic, self-organizational capabilities. While such a self-organization is likely constrained in a 2D culture, it has been shown that several important circuit mechanisms that are observed in vivo are preserved in 2D dissociated cultures. For example, dissociated neuronal cultures can maintain E-I balance and achieve active decorrelation (DOI: 10.1038/nn.4415). In addition, in terms of activity levels, the sequences of heightened and more quiescent network spiking bear similarities with cortical Up-Down state oscillations observed during slow-wave sleep. To what extent individual circuit connectivity motifs and more nuanced network dynamics, found in vivo, can be recapitulated in vitro, is still not clear. However, combining our and previous work (especially DOI: 10.1038/nn.4415), we believe that there is sufficient evidence to justify work such as ours. On the one hand, identifying in simple cell culture models features of network dynamics and microcircuits known (or predicted) to exist in vivo is a testimony of neuronal self-organizing capabilities. On the other hand, our in vitro results will allow for more directed testing of equivalent mechanisms in vivo.

      Reviewer 2:

      The authors had two aims in this study. First, to develop a tool that lets them quantify the synaptic strength and sign of upstream neurons in a large network of cultured neurons. Second, they aimed at disentangling the contributions of excitatory and inhibitory inputs to spike generation.

      For the quantification of synaptic currents, their method allows them to quantify excitatory and inhibitory currents simultaneously, as the sign of the current is determined by the neuron identity in the high-density extracellular recording. They further made sure that their method works for nonstationary firing rates, and they did a simulation to characterize what kind of connections their analysis does not capture. They did not include the possibility of (dendritic) nonlinearities or gap junctions or any kind of homeostatic processes.

      Thank you for the concise summary of our aims and of the features of our method. Indeed, we did not model nonlinear synaptic interactions, short-term plasticity etc., as there is likely a spectrum of possible interaction rules. Importantly, non-linear synaptic interactions were reduced by performing synaptic measurements in voltage-clamp mode.

      We do not anticipate that this would impact our connectivity inference per se. However, the presence of a significant number of nonlinear events would imply that some deviations between reconstructed and measured patch current traces were to be expected even if all incoming monosynaptic connections were identified. In the future, it will be exciting to add to our current experimental protocol a simultaneous HD-MEA & patch-clamp recording, in which the membrane potential is measured in current-clamp mode. Following application of our synaptic input-mapping procedure, one could, in this way, directly assess input-sequence dependent non-linear synaptic integration during spontaneous network activity.

      I see a clear weakness in the way that they quantify their goodness of fit, as they only report the explained variance, while their data are quite nonstationary. It could help to partition the explained variance into frequency bands, to at least separate the effects of a bias in baseline, the (around 100 Hz) band of synaptic frequencies and whatever high-frequency observation noise there may be. Another weak point is their explanation of unexplained variance by potential activation of extrasynaptic receptors without providing evidence. Given that these cultures are not a tissue and diffusion should be really high, this idea could easily be tested by adding a tiny amount of glutamate to the culture media.

      As suggested by the reviewer, we have now partitioned the current traces into frequency bands and separately assessed the goodness-of-fit. We have updated Fig. 3C accordingly:

      The following sentence was added to the main text:

      “We separately compared slow baseline changes (< 3 Hz), fast synaptic activity (3 - 200 Hz) and putative high-frequency noise (> 200 Hz), yielding a median variance explained of approximately 60% in the 3 - 200 Hz range (Fig. 3C).”

      Importantly, the variance explained in the frequency range of synaptic activity remains high. We would also like to point out that, even if all synaptic input connections were identified, one would expect some deviations between the measured and reconstructed current traces. This is because the reconstructed trace is based on average input current waveforms and in the measured trace there may be synaptic transmission failures.
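      The band-wise goodness-of-fit described above can be illustrated with a minimal sketch. It assumes a brick-wall FFT split at the stated band edges (3 Hz and 200 Hz) and defines variance explained as one minus the ratio of residual variance to measured variance within each band. The function names, the FFT-based split, and this definition of variance explained are assumptions made for illustration; the filtering actually used for Fig. 3C is not specified here.

```python
import numpy as np


def split_into_bands(trace, fs, low_hz=3.0, high_hz=200.0):
    """Split a trace into slow, synaptic and fast components via FFT masking.

    Hypothetical helper: a brick-wall split, chosen only for simplicity.
    """
    freqs = np.fft.rfftfreq(trace.size, d=1.0 / fs)
    spectrum = np.fft.rfft(trace)
    masks = {
        "slow": freqs < low_hz,
        "synaptic": (freqs >= low_hz) & (freqs <= high_hz),
        "fast": freqs > high_hz,
    }
    return {
        name: np.fft.irfft(np.where(mask, spectrum, 0.0), n=trace.size)
        for name, mask in masks.items()
    }


def band_variance_explained(measured, reconstructed, fs, low_hz=3.0, high_hz=200.0):
    """Variance explained by the reconstruction, computed separately per band."""
    m_bands = split_into_bands(measured, fs, low_hz, high_hz)
    r_bands = split_into_bands(reconstructed, fs, low_hz, high_hz)
    result = {}
    for name, m in m_bands.items():
        residual = m - r_bands[name]
        # Assumed definition: 1 - var(residual) / var(measured), within the band.
        result[name] = 1.0 - np.var(residual) / np.var(m)
    return result
```

      With this definition, a reconstruction that captures only the synaptic band scores close to 1 in that band and close to 0 in the slow and fast bands, which separates a baseline bias from the quality of the synaptic fit.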

      We agree that the offered explanation for unexplained variance by activation of extrasynaptic receptors is fairly speculative. As it was not a crucial discussion point, we have therefore removed the statement.

      For the contributions of excitation and inhibition to neuronal spiking, the authors found a clear reduction of inhibitory inputs and increase of excitation associated with spiking when averaging across many spikes. And interestingly, the inhibition shows a reversal right after a spike and the timescale is faster during higher network activity. While these findings are great and provide further support that their method is working, they stop at this exciting point where I would really have liked to see more detail.

      Thank you for acknowledging our main results concerning the synaptic basis of spiking. We attempted to integrate in one manuscript a suite of new approaches, in addition to the respective applications. We, therefore, tried to strike the appropriate level of detail in presenting our findings. With regard to our analyses of which synaptic input events regulate postsynaptic spiking, we agree with reviewer 2’s assessment that more detail concerning the variability across individual spikes would be helpful. In the following parts, we detail multiple new analyses that we have included in the revised manuscript to address reviewer 2’s comments.

      A concern, of course, is that the network bursts in cultures are quite stereotypical, and that might cause averages across many bursts to show strange behaviour. So what I am missing here is a reference or baseline or null hypothesis. How does it look when using inputs from neurons that are not connected? And then, it looks like the E/(E+I) curve has lots of peaks of similar amplitude (that could be quantified...), so why does the neuron spike where it does? If I would compare to the peak (of similar amplitude) right before or right after (as a reference) are there some systematic changes? Is maybe the inhibition merely defining some general scaffold where spikes can happen and the excitation causes the spike as spiking is more irregular?

      The averaged trace reveals a different timescale for high and low activity states. But does that reflect a superposition of EPSCs in a single trial or rather a different jittering of a single EPSC across trials? For answering this question, it would be good to know the variance (and whether/ how much it changes over time). Maybe not all spikes are preceded by a decrease in inhibition. Could you quantify (correlate, scatterplot?) how exactly excitation and inhibition contributions relate for single postsynaptic spikes (or single postsynaptic non-spikes)? After all, this would be the kind of detail that requires the large amount of data that this study provides.

      First of all, we are very grateful for the reviewer’s thorough assessment of our work and for the many valuable suggestions to improve it. We are convinced that we have addressed with our new analyses and the updated manuscript all issues raised here. One of the main findings from our original manuscript was that a rapid and brief change in input conductance (and particularly a reduction in inhibition) is an important spike trigger/regulator. We followed the reviewer’s suggestion and now provide scatter plots and distributions of the pre- (and post-spike) changes in input excitation and inhibition for individual postsynaptic spikes. A quantification of the peaks in the noisy E/(E+I) traces was not always trivial, which is why we reasoned that an assessment of the respective E and I changes is better suited. Moreover, as an unbiased reference, we generated separately for each postsynaptic cell a corresponding distribution of changes in input conductance in non-spiking periods (using random time points). We included our new results and updated figures in our responses to the specific reviewer comments below.

      For the first part, the authors achieved their goal in developing a tool to study synaptic inputs driving subthreshold activity at the soma, and characterizing such connections. For the second part, they found an effect of EPSCs on firing, but they barely did any quantification of its relevance due to the lack of a reference.

      With the availability of Neuropixels probes, there is certainly use for their tool in in vivo applications, and their statistical analysis provides a reference for future studies.

      The relevance of excitatory and inhibitory currents on spiking remains to be seen in an updated version of the manuscript.

      Thank you. Please see our new analyses below. Our new findings are in agreement with the main conclusions of the original manuscript. We provide evidence that rapid pre-spike changes in input conductance are observed across most individual spikes and that these rapid changes occur significantly more often before measured spikes than in non-spiking periods.

      I feel that specifically Figures 6 and 7 lack relevant detail and a consistent representation that would allow the reader to establish links between the different panels. The analysis shows very detailed examples, but then jumps into analyses that show population averages over averaged responses, losing or ignoring the variability across trials. In addition, while their results themselves pass a statistical test, it is crucial to establish some measure of how relevant these results are. For that, I would really want to know how much spiking would actually be restricted by the constraints that would be posed by these results, i.e. would this be reflected in tiny changes in spiking probabilities, or are there times when spiking probabilities are necessarily high, or do we see times when we would almost certainly get a spike, but neurons can fire during other times as well.

      I would agree that a detailed, quantitative analysis of this question is beyond the scope of this paper, but a qualitative analysis is feasible and should be done.

      Please see our revised Figure 6. We have rearranged some of the original panels and removed one example of mean conductance profiles. Moreover, we removed a panel with analysis results based on mean conductances that is now obsolete, as more detailed analyses are provided (which are in agreement with the original findings). Analyses from panels (A-F) are mostly unchanged. Panels (G-J) show the new results.

      The following paragraphs, which were added to the main text of the revised manuscript, describe our new findings:

      “For a more nuanced picture of which synaptic events are associated with postsynaptic spiking, we next quantified the changes in input excitation and inhibition that preceded individual postsynaptic spikes. In our analysis, we first focused on periods with high synaptic input activity. As previously discussed, cortical neurons in vivo typically receive and integrate barrages of input activation, similar to the high-activity events that we observed here (e.g., the event depicted in Fig. 6A, right). In Fig. 6G/H, individual pre-spike changes in input conductance are shown for two example postsynaptic neurons (plots labeled ‘spiking’, right). To assess how specific these conductance changes were to spiking periods, we also quantified the changes in input conductance that occurred during non-spiking periods as a reference (we used random time points from high-activity events excluding time points adjacent to measured spike times; we upscaled the number of measured spikes by 10x; the respective plots were labeled ‘non-spiking’). Spikes of both example neurons exhibited – compared to non-spiking regions – significantly more often a pre-spike decrease in inhibition, consistent with the mean conductance profiles. Precisely how an increase (top-right quadrants in Fig. 6G/H) or decrease (bottom-left quadrants) in both I and E conductance influenced the neuronal membrane potential is difficult to predict. However, if rapid changes in input conductance had a significant role in triggering spikes, one would expect that fewer spikes would exhibit a hyperpolarizing pre-spike increase in I and decrease in E (top-left quadrants) compared to the non-spiking period. Conversely, a decrease in I and an increase in E (bottom-right quadrants) would likely result in a membrane potential depolarization so that more spikes should feature the corresponding pre-spike conductance changes compared to non-spiking periods. These relative shifts are precisely what can be observed in the plots of the two example neurons (Fig. 6G/H) and, in fact, across recordings (Fig. 6I). Finally, we compared the distributions of pre-spike changes in input inhibition and excitation of each postsynaptic neuron (Fig. 6J). Further indicating a pivotal role of inhibition in triggering spikes, 6 out of 7 neurons exhibited a clear decrease in the mean values (and medians) of pre-spike changes in inhibition compared to non-spiking periods. Interestingly, the 3 out of 7 neurons with an increase in excitation showed the smallest decrease in inhibition (or even an increase in inhibition in case of neuron #7). This latter observation suggests a matching of E and I inputs and cell-specific relative contributions of E and I conductance changes in triggering spikes.

      Theoretically, neuronal spiking could be driven by a prolonged suprathreshold depolarization (Petersen and Berg 2016; Renart et al. 2007) or, in more favorable subthreshold regimes, by fast synaptic input fluctuations (Ahmadian and Miller 2021; Amit and Brunel 1997; Brunel 2000; Van Vreeswijk and Sompolinsky 1996). In this section, we demonstrated that the majority of investigated neurons featured – during high-activity periods – a significant number of spikes that were associated with rapid pre-spike changes in input conductances. These findings suggest that even simple neuronal cultures can self-organize to form circuits exhibiting sophisticated spiking dynamics.”
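      The quadrant comparison described in the quoted paragraphs can be sketched as follows. This is an illustrative outline only: it assumes that a pre-spike change is the difference between the conductance at the spike sample and the conductance `window` samples earlier, that the x-axis is the change in E and the y-axis the change in I, and that reference points are drawn uniformly from samples at least `exclude` samples away from any spike. The function names and these definitions are hypothetical. Only the 10x factor for the number of reference points is taken from the text, and the restriction to high-activity events is omitted for brevity.

```python
import numpy as np


def conductance_changes(g_e, g_i, times, window):
    """Change in E and I conductance over the `window` samples before each time point."""
    times = np.asarray(times, dtype=int)
    d_e = g_e[times] - g_e[times - window]
    d_i = g_i[times] - g_i[times - window]
    return d_e, d_i


def quadrant_fractions(d_e, d_i):
    """Fraction of events per quadrant (x-axis: change in E, y-axis: change in I)."""
    n = d_e.size
    return {
        "top_right": np.count_nonzero((d_e > 0) & (d_i > 0)) / n,
        "top_left": np.count_nonzero((d_e < 0) & (d_i > 0)) / n,
        "bottom_left": np.count_nonzero((d_e < 0) & (d_i < 0)) / n,
        "bottom_right": np.count_nonzero((d_e > 0) & (d_i < 0)) / n,
    }


def reference_times(n_samples, spike_times, window, exclude, factor=10, seed=0):
    """Random non-spiking time points, `factor` times as many as there are spikes."""
    rng = np.random.default_rng(seed)
    allowed = np.ones(n_samples, dtype=bool)
    allowed[:window] = False  # need a full pre-window before each reference point
    for s in spike_times:
        allowed[max(0, s - exclude): s + exclude + 1] = False
    candidates = np.flatnonzero(allowed)
    n_ref = factor * len(spike_times)
    # Sample with replacement only if there are too few candidate samples.
    return rng.choice(candidates, size=n_ref, replace=candidates.size < n_ref)
```

      Comparing `quadrant_fractions` for spike times against the same fractions for `reference_times` gives the relative shifts discussed above, for example a larger bottom-right fraction (decrease in I, increase in E) for spiking periods.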

      Our new analyses detailed in Fig. 6 show that there are also presumably depolarizing events (e.g., decrease in I and increase in E) in non-spiking regions. In future studies, it will be interesting to examine what distinguishes these events from spike-inducing events of similar magnitude – one possibility is a dependency on specific input-activation sequences.

      During the first days and weeks of developing neuronal cultures, spiking activity rapidly shifts from synapse-independent activity patterns to spiking dynamics that do depend on synaptic inputs and are progressively organized in network-wide high-activity events (DOI: 10.1016/j.brainres.2008.06.022). In our study, cultures at days-in-vitro 15-18 were used, and approximately 15% of the spikes occurred during high-activity events with relatively strong E and I input activity. In addition, spikes that occurred during low-activity events were at least partially regulated by synaptic input (see answers below related to Fig. 7).

      In the following, I am detailing what I would consider necessary to be done about these two Figures:

      Figure 6C is indeed great, though I don't see why the authors would characterize synchrony as low. When comparing with Figure 4B, I'd think that some of these values are quite high. And it wouldn't help me to imagine error bars in panel 6D.

      We have removed our characterization as ‘low’ from the text. One important difference between our synchrony measure (STTC) and the quantification of spike-transmission probability (STP) is the ‘lag’ of a few milliseconds for the STP quantification window to account for synaptic delay.
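      For readers unfamiliar with the synchrony measure, the following is a minimal sketch of the spike time tiling coefficient in its commonly used form, STTC = 0.5 * ((P_A - T_B) / (1 - P_A * T_B) + (P_B - T_A) / (1 - P_B * T_A)). Here T_X is the fraction of the recording within ±dt of any spike in train X, and P_X is the fraction of spikes in X within ±dt of a spike in the other train. It is assumed, not confirmed by the text, that this is the variant used in the manuscript. The function names, the choice of `dt`, and the handling of the degenerate case (a zero denominator is treated as a term of 1) are illustrative choices.

```python
import numpy as np


def tiled_fraction(spikes, dt, duration):
    """Fraction of [0, duration] lying within +/- dt of any spike (sorted input)."""
    if spikes.size == 0:
        return 0.0
    starts = np.clip(spikes - dt, 0.0, duration)
    ends = np.clip(spikes + dt, 0.0, duration)
    covered = 0.0
    cur_start, cur_end = starts[0], ends[0]
    for s, e in zip(starts[1:], ends[1:]):
        if s <= cur_end:  # overlapping tiles are merged, not double counted
            cur_end = max(cur_end, e)
        else:
            covered += cur_end - cur_start
            cur_start, cur_end = s, e
    covered += cur_end - cur_start
    return covered / duration


def proportion_near(spikes_a, spikes_b, dt):
    """Fraction of spikes in A that lie within +/- dt of any spike in B."""
    if spikes_a.size == 0 or spikes_b.size == 0:
        return 0.0
    idx = np.searchsorted(spikes_b, spikes_a)
    left = spikes_b[np.clip(idx - 1, 0, spikes_b.size - 1)]
    right = spikes_b[np.clip(idx, 0, spikes_b.size - 1)]
    nearest = np.minimum(np.abs(spikes_a - left), np.abs(spikes_a - right))
    return float(np.mean(nearest <= dt))


def sttc(spikes_a, spikes_b, dt, duration):
    """Spike time tiling coefficient for two spike trains (times in seconds)."""
    a = np.sort(np.asarray(spikes_a, dtype=float))
    b = np.sort(np.asarray(spikes_b, dtype=float))
    t_a, t_b = tiled_fraction(a, dt, duration), tiled_fraction(b, dt, duration)
    p_a, p_b = proportion_near(a, b, dt), proportion_near(b, a, dt)

    def term(p, t):
        denom = 1.0 - p * t
        return 1.0 if denom == 0.0 else (p - t) / denom  # assumed convention

    return 0.5 * (term(p_a, t_b) + term(p_b, t_a))
```

      Unlike a spike-transmission probability, this measure is symmetric in the two trains and uses a window centred on each spike, with no lag to account for synaptic delay.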

      Figure 6B is useful, but could be done better: The autocovariance of a shotnoise process is a convolution of the autocovariance of underlying point process and the autocovariance of the EPSC kernel. So one would want to separate those to obtain a better temporal resolution. But a shotnoise process has well defined peaks, and the time of these local maxima can be estimated quite precisely. Now if I would do a peak triggered average instead of the full convolution, I would do half of the deconvolution and obtain a temporally asymmetric curve of what is expected to happen around an EPSC. Importantly, one could directly see expected excitation after inhibition or expected inhibition after excitation, and this visualization could be much better and more intuitively compared to panel 6E.
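      The peak-triggered average proposed in this comment can be sketched as follows. The sketch assumes that peaks are strict local maxima of one conductance trace above a chosen height, and that the other trace is averaged in a symmetric window around each peak. The function names, the peak criterion, and the `min_height` and `half_window` parameters are illustrative assumptions, not part of the manuscript's analysis.

```python
import numpy as np


def local_maxima(trace, min_height):
    """Indices of interior samples that are local maxima of at least `min_height`."""
    interior = trace[1:-1]
    is_peak = (
        (interior > trace[:-2])
        & (interior >= trace[2:])
        & (interior >= min_height)
    )
    return np.flatnonzero(is_peak) + 1


def peak_triggered_average(trigger_trace, target_trace, half_window, min_height):
    """Average `target_trace` around the peaks of `trigger_trace`.

    Returns the sample offsets and the mean of the target trace at each offset.
    """
    peaks = local_maxima(trigger_trace, min_height)
    # Keep only peaks with a complete window on both sides.
    valid = (peaks >= half_window) & (peaks < trigger_trace.size - half_window)
    peaks = peaks[valid]
    if peaks.size == 0:
        raise ValueError("no peaks with a complete window were found")
    offsets = np.arange(-half_window, half_window + 1)
    segments = target_trace[peaks[:, None] + offsets[None, :]]
    return offsets, segments.mean(axis=0)
```

      Triggering on peaks of the excitatory trace and averaging the inhibitory trace gives the expected inhibition after excitation as an asymmetric curve. Swapping the two arguments gives the expected excitation after inhibition.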

      We appreciate the reviewer’s suggestion to present these results in a more sophisticated way. We would like to propose to stick with the original analysis to have it comparable with related analyses from the literature (e.g., DOI: 10.1038/nn.2105). Therefore, we hope the reviewer finds it acceptable that we leave the presentation of the data in its original form and potentially follow up in future work with the analysis strategy proposed by the reviewer.

      Panel D needs some variability estimate (i.e. standard deviation or interquartile range or even a probability density) for those traces.

      Figure 6E: Please use more visible colors. A sensitivity analysis to see traces for 2E/(2E+I) and E/(E+2I) would be great.

      Figure 6F: with an updated panel B, we should be able to have a slope for average inhibition after excitation for each of these cells. A second panel / third column showing those slopes would be of interest. It would serve as a reference for what could be expected from E-I interactions alone.
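      The sensitivity analysis requested for Figure 6E amounts to recomputing the excitation fraction with one of the two conductances doubled. A minimal sketch, with hypothetical function and parameter names, and with samples where both conductances are zero returned as NaN by assumption:

```python
import numpy as np


def excitation_fraction(g_e, g_i, e_scale=1.0, i_scale=1.0):
    """Compute e_scale*E / (e_scale*E + i_scale*I) sample by sample.

    Samples where the scaled total conductance is zero are returned as NaN.
    """
    e = e_scale * np.asarray(g_e, dtype=float)
    i = i_scale * np.asarray(g_i, dtype=float)
    total = e + i
    out = np.full(total.shape, np.nan)
    np.divide(e, total, out=out, where=total > 0)
    return out
```

      Calling the function with `e_scale=2.0` gives 2E/(2E+I), and with `i_scale=2.0` gives E/(E+2I). Overlaying the three traces shows how strongly the timing of the peaks depends on the relative weighting of the E and I estimates.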

      With regard to the variability estimate in D, we now provide multiple panels characterizing the variability. For one, Fig. 6H contains a scatter plot of the pre-spike changes in input conductance across all individual postsynaptic spikes from the example cell shown in D. Moreover, in Fig. 7A, we show from the same example cell the standard deviations associated with the mean conductance traces separately for spikes that occurred during low- and high-activity states. For better visibility and because the separation according to activity states is more informative, we kept the original presentation of panel D (however, removing one example cell). In addition, we show the same mean traces from panel D with the respective standard deviations (across all spikes) in Supplementary Figure S3.

      Colors in Fig. 6E are adjusted, as requested.

      We have removed panel Fig. 6F as we now provide more detailed analyses at single-spike level (see Fig. 6G-J).

      Figure 6G: Could the authors provide an interquartile range here?

      With regard to the aligned input-output data from original panel Fig. 6G, now in panel Fig. 6F in the updated figure version, we show all individual traces that were averaged: the E/I traces from panel Fig. 6E and the three action potential waveforms from Supplementary Figure S5. Therefore, we chose to present the means only for better visibility.

      Figure 7A: it may be hard to squeeze in variability estimates here, but the information on whether and how much variance might be explained is essential. Maybe add another panel to provide a variability estimate? The variability estimate in panel 7B and 7D only reflect variability across connections, and it would be useful to add panels for the time courses of the variability of g (or E/(E+I) respectively).

      We now include the standard deviations across the input conductance traces in the updated Fig. 7A, as requested. We have also simplified Fig. 7 and performed the analysis using the 6 out of 7 neurons that, based on our new analysis (Fig. 6J), displayed a clear reduction in pre-spike inhibition, relative to the reference distribution. For a complete overview of the state-dependent changes in input conductance that are associated with individual postsynaptic spikes, we have included a new supplementary figure (Fig. S6). Fig. S6 also includes a characterization of the changes in input inhibition that occur right after postsynaptic spiking. In addition, Fig. S6D shows the standard deviations corresponding to the mean input conductance traces of all cells – separately for high- and low-activity periods.

      We added the following paragraph to the main text of the revised manuscript:

      “How can these deviations in the mean conductance profiles be explained? To answer this question, we further quantified – separately for low and high g states – the changes in input inhibition that occurred right before and after individual postsynaptic spikes (Fig. S6). This single-spike analysis suggested that, during high g states, most spikes experienced a post-spike increase and pre-spike decrease in inhibition (see also Fig. 6J). On the other hand, low g states were characterized by sparse synaptic input (e.g., see reconstruction in Fig. 6A). Therefore, many of the spikes that occurred during low g states were associated with little change in input conductance (note medians of approximately zero in Fig. S6A/C). Nevertheless, a considerable fraction of spikes (often > 25%) from low g states were also associated with a post-spike increase and pre-spike drop in inhibition. It, therefore, appears that even the sparse inhibitory inputs of low g states could influence spike timing. Moreover, the post-spike increases in input inhibition during low g states suggest that there were strong regulatory inhibitory circuits in place. However, limited activity levels during low g states presumably introduced an increased jitter of these spike-associated changes in input inhibition.

      In summary, the input inhibition of high-conductance states provides reliable and narrow windows-of-spiking opportunity. In addition, even during periods of sparse activity, there are rudimentary synaptic mechanisms in place to regulate spike timing.”

      As a suggestion for further analysis, though I am well aware that this is likely beyond the scope of this manuscript, I'd suggest the following analysis:

      I would split the data into the high and low activity states. Then I would compute the average of E/(E+I) values for spikes. Assuming that spikes tend to happen for local maxima of E/(E+I) I would find local maxima for periods without spike such that their average is equal to the value for actual spikes. Finally, I would test for a systematic difference in either excitation or inhibition.

      If there is no difference, you can make the claim that synaptic input does not guarantee a spike, and compare to a global average of E/(E+I).

      We are grateful for the fantastic suggestions for future analysis. We look forward to conducting these analyses in a more detailed follow-up characterization.

      In addition to the major alterations detailed above, we performed smaller corrections (e.g., spelling mistakes, inaccuracies) in some parts of the manuscript.

    1. eLife assessment

      Using microscopy experiments and theoretical modelling, the authors present convincing evidence of cellular coordination in the gliding filamentous cyanobacterium Fluctiforma draycotensis. The results are important for the understanding of cyanobacterial motility and the underlying molecular and mechanical pathways of cellular coordination.

    2. Reviewer #1 (Public review):

      Summary:

      The authors use microscopy experiments to track the gliding motion of filaments of the cyanobacterium Fluctiforma draycotensis. They find that filament motion consists of back-and-forth trajectories along a "track", interspersed with reversals of movement direction, with no clear dependence between filament speed and length. It is also observed that longer filaments can buckle and form plectonemes. A computational model is used to rationalize these findings.

      Strengths:

      Much work in this field focuses on molecular mechanisms of motility; by tracking filament dynamics this work helps to connect molecular mechanisms to environmentally and industrially relevant ecological behavior such as aggregate formation.

      The observation that filaments move on tracks is interesting and potentially ecologically significant.

      The observation of rotating membrane-bound protein complexes and tubular arrangement of slime around the filament provides important clues to the mechanism of motion.

      The observation that long filaments buckle has the potential to shed light on the nature of mechanical forces in the filaments, e.g. through the study of the length dependence of buckling.

      Weaknesses:

      The manuscript makes the interesting statement that the distribution of speed vs filament length is uniform, which would constrain the possibilities for mechanical coupling between the filaments. However, Figure 1C does not show a uniform distribution but rather an apparent lack of correlation between speed and filament length, while Figure S3 shows a dependence that is clearly increasing with filament length. Also, although it is claimed that the computational model reproduces the key features of the experiments, no data is shown for the dependence of speed on filament length in the computational model. The statement that is made about the model "all or most cells contribute to propulsive force generation, as seen from a uniform distribution of mean speed across different filament lengths", seems to be contradictory, since if each cell contributes to the force one might expect that speed would increase with filament length.

      The computational model misses perhaps the most interesting aspect of the experimental results which is the coupling between rotation, slime generation, and motion. While the dependence of synchronization and reversal efficiency on internal model parameters are explored (Figure 2D), these model parameters cannot be connected with biological reality. The model predictions seem somewhat simplistic: that less coupling leads to more erratic reversal and that the number of reversals matches the expected number (which appears to be simply consistent with a filament moving backwards and forwards on a track at constant speed).

      Filament buckling is not analysed in quantitative detail, which seems to be a missed opportunity to connect with the computational model, e.g., by predicting the length dependence of buckling.

    3. Reviewer #2 (Public review):

      Summary:

      The authors combined time-lapse microscopy with biophysical modeling to study the mechanisms and timescales of gliding and reversals in the filamentous cyanobacterium Fluctiforma draycotensis. They observed the highly coordinated behavior of protein complexes moving in a helical fashion on cells' surfaces and along individual filaments as well as their de-coordination, which induces buckling in long filaments.

      Strengths:

      The authors provided concrete experimental evidence of cellular coordination and de-coordination of motility between cells along individual filaments. The evidence is comprised of individual trajectories of filaments that glide and reverse on surfaces as well as the helical trajectories of membrane-bound protein complexes that move on individual filaments and are implicated in generating propulsive forces.

      Limitations:

      The biophysical model is one-dimensional and thus does not capture the buckling observed in long filaments. I expect that the buckling contains useful information since it reflects the competition between bending rigidity, the speed at which cell synchronization occurs, and the strength of the propulsion forces.

      Future directions:

      The study highlights the need to identify molecular and mechanical signaling pathways of cellular coordination. In analogy to the many works on the mechanisms and functions of multi-ciliary coordination, elucidating coordination in cyanobacteria may reveal a variety of dynamic strategies in different filamentous cyanobacteria.

    4. Reviewer #3 (Public review):

      Summary:

      The authors present new observations related to the gliding motility of the multicellular filamentous cyanobacteria Fluctiforma draycotensis. The bacteria move forward by rotating about their long axis, which causes points on the cell surface to move along helical paths. As filaments glide forward they form visible tracks. Filaments preferentially move within the tracks. The authors devise a simple model in which each cell in a filament exerts a force that either pushes forward or backwards. Mechanical interactions between cells cause neighboring cells to align the forces they exert. The model qualitatively reproduces the tendency of filaments to move in a concerted direction and reverse at the end of tracks.

      Strengths:

      The observations of the helical motion of the filament are compelling.

      The biophysical model used to describe cell-cell coordination of locomotion is clear and reasonable. The qualitative consistency between theory and observation suggests that this model captures some essential qualities of the true system.

      The authors suggest that molecular studies should be directly coupled to the analysis and modeling of motion. I agree.

      Weaknesses:

      There is very little quantitative comparison between theory and experiment. It seems plausible that mechanisms other than mechano-sensing could lead to equations similar to those in the proposed model. As there is no comparison of model parameters to measurements or similar experiments, it is not certain that the mechanisms proposed here are an accurate description of reality. Rather the model appears to be a promising hypothesis.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors use microscopy experiments to track the gliding motion of filaments of the cyanobacteria Fluctiforma draycotensis. They find that filament motion consists of back-and-forth trajectories along a "track", interspersed with reversals of movement direction, with no clear dependence between filament speed and length. It is also observed that longer filaments can buckle and form plectonemes. A computational model is used to rationalize these findings.

      We thank the reviewer for this accurate summary of the presented work.

      Strengths:

      Much work in this field focuses on molecular mechanisms of motility; by tracking filament dynamics this work helps to connect molecular mechanisms to environmentally and industrially relevant ecological behavior such as aggregate formation.

      The observation that filaments move on tracks is interesting and potentially ecologically significant.

      The observation of rotating membrane-bound protein complexes and tubular arrangement of slime around the filament provides important clues to the mechanism of motion.

      The observation that long filaments buckle has the potential to shed light on the nature of mechanical forces in the filaments, e.g. through the study of the length dependence of buckling.

      We thank the reviewer for listing these positive aspects of the presented work.

      Weaknesses:

      The manuscript makes the interesting statement that the distribution of speed vs filament length is uniform, which would constrain the possibilities for mechanical coupling between the filaments. However, Figure 1C does not show a uniform distribution but rather an apparent lack of correlation between speed and filament length, while Figure S3 shows a dependence that is clearly increasing with filament length. Also, although it is claimed that the computational model reproduces the key features of the experiments, no data is shown for the dependence of speed on filament length in the computational model. The statement that is made about the model "all or most cells contribute to propulsive force generation, as seen from a uniform distribution of mean speed across different filament lengths", seems to be contradictory, since if each cell contributes to the force one might expect that speed would increase with filament length.

      We agree that the data shows in general a lack of correlation, rather than strictly being uniform. In the revised manuscript, we intend to collect more data from observations on glass to better understand the relation between filament length and speed. 

      In considering longer filaments, one also needs to consider the increased drag created by each additional cell - in other words, overall friction will either increase or be constant as filament length increases. Therefore, if only one cell (or few cells) are generating motility forces, then adding more cells in longer filaments would decrease speed.

      Since the current data does not show any decrease in speed with increasing filament length, we stand by the argument that the data supports that all (or most) cells in a filament are involved in force generation for motility. We would revise the manuscript to make this point - and our arguments about assuming multiple / most cells in a filament contributing to motility - clear.
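
      The drag argument above can be illustrated with a minimal sketch. It assumes overdamped motion, an identical propulsive force per active cell, and drag that grows linearly with the number of cells; the function and parameter names are hypothetical and the numbers are arbitrary, not measured values.

```python
def filament_speed(n_cells, force_per_cell, drag_per_cell, n_active=None):
    """Steady-state speed of an overdamped filament (illustrative assumption).

    Total thrust is n_active * force_per_cell and total drag coefficient is
    n_cells * drag_per_cell, so speed = thrust / drag.
    """
    if n_active is None:
        n_active = n_cells  # every cell contributes to propulsion
    return n_active * force_per_cell / (n_cells * drag_per_cell)


if __name__ == "__main__":
    for n in (10, 50, 200):
        all_cells = filament_speed(n, force_per_cell=1.0, drag_per_cell=2.0)
        one_cell = filament_speed(n, force_per_cell=1.0, drag_per_cell=2.0, n_active=1)
        print(n, all_cells, one_cell)
```

      Under these assumptions the speed is independent of length when every cell pushes, and falls as 1/N when a single cell pushes, which is the contrast the argument relies on.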

      The computational model misses perhaps the most interesting aspect of the experimental results which is the coupling between rotation, slime generation, and motion. While the dependence of synchronization and reversal efficiency on internal model parameters are explored (Figure 2D), these model parameters cannot be connected with biological reality. The model predictions seem somewhat simplistic: that less coupling leads to more erratic reversal and that the number of reversals matches the expected number (which appears to be simply consistent with a filament moving backwards and forwards on a track at constant speed).

      We agree that the coupling between rotation, slime generation and motion is interesting and important when studying the specific mechanism leading to filament motion. However, we believe it even more fundamental to consider the intercellular coordination that is needed to realise this motion. Individual filaments are a collection of independent cells. This raises the question of how they can coordinate their thrust generation in such a way that the whole filament can both move and reverse direction of motion as a single unit. With the presented model, we want to start addressing precisely this point.

      The model allows us to qualitatively understand the relation between coupling strength and reversals (erratic vs. coordinated motion of the filament). It also provides a hint about the possibility of de-coordination, which we then look for and identify in longer filaments.

      While the model results seem obvious in hindsight, the analysis of the model allows phrasing the question of cell-to-cell coordination, which has not been brought up previously when considering the inherently multi-cell process of filament motility.

      Filament buckling is not analysed in quantitative detail, which seems to be a missed opportunity to connect with the computational model, e.g. by predicting the length dependence of buckling.

      Please note that Figure S10 provides an analysis of filament length and number of buckling instances observed. This suggests that buckling happens only in filaments above a certain length.

      We do agree that further analyses of buckling, both experimentally and through modelling, would be interesting. This study, however, focussed on cell-to-cell coupling / coordination during filament motility. We have identified the possibility of de-coordination through the use of a simple 1D model of motion, and found evidence of such de-coordination in experiments. Notice that the buckling we report does not depend on the filament hitting an external object. It is a direct result of filament activity which, in this context, serves as evidence of cellular de-coordination.

      Now that we have observed buckling and plectoneme formation, these processes need to be analysed with additional experiments and modelling. The appropriate model for this process needs to be 3D, and should ideally include torques arising from filament rotation. Experimentally, we need to identify means of influencing filament length and motion and see if we can measure buckling frequency and position across different filament lengths. These works are ongoing and will have to be summarised in a separate, future publication.

      Reviewer #2 (Public review):

      Summary:

      The authors combined time-lapse microscopy with biophysical modeling to study the mechanisms and timescales of gliding and reversals in filamentous cyanobacterium Fluctiforma draycotensis. They observed the highly coordinated behavior of protein complexes moving in a helical fashion on cells' surfaces and along individual filaments as well as their de-coordination, which induces buckling in long filaments.

      We thank the reviewer for this accurate summary of the presented work.

      Strengths:

      The authors provided concrete experimental evidence of cellular coordination and de-coordination of motility between cells along individual filaments. The evidence is comprised of individual trajectories of filaments that glide and reverse on surfaces as well as the helical trajectories of membrane-bound protein complexes that move on individual filaments and are implicated in generating propulsive forces.

      We thank the reviewer for listing these positive aspects of the presented work.

      Limitations:

      The biophysical model is one-dimensional and thus does not capture the buckling observed in long filaments. I expect that the buckling contains useful information since it reflects the competition between bending rigidity, the speed at which cell synchronization occurs, and the strength of the propulsion forces.

      Cell-to-cell coordination is a more fundamental phenomenon than the buckling and twisting of longer filaments, in that the latter is a consequence of limits of the former. In this sense, we are focussing here on something that we think is the necessary first step to understand filament gliding. The 3D motion of filaments (bending, plectoneme formation) is fascinating and can have important consequences for collective behaviour and macroscopic structure formation. As a consequence of cellular coupling, however, it is beyond the scope of the present paper.

      Please also see our response above. We believe that the detailed analysis of buckling and plectoneme formation requires (and merits) dedicated experiments and modelling which go beyond the focus of the current study (on cellular coordination) and will constitute a separate analysis that stands on its own. We are currently working in that direction.

      Future directions:

      The study highlights the need to identify molecular and mechanical signaling pathways of cellular coordination. In analogy to the many works on the mechanisms and functions of multi-ciliary coordination, elucidating coordination in cyanobacteria may reveal a variety of dynamic strategies in different filamentous cyanobacteria.

      We thank the reviewer for highlighting this point again and seeing the value in combining molecular and dynamical approaches.

      Reviewer #3 (Public review):

      Summary:

      The authors present new observations related to the gliding motility of the multicellular filamentous cyanobacteria Fluctiforma draycotensis. The bacteria move forward by rotating about their long axis, which causes points on the cell surface to move along helical paths. As filaments glide forward they form visible tracks. Filaments preferentially move within the tracks. The authors devise a simple model in which each cell in a filament exerts a force that either pushes forward or backwards. Mechanical interactions between cells cause neighboring cells to align the forces they exert. The model qualitatively reproduces the tendency of filaments to move in a concerted direction and reverse at the end of tracks.

      We thank the reviewer for this accurate summary of the presented work.

      Strengths:

      The observations of the helical motion of the filament are compelling.

      The biophysical model used to describe cell-cell coordination of locomotion is clear and reasonable. The qualitative consistency between theory and observation suggests that this model captures some essential qualities of the true system.

      The authors suggest that molecular studies should be directly coupled to the analysis and modeling of motion. I agree.

      We thank the reviewer for listing these positive aspects of the presented work and highlighting the need for combining molecular and biophysical approaches.

      Weaknesses:

      There is very little quantitative comparison between theory and experiment. It seems plausible that mechanisms other than mechano-sensing could lead to equations similar to those in the proposed model. As there is no comparison of model parameters to measurements or similar experiments, it is not certain that the mechanisms proposed here are an accurate description of reality. Rather the model appears to be a promising hypothesis.

      We agree with the referee that the model we put forward is one of several possible. We note, however, that the assumption of mechanosensing by each cell - as done in this model - results in capturing both the alignment of cells within a filament (with some flexibility) and reversal dynamics. We have explored an even more minimal 1D model, where the cell’s direction of force generation is treated as an Ising-like spin and coupled between nearest neighbours (without assuming any specific physico-chemical basis). We found that this model was not fully able to capture both phenomena. In that model, we found that alignment required high levels of coupling (which is hard to justify except for mechanical coupling) and reversals were not readily explainable (and required additional assumptions). These points led us to the current, mechanically motivated model.
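
      For readers unfamiliar with the Ising-like picture mentioned above, the following is a minimal sketch of a 1D chain of cells whose force directions (+1 or -1) are coupled between nearest neighbours and updated with Metropolis dynamics. It is not the implementation used in the study: the update rule, the free chain ends, the absorption of temperature into the coupling constant, and all names and parameter values are illustrative assumptions.

```python
import math
import random


def simulate_spin_chain(n_cells, coupling, n_sweeps, seed=0):
    """Metropolis dynamics for an open chain with energy E = -J * sum(s_i * s_{i+1})."""
    rng = random.Random(seed)
    spins = [rng.choice((-1, 1)) for _ in range(n_cells)]
    for _ in range(n_sweeps):
        for _ in range(n_cells):
            i = rng.randrange(n_cells)
            neighbours = 0
            if i > 0:
                neighbours += spins[i - 1]
            if i < n_cells - 1:
                neighbours += spins[i + 1]
            # Energy change if cell i flips its force direction.
            delta_e = 2.0 * coupling * spins[i] * neighbours
            if delta_e <= 0 or rng.random() < math.exp(-delta_e):
                spins[i] = -spins[i]
    return spins


def alignment(spins):
    """Fraction of net aligned thrust: 1 means all cells push the same way."""
    return abs(sum(spins)) / len(spins)


if __name__ == "__main__":
    for coupling in (0.0, 1.0, 5.0):
        final = simulate_spin_chain(n_cells=20, coupling=coupling, n_sweeps=5000)
        print(coupling, alignment(final))
```

      In a sketch of this kind, near-complete alignment only appears at strong coupling, and a fully aligned chain has no built-in mechanism for reversing as a unit, which is consistent with the two limitations described above.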

      The parameterisation of the current model would require measuring cellular forces. To this end, a recent study has attempted to measure some of the physical parameters in a different filamentous cyanobacterium [1] and in our revision we will re-evaluate model parameters and dynamics in light of that study. We will also attempt to directly verify the presence of mechano-sensing by obstructing the movement of filaments.

    1. eLife assessment

      The authors present a solid statistical framework for using sibling phenotype data to assess whether there is evidence for de-novo or rare variants causing extreme trait values. Their valuable method is promising and will be of interest to researchers studying complex trait genetics.

    2. Reviewer #1 (Public review):

      This is a clever and well-done paper. The authors sought to craft a method, applicable to biobank-scale data but without necessarily using genotyping or sequencing, to detect the presence of de novo mutations and rare variants that stand out from the polygenic background of a given trait. Their method depends essentially on sibling pairs where one sibling is in an extreme tail of the phenotypic distribution and whether the other sibling's regression to the mean shows a systematic deviation from what is expected under a simple polygenic architecture.

      Their method is successful in that it builds on a compelling intuition, rests on a rigorous derivation, and seems to show reasonable statistical power in the UK Biobank. (More biobanks of this size will probably become available in the near future.) It is somewhat unsuccessful in that rejection of the null hypothesis does not necessarily point to the favored hypothesis of de novo or rare variants. The authors discuss the alternative possibility of rare environmental events of large effect.

      Comments on current version:

      The authors have addressed the concerns of the reviewers. I have no further comments.

    3. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors present valuable findings on how to determine the genetic architecture of extreme phenotype values by using data on sibling pairs. While the authors' derivations of the method are correct, the scenarios considered are incomplete, making it difficult to have confidence in the interpretation of the results as demonstrating the influence of de-novo or Mendelian (rare, penetrant-variant) architectures. The method nevertheless shows promise and will be of interest to researchers studying complex trait genetics. 

      A.1: We have now expanded our consideration of the scenarios and we have ensured that we do not over-interpret our results as being due to de novo or Mendelian architectures. Instead, we make clear that our statistical tests are powered to identify these architectures but that there are other potential causes of significant results (e.g. measurement error or uncontrolled environmental factors from heavy-tailed distributions), making follow-up validation studies necessary before underlying architectures can be confirmed. We consider this to be typical of observational research, in which significant results may indicate causal effects unless uncontrolled confounding factors explain the observed associations, requiring experimental/trial follow-up for validation. We believe that our tests are useful for providing initial inference, and that in some settings – e.g. prioritising samples for sequencing to identify rare variants – could be useful as an initial screening step to increase the efficacy of a planned analysis or study.

      Additionally, we have now developed “SibArc”, an openly available software tool that takes sibling trait data as input and estimates conditional sibling heritability across the trait distribution. Then, based on the theoretical framework developed and described in the paper, for each tail of the trait distribution it estimates effect sizes, generates P-values corresponding to our de novo and Mendelian tests, and performs a Kolmogorov-Smirnov test to identify general departures from our null model. SibArc also provides additional functionality in preliminary beta form, for example an iterative optimisation routine that infers the approximate relative degrees of polygenic, de novo, and Mendelian architectures prevailing in each trait tail. We have made this software tool, a Quick Start tutorial, and sample data available online on GitHub and are hosting these on a dedicated website: www.sibarc.net.

      Reviewer #1 (Public Review):

      This is a clever and well-done paper that should be published. The authors sought to craft a method, applicable to biobank-scale data but without necessarily using genotyping or sequencing, to detect the presence of de novo mutations and rare variants that stand out from the polygenic background of a given trait. Their method depends essentially on sibling pairs where one sibling is in an extreme tail of the phenotypic distribution and whether the other sibling's regression to the mean shows a systematic deviation from what is expected under a simple polygenic architecture. 

      Their method is successful in that it builds on a compelling intuition, rests on a rigorous derivation, and seems to show reasonable statistical power in the UK Biobank. (More biobanks of this size will probably become available in the near future.)  It is somewhat unsuccessful in that rejection of the null hypothesis does not necessarily point to the favored hypothesis of de novo or rare variants. The authors discuss the alternative possibility of rare environmental events of large effect. Maybe attention should be drawn to this in the abstract or the introduction of the paper. Nevertheless, since either of these possibilities is interesting, the method remains valuable. 

      A.2: We agree with the reviewer that we should have made it clearer that - while our statistical tests are powered to identify de novo and Mendelian architectures – significant findings from our tests could also be explained by rare environmental events of large effect (specifically by uncontrolled environmental factors with heavy-tailed distributions). We have now made this clear throughout the manuscript (see A.1).

      Moreover, we agree with the reviewer that whether deviations from expectation are due to de novo or rare variants, or to environmental factors, either possibility is interesting. For example, in either scenario, our results can highlight inaccuracy in PRS prediction of extreme trait values for certain traits, and also provide a relative measure across different traits of large effects impacting on the trait tails, irrespective of whether these are genetic or environmental. We now place more emphasis on this point throughout the manuscript.

      Reviewer #2 (Public Review):

      Souaiaia et al. attempt to use sibling phenotype data to infer aspects of genetic architecture affecting the extremes of the trait distribution. They do this by considering deviations from the expected joint distribution of siblings' phenotypes under the standard additive genetic model, which forms their null model. They ascribe excess similarity compared to the null as due to rare variants shared between siblings (which they term 'Mendelian') and excess dissimilarity as due to de-novo variants. While this is a nice idea, there can be many explanations for rejection of their null model, which clouds interpretation of Souaiaia et al.'s empirical results.

      A.3: We agree with the reviewer that we should have made clearer that there are other explanations for significant results from our tests and we have now fully addressed this point (see A.1, A.2, A.4, A.5 for more detail). In addition, we now elaborate on exactly what our null hypothesis is: not only that the expected joint distribution of siblings’ phenotypes is governed by the standard additive genetic model, but also that environmental effects are either controlled for or else their combined effect is approximately Gaussian. Furthermore, by selecting only those traits whose raw trait distribution most closely corresponds to a Gaussian distribution from the UK Biobank, we increase the probability that significant results from our tests are due to rare variants (shared or unshared among siblings).

      The authors present their method as detecting aspects of genetic architecture affecting the extremes of the trait distribution. However, I think it would be better to characterize the method as detecting whether siblings are more or less likely to be aggregated in the extremes of the phenotype distribution than would be predicted under a common variant, additive genetic model.

      A.4: As discussed above, we should have stated more clearly that significant results could be due to non-genetic factors; we have now addressed this.

      However, we do not think that it would be appropriate to characterise our tests as merely corresponding to over and under aggregation of siblings in the tails. Firstly, environmental factors should be controlled for as part of our testing, increasing the probability that significant results are due to genetic, and not environmental factors. Secondly, tests for identifying broad over and under aggregation of siblings in the tails should be designed differently and, accordingly, the tests that we have developed here would not be optimal to detect over/under aggregation of siblings in trait tails. Our test for inference of de novo variants, for example, exploits the fact that de novo alleles of large effect result in one sibling being extreme and all others being drawn from the background distribution, so that the mean of other siblings is relatively low – not merely that other siblings are less likely to be found in the tail. For more discussion on this issue in relation to one of reviewer 1’s points, see A.9.
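
      The intuition behind the de novo test can be illustrated with a small simulation. This is a minimal sketch and not the SibArc implementation: it assumes a standardised trait, random mating, no shared environment, and a sibling correlation of h2/2 under the polygenic null, and it adds a hypothetical unshared large effect to a small fraction of index siblings. All names and parameter values are illustrative.

```python
import random
import statistics


def simulate_sibling_pairs(n_pairs, h2, denovo_rate=0.0, denovo_effect=0.0, seed=0):
    """Simulate (index sibling, other sibling) trait pairs.

    Under the polygenic null the siblings share a component with variance
    rho = h2 / 2, which gives a sibling correlation of rho.
    """
    rng = random.Random(seed)
    rho = h2 / 2.0
    pairs = []
    for _ in range(n_pairs):
        shared = rng.gauss(0.0, rho ** 0.5)
        s1 = shared + rng.gauss(0.0, (1.0 - rho) ** 0.5)
        s2 = shared + rng.gauss(0.0, (1.0 - rho) ** 0.5)
        if rng.random() < denovo_rate:
            s1 += denovo_effect  # large effect carried by the index sibling only
        pairs.append((s1, s2))
    return pairs


def tail_summary(pairs, top_fraction):
    """Mean trait of index siblings in the upper tail, and mean of their siblings."""
    ranked = sorted(pairs, key=lambda p: p[0], reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    tail = ranked[:k]
    return statistics.fmean(p[0] for p in tail), statistics.fmean(p[1] for p in tail)


if __name__ == "__main__":
    h2 = 0.5
    for rate, effect in ((0.0, 0.0), (0.005, 4.0)):
        pairs = simulate_sibling_pairs(200_000, h2, rate, effect)
        index_mean, sibling_mean = tail_summary(pairs, 0.01)
        print(rate, index_mean, sibling_mean, (h2 / 2.0) * index_mean)
```

      Under the null, the mean of the other siblings sits close to (h2/2) times the mean of the index siblings. With unshared large effects in the tail, the other siblings regress further towards the population mean than that expectation, which is the signature described above.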

      Exactly how the rareness and penetrance of a genetic variant influence the conditional sibling phenotype distribution at the extremes is not made clear. The contrast between de-novo and 'Mendelian' architectures is somewhat odd since these are highly related phenomena: a 'Mendelian' architecture could be due to a de-novo variant of the previous generation. The fact that these two phenomena are surmised to give opposing signatures in the authors' statistical tests seems suboptimal to me: would it not be better to specify a parameter that characterizes the degree or sharing between siblings of rare factors of large effect? This could be related to the mixture components in the bimodal distribution displayed in Fig 1. In fact, won't the extremes of all phenotypes be influenced by all three types of variants (common, rare, de-novo) to greater or lesser degree? By framing the problem as a hypothesis testing problem, I think the authors are obscuring the fact that the extremes of real phenotypes likely reflect a mixture of causes: common, de-novo, and rare variants (and shared and non-shared environmental factors). 

      A.5: We absolutely recognise that there will typically be a complex and continuous mix of genetic architectures underlying complex traits in their tails, dictated by the 2-dimensional relationship between allele frequency and effect size. We did consider developing a fully Bayesian statistical framework to model this, but soon realised that doing this properly would require a substantial amount of model development, accounting for multiple factors in ways that would require a great deal of further investigation; for example, performing a range of complex simulations to investigate the effects of different selective pressures over time, different patterns of assortative mating, and effect size generating distributions. We are in the process of applying for funding for a multi-year project that will perform exactly these investigations as a step towards developing more sophisticated models of inference. In the meantime, we do believe that the simpler hypothesis-testing framework that we have developed here does have important value. Assuming that environmental factors are accounted for, or that any that are not accounted for have combined Gaussian effects, then our tests will indeed infer enrichments of de novo and ‘Mendelian’ rare alleles of large effect in the tails of complex traits. Results from these tests can also be compared within and across traits to compare the relative degree of such enrichments among traits. For some traits we observe significant results from both tests, and for other traits we observe highly significant results from one of our tests but not the other. Thus, while our tests do not provide a complete picture about the genetic architecture in the tails of complex traits, they do offer some intriguing initial insights into tail architecture, important given the enrichment of disease in trait tails.

      To better enable interpretation of the results of this method, a more comprehensive set of simulations is needed. Factors that may influence the conditional distribution of siblings' phenotypes beyond those considered include: non-normal distribution, assortative mating, shared environment, interactions between genetic and shared environmental factors, and genetic interactions. 

      A.6: As described above (see A.5) we do agree that a more comprehensive set of simulations is exactly what is needed to further extend this work. However, we believe that the tests that we have developed so far, which make some simplifying assumptions that we think would often hold in practice, are a useful start to what is an entirely novel approach to inferring genetic architecture from family trait-only (non-genetic) data. Our work could already be useful for method developers who may wish to extend our approach in ways that we may not think of. It could also be useful for applied scientists focusing on specific traits who will be able to gain initial, inference-level, insights by applying our tests to their data, while the results of applying our tests may even guide study design of rare variant mapping studies.

      In summary, I think this is a promising method that is revealing something interesting about extreme values of phenotypes. Determining exactly what is being revealed is going to take a lot more work, however. 

      A.7: We thank the reviewer for highlighting the promise in our approach and agree that it is revealing something interesting about complex traits. We also agree that it is going to take a lot more work to reveal exactly what that is for different traits, which we plan to work on ourselves and hope that this paper will help other interested scientists to follow-up on and extend as well.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      R.1.1: Why these particular traits (body fat, mean corpuscular haemoglobin, neuroticism, heel bone mineral density, monocyte count, sitting height)? 

      A.8: Traits were initially selected to cover a variety of trait types (anthropometric, metabolic, personality, etc.) and to illustrate different examples of tail architecture. However, in response to a point from reviewer 2 (see A.17), we have now overhauled our quality control of traits to ensure that only traits closely matching Gaussian distributions are included. In total, 18 traits were selected, with detailed results presented in Appendix 4 and results corresponding to 6 of the traits presented in the main text (Figure 6) to show examples of different types of tail architecture.

      R.1.2: Why are there separate tests for de novo and Mendelian architectures? It seems that one could use either of the derived tests for both purposes, simply by switching to a two-sided test for each tail. My guess is that the score test of whether alpha is zero would be the more statistically powerful test. 

      A.9: The score test of whether alpha is zero has limited power to detect Mendelian architectures. This is because under Mendelian effects, half the siblings in a family have trait values reflecting the background distribution, such that the mean of sibling trait values is not so different from the polygenic expectation (i.e. alpha close to 0). The Mendelian score test that we developed is substantially more powerful because it evaluates co-occurrence of siblings in the tails, which is far higher under Mendelian architecture in the tail than under polygenic architecture.

      However, in order to test for general departures from our null model, including those caused by non-Gaussian environmental factors, we now include results from performing a Kolmogorov-Smirnov test of difference from the expected distribution, and also provide this test as an option in our ‘SibArc’ software tool.
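
      As a reminder of what a one-sample Kolmogorov-Smirnov test measures, the sketch below computes the KS statistic of a sample against a Gaussian null using only the Python standard library. It is an illustration rather than the SibArc code: the mean and standard deviation of the null are placeholders for whatever conditional sibling distribution the polygenic model predicts, and no P-value is computed.

```python
from statistics import NormalDist


def ks_statistic(sample, null):
    """Largest gap between the empirical CDF of `sample` and the CDF of `null`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = null.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


if __name__ == "__main__":
    null = NormalDist(mu=0.0, sigma=1.0)  # hypothetical expected distribution
    matching = [null.inv_cdf((i + 0.5) / 100) for i in range(100)]
    shifted = [x + 1.0 for x in matching]
    print(ks_statistic(matching, null), ks_statistic(shifted, null))
```

      A sample drawn from the null gives a small statistic, while any systematic departure (a shift, heavier tails, or bimodality) increases it, which is why the test is sensitive to general deviations rather than to one specific architecture.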

      R.1.3: This method assumes that assortative mating is absent. I worry that sitting height might not be a good trait to analyze, since there is some assortative mating (~0.3) for height (e.g., Yengo et al., 2018). Perhaps this trait should not be included among those that are analyzed in this paper. Then again, it is possible that there is less assortative mating for sitting height than total height (i.e., leg length) (Jensen & Sinha, 1993). 

      A.10: It is true that our method assumes random mating. We note that while assortative mating increases sibling similarity relative to expectation, if it is stable across the trait distribution it will also bias heritability estimation upward, which is its likely impact in our framework. However, if assortative mating is more prevalent in the tails of the distribution, it can result in excess kurtosis, which can increase false positive Mendelian tests and false negative de novo tests. Given that the trait distribution for Sitting Height has only moderate excess kurtosis (~0.4, see Fig 9, Appendix 4) and we inferred de novo architecture only for this trait, we feel that including it in the paper is appropriate.

      R.1.4: I wonder if it's possible to discuss the impact of non-additive genetic variance on the method. How does this affect the estimation of heritability, which calibrates the expectation for regression to the mean? Can non-additive genetic deviations explain a rejection of the null hypothesis of simple polygenicity? 

      A.11: Yes, the heritability estimation, which calibrates expectation for regression to the mean, assumes additivity of effects, as do the most popular estimators of heritability from GWAS data in the field: GCTA-GREML, LD Score regression and LDAK. Accordingly, non-additive genetic effects could result in rejection of the null hypothesis. We have highlighted this point in the Discussion. However, we also point out that current evidence suggests that the contribution of non-additive genetic effects to complex trait variation is relatively small (Hivert 2021) and that non-additive genetic effects that have a similar impact across the trait distribution should not be a problem for our approach (only those that have an increasing effect towards the tails would be).

      R.1.5: p.5: Maybe a more realistic way to simulate a genetic architecture is to draw the MAF from the distribution [MAF(1 - MAF)]^{-1} and then an effect of the minor allele from some mound-shaped distribution (e.g., mixture of normals). The absolute or squared effect of the minor allele should increase as the MAF decreases, and there have been some papers trying to estimate this relationship (e.g., Zeng et al., 2021). Maybe make the number of causal SNPs 10,000. I don't rate this as an urgent suggestion because my sense is that the method should be robust, making adequate even a fairly minimal simulation confirming its accuracy. 
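
      The sampling scheme suggested in this comment could be sketched as follows. This illustrates the reviewer's proposal and is not the simulation used in the manuscript. A density proportional to [MAF(1 - MAF)]^{-1} is uniform on the logit scale, so a MAF can be drawn by sampling the logit uniformly and back-transforming. The lower MAF bound, the mixture weights and standard deviations, and the frequency-dependence exponent `s` are all hypothetical values chosen for illustration.

```python
import math
import random


def sample_maf(rng, min_maf=0.01):
    """Draw a MAF with density proportional to 1 / (p * (1 - p)) on [min_maf, 0.5]."""
    lo = math.log(min_maf / (1.0 - min_maf))  # logit(min_maf); logit(0.5) is 0
    u = rng.uniform(lo, 0.0)                  # uniform on the logit scale
    return 1.0 / (1.0 + math.exp(-u))


def sample_effect(rng, maf, weights=(0.9, 0.1), sds=(0.01, 0.05), s=-0.5):
    """Draw a minor-allele effect from a two-component mixture of normals.

    The standard deviation is scaled by [2p(1-p)]^(s/2); with s < 0 the
    typical effect magnitude grows as the MAF decreases (illustrative choice).
    """
    sd = sds[0] if rng.random() < weights[0] else sds[1]
    scale = (2.0 * maf * (1.0 - maf)) ** (s / 2.0)
    return rng.gauss(0.0, sd * scale)


def simulate_architecture(n_causal=10000, seed=0):
    """Return a list of (maf, effect) pairs for n_causal causal SNPs."""
    rng = random.Random(seed)
    snps = []
    for _ in range(n_causal):
        maf = sample_maf(rng)
        snps.append((maf, sample_effect(rng, maf)))
    return snps


snps = simulate_architecture(n_causal=10000, seed=7)
rare = [abs(b) for p, b in snps if p < 0.05]
common = [abs(b) for p, b in snps if p > 0.25]
print(sum(rare) / len(rare), sum(common) / len(common))
```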

      A.11: In separate work, we have performed a comprehensive simulation study using the forward-in-time population genetic simulator SLiM 3 (Haller and Messer, 2019), which generates genetic effects according to Gaussian and Gamma distributions and models different selective pressures on complex traits. We plan to publish this work shortly and also extend the simulations to family data, from which we will be able to test the performance of our methods here under a range of different scenarios of genetic variation generation, including a variety of relationships between allele frequency and effect sizes. We agree with the reviewer that at this point, however, our minimal simulation should be sufficient to confirm our tests’ general robustness and so we will perform further testing once we have extended our more sophisticated simulation study.

      R.1.6: p.6: Step D seems to leave out a normalization of G to have unit variance. Also, the last part should say "the square of the correlation between the genetic liability and the trait is equal to the heritability." 

      A.12: Corrected – we thank the reviewer for spotting this.
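
      Why the normalization matters can be seen in a short worked derivation. It assumes, for illustration only, that the simulated trait is built as a weighted sum of a genetic liability G and an independent environmental term E, each scaled to unit variance; the symbols y and E are not taken from the manuscript.

```latex
\begin{aligned}
y &= \sqrt{h^2}\,G + \sqrt{1-h^2}\,E,
  \qquad \operatorname{Var}(G)=\operatorname{Var}(E)=1,\quad \operatorname{Cov}(G,E)=0,\\
\operatorname{Var}(y) &= h^2 + (1-h^2) = 1,\\
\operatorname{Corr}(G,y)^2
  &= \left(\frac{\operatorname{Cov}(G,y)}{\sqrt{\operatorname{Var}(G)\,\operatorname{Var}(y)}}\right)^{2}
   = \left(\sqrt{h^2}\right)^{2} = h^2 .
\end{aligned}
```

      If G is not rescaled to unit variance, the second line no longer equals 1 and the squared correlation generally differs from the intended heritability.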

      R.1.7: Figure 5: The power being adequate if roughly 1 of a 1000 index siblings with an extreme trait value owes their values to de novo mutations makes me think that there should be a discussion of the prior probability. The average person carries about 80 de novo mutations. How many of these are likely to affect, e.g., height? Zeng et al. (2021) gave estimates of mutational targets. Given that a mutation affects height, will its likely effect size be large enough to be detected with the method? Kemper et al. (2012) discussed this point in a perhaps useful way. 

      A.13: We find the work investigating mutational target sizes and generating effect sizes of different mutations (de novo or rare) to be extremely interesting and critical for understanding the causes of observed genetic variation. However, we think that this work is insufficiently progressed at this point to build on directly here for making more nuanced interpretation of our results. We are, however, exploring the impact of mutational target sizes, effect size distributions and selection effects, on the genetic architecture of complex traits via population genetic simulations (see A.11), and so we hope to be able to provide more in-depth interpretation of our results in the future.

      R.1.8: Figure 6: The number in the tables for Mendelian architecture are presumably observed and expected counts. But what about the numbers for de novo architecture? Those don't look like counts. Maybe they are conditional expectations of standardized trait values. Whatever the case may be, the caption should provide an explanation. 

      A.14: The observed and expected values for the de novo statistical test represent the expected and observed mean standardized trait values for siblings of individuals in the bottom and top 1% of the distribution. We have now made this clear in our updated figure.

      R.1.9: p. 16: Element (2,1) in the precision matrix after Equation 15 is missing a negative sign. 

      A.15: Corrected – we thank the reviewer for spotting this.

      R.1.10: p. 20: Shouldn't Equation 20 place an exponent of n on the factor outside of the exponential? 

      A.16: Corrected – we thank the reviewer for spotting this.

      Reviewer #2 (Recommendations For The Authors):

      R.2.1: The first concern that I have is that their statistical tests rely heavily on an assumption of bivariate normal distribution for sibling pair's phenotypes. Real phenotypes do not have such a distribution in general. The authors rely upon an inverse-normal transform when applying their method to real data. While the inverse-normal transform will ensure that the siblings' phenotypes have a marginal normal distribution, such a transform does not ensure that the joint distribution is bivariate normal. The authors should examine their procedure for simulated phenotypes with a non-normal distribution to see if their statistical tests remain properly calibrated. Related to this, I am concerned about applying an inverse normal transform to the neuroticism phenotype that contains only 13 unique values in UKB. How does the transform deal with tied values? Can we sensibly talk about extreme trait values for such a set of observations? 

      A.17: The reviewer is correct that a bivariate normal distribution for sibling pairs’ trait values does not necessarily hold, and only does so if the assumptions of our null model are met (polygenic effects, Gaussian environmental effects, random mating, etc.). We have now more clearly described the assumptions of our null model, and to increase the matching of our selected traits to those assumptions we have expanded our analyses and now present results on traits that are close to Gaussian. As part of this stricter quality control, only traits with more than 50 unique values are included, meaning that neuroticism is excluded in our final analysis. We also now note that performing an inverse normal transformation on the traits only increases the robustness of the tests to some of our modelling assumptions. In future work we plan to investigate how best to model the conditional sibling distribution under a variety of non-Gaussian environmental effects and different non-random patterns of mating.
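
      The reviewer's question about tied values can be made concrete with a minimal sketch of a rank-based inverse normal transform. This is not the authors' pipeline: the Blom offset (c = 3/8) and the use of average ranks for ties are assumptions made for illustration, and the function names are hypothetical. With average ranks, tied observations map to the same transformed value, so a trait with k unique values still has only k unique values after transformation.

```python
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def inverse_normal_transform(values, c=3.0 / 8.0):
    """Rank-based inverse normal transform with offset c (Blom: c = 3/8)."""
    n = len(values)
    nd = NormalDist()
    return [nd.inv_cdf((r - c) / (n - 2.0 * c + 1.0))
            for r in average_ranks(values)]


scores = [0, 0, 1, 2, 2, 2, 5, 5, 9]  # toy trait with few unique values
transformed = inverse_normal_transform(scores)
print(len(set(scores)), len(set(transformed)))
```

      The marginal distribution becomes closer to normal, but the discreteness remains, which is consistent with excluding traits that have few unique values.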

      R.2.2: The joint sibling phenotype distribution (Equation 4) can be derived by applying the formula for the conditional distribution of a multivariate Gaussian to the standard additive genetic model. The authors' derivation is unnecessarily complex. Furthermore, many of the formulae have been used in Shai Carmi's work on embryo screening, but this work is not cited. 

      A.18: We now state in the text that the conditional sibling distribution can also be derived from the joint trait distribution of related individuals, which we use in our extension to the 3-sibling scenario, and cite Shai Carmi’s work where this is used. The joint distribution is a more straightforward way to derive the conditional sibling distribution, but our derivation based on considering mid-parents is generalisable to cases where assumptions of random mating, Gaussian population trait distribution and no selection do not hold. We also think that our mid-parent based derivation will be more intuitive to many readers, leading to greater understanding and potential for extension. Therefore, overall we believe that its presentation is worthwhile and we have now elaborated on this in the Methods.
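
      The more direct route mentioned here can be written out as a worked derivation. It assumes the standard additive model for a standardized trait, with random mating and no shared environmental effect, so that the sibling covariance is h²/2; the notation is illustrative and may differ from the manuscript's Equation 4.

```latex
\begin{aligned}
\begin{pmatrix} s_1 \\ s_2 \end{pmatrix}
  &\sim \mathcal{N}\!\left(
     \begin{pmatrix} 0 \\ 0 \end{pmatrix},
     \begin{pmatrix} 1 & h^2/2 \\ h^2/2 & 1 \end{pmatrix}
   \right),\\
\mathbb{E}[s_2 \mid s_1]
  &= \frac{\operatorname{Cov}(s_1,s_2)}{\operatorname{Var}(s_1)}\, s_1
   = \frac{h^2}{2}\, s_1,\\
\operatorname{Var}(s_2 \mid s_1)
  &= \operatorname{Var}(s_2) - \frac{\operatorname{Cov}(s_1,s_2)^2}{\operatorname{Var}(s_1)}
   = 1 - \frac{h^4}{4},\\
s_2 \mid s_1 &\sim \mathcal{N}\!\left(\frac{h^2}{2}\, s_1,\; 1 - \frac{h^4}{4}\right).
\end{aligned}
```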

      R.2.3: Equation 8: this probability should be conditional on s1 

      A.19: Corrected – we thank the reviewer for spotting this.

      R.2.4: The empirical application to UKB data is lacking methodological details. Also, the number of siblings used is low compared to the number of available sibling pairs. Around 19k sibling pairs are available in the UKB white British subsample, but only 10k were used for height. Why? Also, why are extreme values excluded? Isn't this removing the signal the authors are looking to explain?

      A.20: We have now provided more methodological details throughout the Methods section, in particular in relation to the samples used and quality control performed. The removal of individuals with extreme values, in particular, is because unusually low/high trait values are more likely than typical values to be due to measurement error (e.g. due to imperfect measuring device, or storage/assaying), and so while this may also result in some loss in power (albeit small, because few individuals have values beyond ±8 s.d. of the trait mean) we consider it worth it for the potential reduction in type I error. In performing our newly expanded analysis (described above), and accounting for the reviewer’s point here about sample size, we did find a bug in our pipeline that meant that we did not include as many sibling pairs as available. We thank the reviewer for spotting this, since this contributed to our new analysis being substantially more powerful than the original (including up to ~17k sibling pairs depending on completeness of trait data).

      Benjamin C Haller, Phillip W Messer. SLiM 3: Forward Genetic Simulations Beyond the Wright–Fisher Model. Molecular Biology and Evolution. 2019. 36(3): 632-637.

      SD Whiteman, SM McHale, A Soli. Theoretical Perspectives on Sibling Relationships. J Fam Theory Rev. 2011 Jun 1;3(2):124-139.

      Nicholas H Barton, Alison M Etheridge, and Amandine Véber. The infinitesimal model: Definition, derivation, and implications. Theoretical population biology, 118:50–73, 2017.

      Valentin Hivert et al. “Estimation of non-additive genetic variance in human complex traits from a large sample of unrelated individuals.” American journal of human genetics vol. 108,5 (2021)

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors demonstrate that it is possible to carry out eQTL experiments for the model eukaryote S. cerevisiae, in "one pot" preparations, by using single-cell sequencing technologies to simultaneously genotype and measure expression. This is a very appealing approach for investigators studying genetic variation in single-celled and other microbial systems, and will likely inspire similar approaches in non-microbial systems where comparable cell mixtures of genetically heterogeneous individuals could be achieved.

      Strengths:

      While eQTL experiments have been done for nearly two decades (the corresponding author's lab are pioneers in this field), this single-cell approach creates the possibility for new insights about cell biology that would be extremely challenging to infer using bulk sequencing approaches. The major motivating application shown here is to discover cell occupancy QTL, i.e. loci where genetic variation contributes to differences in the relative occupancy of different cell cycle stages. The authors dissect and validate one such cell cycle occupancy QTL, involving the gene GPA1, a G-protein subunit that plays a role in regulating the mating response MAPK pathway. They show that variation at GPA1 is associated with proportional differences in the fraction of cells in the G1 stage of the cell cycle. Furthermore, they show that this bias is associated with differences in mating efficiency.

      Weaknesses:

      While the experimental validation of the role of GPA1 variation is well done, the novel cell cycle occupancy QTL aspect of the study is somewhat underexploited. The cell occupancy QTLs that are mentioned all involve loci that the authors have identified in prior studies that involved the same yeast crosses used here. It would be interesting to know what new insights, besides the "usual suspects", the analysis reveals. For example, in Cross B there is another large effect cell occupancy QTL on Chr XI that affects the G1/S stage. What candidate genes and alleles are at this locus? And since cell cycle stages are not biologically independent (a delay in G1, could have a knock-on effect on the frequency of cells with that genotype in G1/S), it would seem important to consider the set of QTLs in concert.

      We thank the reviewer for this suggested clarification. We have modified the text to make it clear that cell cycle occupancy is a compositional phenotype. Like the reviewer, we also noticed the distal trans eQTL hotspot on Chr XI in Cross B, but we were not able to identify compelling candidate gene(s) or variant(s) despite extensive effort.

      Reviewer #2 (Public Review):

      Boocock and colleagues present an approach whereby eQTL analysis can be carried out by scRNA-Seq alone, in a one-pot-shot experiment, due to genotypes being able to be inferred from SNPs identified in RNA-Seq reads. This approach obviates the need to isolate individual spores, genotype them separately by low-coverage sequencing, and then perform RNA-Seq on each spore separately. This is a substantial advance and opens up the possibility to straightforwardly identify eQTLs over many conditions in a cost-efficient manner. Overall, I found the paper to be well-written and well-motivated, and have no issues with either the methodological/analytical approach (though eQTL analysis is not my expertise), or with the manuscript's conclusions.

      I do have several questions/comments.

      393 segregant experiment:

      For the experiment with the 393 previously genotyped segregants, did the authors examine whether averaging the expression by genotype for single cells gave expression profiles similar to the bulk RNA-Seq data generated from those genotypes? Also, is it possible (and maybe not, due to the asynchronous nature of the cell culture) to use the expression data to aid in genotyping for those cells whose genotypes are ambiguous? I presume it might be if one has a sufficient number of cells for each genotype, though, for the subsequent one-pot experiments, this is a moot point.

      As mentioned in our preliminary response, while it is possible to expand the analysis along these lines, this is not relevant for the subsequent one-pot experiments. We have made all the data available so that anyone interested can try these analyses.

      Figure 1B:

      Is UMAP necessary to observe an ellipse/circle - I wouldn't be surprised if a simple PCA would have sufficed, and given the current discussion about whether UMAP is ever appropriate for interpreting scRNA-Seq (or ancestry) data, it seems the PCA would be a preferable approach. I would expect that the periodic elements are contained in 2 of the first 3 principal components. Also, it would be nice if there were a supplementary figure similar to Figure 4 of Macosko et al (PMID 26000488) to indeed show the cell cycle dependent expression.

      We have added two new figures (S2 and S3) that represent alternative visualizations of the cell-cycle that are not dependent on UMAP. Figure S2 shows plots of different pairs of principal components, with each cell colored by its assigned cell-cycle stage. We do not observe a periodic pattern in the first 3 principal components as the reviewer expected, but when we explore the first 6 principal components, we see combinations of components that clearly separate the cell cycle clusters. We emphasize that the clusters were generated using the Louvain algorithm and assigned to cell-cycle stages using marker genes, and that UMAP was used only for visualization.

      We could not create a figure similar to Macosko et al. because of differences between the cell cycle categories we used and those of Spellman et al (PMID 9843569). We instead created Figure S3 to address the reviewer's comment. This figure uses a heatmap in a style similar to that of Macosko et al. to display cell-cycle-dependent expression of the 22 genes we used as cell cycle markers across each of the five cell cycle stages (M/G1, G1, G1/S, S, G2/M).

      We have renumbered the supplementary figures after incorporating these two additional supplementary figures into the manuscript.

      Aging, growth rate, and bet-hedging:

      The mention of bet-hedging reminded me of Levy et al (PMID 22589700), where they saw that Tsl1 expression changed as cells aged and that this impacted a cell's ability to survive heat stress. This bet-hedging strategy meant that the older, slower-growing cells were more likely to survive, so I wondered a couple of things. Is it possible from single-cell data to identify either an aging or a growth rate signature? A number of papers from David Botstein's group culminated in a paper that showed that they could use a gene expression signature to predict instantaneous growth rate (PMID 19119411) and I wondered if a) this is possible from single-cell data, and b) whether in the slower growing cells, they see markers of aging, whether these two signatures might impact the ability to detect eQTLs, and if they are detected, whether they could in some way be accounted for to improve detection.

      As mentioned in our preliminary response, we are not sure how to look for gene expression signatures of aging in yeast scRNA-seq data. We believe that the proposed analyses are beyond the scope of the current paper. As noted above, we have made all the data available so that anyone interested can explore these hypotheses.

      AIL vs. F2 segregants:

      I'm curious if the authors have given thought to the trade-offs of developing advanced intercross lines for scRNA-Seq eQTL analysis. My impression is that AIL provides better mapping resolution, but at the expense of having to generate the lines. It might be useful to see some discussion on that.

      We thank the reviewer for the comments. We believe that a discussion of trade-offs between different approaches for constructing mapping populations, such as AIL and F2 segregants, is beyond the scope of this paper.

      10x vs SPLiT-Seq:

      10x is a well established, but fairly expensive approach for scRNA-Seq - I wondered how the cost of the 10x approach compares to the previously used approach of genotyping segregants and performing bulk RNA-Seq, and how those costs would change if one used SPLiT-Seq (see PMID 38282330).

      We thank the reviewer for the comments. We believe that a discussion of cost trade-offs between 10x and other approaches is beyond the scope of this paper, especially given the rapidly evolving costs of different technologies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Throughout the results section the authors point to File S1 for additional information. This file is a tarball with about 20 Excel documents in it, each with several sheets embedded. The authors should provide a detailed README describing how to understand the organizations of the files in File S1 and the many embedded sheets in each file. Statements made in the manuscript about File S1 should explicitly direct the reader to a specific spreadsheet and table to refer to.

      We have added an additional README file to the tarball that explains the organization of File S1 and describes the data contained in each sheet. Throughout the text, we now reference specific spreadsheets to assist the reader. In addition, these spreadsheets have been added to a github repository https://github.com/theboocock/finemapping_spreadsheets_single_cell

      Neither of the two GitHub repositories referenced under "Code availability" has adequate documentation that would allow a reader to try and reproduce the analyses presented here. The one entitled https://github.com/joshsbloom/single_cell_eQTL has no functional README, while https://github.com/theboocock/yeast_single_cell_post_analysis is somewhat better but still hard to navigate. Basic information on expected inputs, file formats, file organization, output types, and formats, etc. is required to get any of these pipelines to run and should be provided at a minimum.

      We thank the reviewer for the comment. In response, we have refactored both GitHub repositories and added extensive documentation to improve usability. We also updated the versions of software and packages, and this is reflected in the methods section.

      S. cerevisiae strains are preferentially diploid in nature and many genes involved in the mating pathway are differentially regulated in diploids vs haploids. Have the authors explored the fitness effects of the GPA1 82R allele in diploids? What is the dominance relationship between 82W and 82R?

      We thank the reviewer for the comment. In diploid yeast, the mating pathway is repressed, and thus we would not expect there to be any fitness consequences due to the presence of different alleles of GPA1.

      The diploid expression profiling (page 5 and Table S9) doesn't implicate GPA1; can the authors comment on this in light of their finding in haploids?

      The mating pathway, including GPA1, is repressed in diploids, and hence the expression of GPA1 cannot be studied in these strains (PMID: 3113739). In addition, allele-specific expression differences only identify cis-regulatory effects. We know that the GPA1 variant results in a protein-coding change, which may or may not influence the levels of mRNA in cis, so that even if GPA1 were expressed in diploids, there would be no expectation of an allele-specific difference in expression.

      With respect to the candidate CYR1 QTL -- note that strains with compromised Cyr1 function also generally show increased sporulation rates and/or sporulation in rich media conditions (cAMP-PKA signaling represses sporulation). Is this the case in diploids with the CBS2888 allele at CYR1? If the CBS2888 allele is a CYR1 defect one might expect reduced cAMP levels. It is possible to estimate adenylate cyclase levels using a fairly straightforward ELISA assay. This would provide more convincing evidence of the causal mechanism of the alleles identified.

      We thank the reviewer for the comment, and we agree that a functional study of the CYR1 alleles would provide more convincing evidence for the causal mechanism of the connection between cell cycle occupancy, cAMP levels, and growth. However, we believe that the proposed experiments are beyond the scope of our current study. The evidence we provide is sufficient to establish that CYR1 is a strong candidate gene for the eQTL hotspot.

      Re: CYR1 candidate QTL -- The authors should reference the work of Patrick Van Dijck (https://pubmed.ncbi.nlm.nih.gov/?sort=date&term=Van+Dijck+P&cauthor_id=20924200) and Johan M Thevelein (https://pubmed.ncbi.nlm.nih.gov/?sort=date&term=Thevelein+JM&cauthor_id=20924200) on CYR1 allelic variation, and other papers besides the Matsumoto/Ishikawa papers, as the effects of cAMP-PKA signaling on stress can be quite variable. cAMP pathway variants, including in CYR1, have popped up in quite a few other yeast QTL mapping and experimental evolution papers. These should be referenced as well.

      We thank the reviewer for these references; we have added a comment about the relationship between stress tolerance and CYR1 variation, and cited the relevant references accordingly.

      Figure S10 - the subfigure showing the frequency of the GPA1 82R allele compared to 82W suggests a fairly large and deleterious fitness effect of this allele; on the order of 7-8% fewer cells per cell cycle stage than the 82W allele. Can the authors reconcile this with the more modest growth rate effect they report on page 8?

      Figure S12C displays the allele frequency of the 82R allele across the cell cycle in the single-cell data from allele-replacement strains. These strains were grown separately and processed using two individual 10x chromium runs. The resulting sequenced library had 11,695 cells with the 82R allele and 14,894 cells with the 82W allele. The 7-8% difference in the number of cells is due to slight differences in the number of captured cells per run, not due to growth differences, because we attempted to pool cells in equal numbers from separate mid-log cultures.

      The proportion of cells in G1 increases by ~3% in strains with the 82R allele relative to the baseline proportion of cells in the experiment, which, to the reviewer's point, is still larger than the ~1% growth difference we observed. Cell cycle occupancy is a compositional phenotype. As shown in Figure S12C, the 82R variant increases the fraction of cells in G1 and slightly decreases the fraction of cells in M/G1. There is no obvious expectation for quantitatively translating a change in cell cycle occupancy to a change in growth rate.

      The authors refer to the Lang et al. 2009 paper w/respect to GPA1 variant S469I but that paper seems to have explored a different GPA1 allele, GPA1-G1406T, with respect to growth rates.

      We thank the reviewer for their comment. The S469I variant is the same as the G1406T variant, one denoting the amino acid change at position 469 in the protein and the other denoting the corresponding nucleotide change at position 1406 in the DNA coding sequence. We have altered the text to make this clear to the reader.
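
      The equivalence of the two names can be checked with simple codon arithmetic. The sketch below assumes 1-based numbering of the coding sequence starting at the A of the start codon. The helper names are hypothetical, and only the serine and isoleucine rows of the standard genetic code are included.

```python
# Serine and isoleucine codons from the standard genetic code (DNA alphabet).
SERINE = {"TCT", "TCC", "TCA", "TCG", "AGT", "AGC"}
ISOLEUCINE = {"ATT", "ATC", "ATA"}


def codon_position(nt_pos):
    """Map a 1-based coding-sequence position to (codon number, position in codon)."""
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3 + 1


def compatible_codons(ref_codons, alt_codons, pos_in_codon, ref_base, alt_base):
    """List (reference codon, mutated codon) pairs consistent with the substitution."""
    hits = []
    for codon in sorted(ref_codons):
        if codon[pos_in_codon - 1] != ref_base:
            continue
        mutated = codon[:pos_in_codon - 1] + alt_base + codon[pos_in_codon:]
        if mutated in alt_codons:
            hits.append((codon, mutated))
    return hits


codon, offset = codon_position(1406)
print(codon, offset)
print(compatible_codons(SERINE, ISOLEUCINE, offset, "G", "T"))
```

      Nucleotide 1406 falls at the second position of codon 469, and a G-to-T change at that position turns either AGC or AGT (serine) into ATC or ATT (isoleucine), which is what the S469I notation describes.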

      Reviewer #2 (Recommendations For The Authors):

      I make no recommendations as to additional work for the authors. The manuscript is complete. I suggested some things I would like to see in my review, but it's up to them to decide whether they think any of those would further enhance the manuscript.

      However, I do have some pedantic formatting notes:

      - Microliters are variously presented as uL, ul, and µl - it should be µL

      - Similarly, milliliters are presented as ml and ML - it should be mL

      - Also, there should be a space between the number and the unit, e.g. 10 µL

      - Some gene names in the manuscript are not italicized in all instances, e.g., GPA1

      We thank the reviewer for these formatting suggestions; we have made these changes throughout the text.

    2. eLife assessment

      This manuscript describes the mapping of natural DNA sequence variants that affect gene expression and its noise, as well as cell cycle timing, using as input single-cell RNA-sequencing of progeny from crosses between wild yeast strains. The method represents an important advance in the study of natural genetic variation. The findings, especially given the follow-up validation of the phenotypic impact of a mapped locus of major effect, provide convincing support for the rigor and utility of the method.

    3. Reviewer #1 (Public Review):

      The authors demonstrate that it is possible to carry out eQTL experiments for the model eukaryote S. cerevisiae, in "one pot" preparations, by using single-cell sequencing technologies to simultaneously genotype and measure expression. This is a very appealing approach for investigators studying genetic variation in single-celled and other microbial systems, and will likely inspire similar approaches in non-microbial systems where comparable cell mixtures of genetically heterogeneous individuals could be achieved.

      While eQTL experiments have been done for nearly two decades (the corresponding author's lab are pioneers in this field), this single-cell approach creates the possibility for new insights about cell biology that would be extremely challenging to infer using bulk sequencing approaches. The major motivating application shown here is to discover cell occupancy QTL, i.e. loci where genetic variation contributes to differences in the relative occupancy of different cell cycle stages. The authors dissect and validate one such cell cycle occupancy QTL, involving the gene GPA1, a G-protein subunit that plays a role in regulating the mating response MAPK pathway. They show that variation at GPA1 is associated with proportional differences in the fraction of cells in the G1 stage of the cell cycle. Furthermore, they show that this bias is associated with differences in mating efficiency.

    4. Reviewer #2 (Public Review):

      Boocock and colleagues present an approach whereby eQTL analysis can be carried out by scRNA-Seq alone, in a one-pot-shot experiment, due to genotypes being able to be inferred from SNPs identified in RNA-Seq reads. This approach obviates the need to isolate individual spores, genotype them separately by low-coverage sequencing, and then perform RNA-Seq on each spore separately. This is a substantial advance and opens up the possibility to straightforwardly identify eQTLs over many conditions in a cost-efficient manner. Overall, I found the paper to be well-written and well-motivated, and have no issues with either the methodological/analytical approach (though eQTL analysis is not my expertise), or with the manuscript's conclusions.

    1. eLife assessment

      This important study investigates neurobiological mechanisms underlying the maintenance of stable, functionally appropriate rhythmic motor patterns during changing environmental conditions. The condition studied here is temperature, and the system is the stomatogastric central neural pattern generating circuits of the crab Cancer borealis, which produce the rhythmic pyloric motor pattern and are naturally subjected to temperature perturbations over a substantial range. The authors present compelling evidence that the neuronal hyperpolarization-activated inward current (Ih), known to contribute to rhythm control, plays a vital role in the ability of these circuits to adjust appropriately, both transiently and persistently, to temperature perturbations: the frequency of rhythmic neural activity changes in a smooth monotonic fashion while the relative timing of the different phases of the activity pattern, which determines proper functional motor coordination, is maintained. This study will be of interest to neurobiologists studying rhythmic motor circuits and systems and their physiological adaptations.

    2. Reviewer #1 (Public review):

      Summary:

      This important study investigates the neurobiological mechanisms underlying the stable operation and maintenance of functionally appropriate rhythmic motor patterns during changing environmental conditions. The condition studied here is temperature, and the system is the stomatogastric neural pattern generating network of the crab Cancer borealis, which produces the pyloric motor rhythm and is naturally subjected to temperature perturbations over a substantial range. This study is relevant to the general problem that some rhythmic motor systems adjust to changing environmental conditions and state changes by increasing the cycle frequency in a smooth monotonic fashion while maintaining the relative timing of different network activity pattern phases that determine proper motor coordination. How this is achieved mechanistically in complex dynamic motor networks is not understood, particularly how the frequency and phase adjustments are achieved as conditions change while avoiding operational instabilities on different time scales. The authors specifically studied the contributions of the hyperpolarization-activated inward current (Ih), which is involved in rhythm control, to the adjustments of frequency and phases in the pyloric rhythmic pattern as the temperature was altered from 11 degrees C to 21 degrees C. They present compelling evidence that this current is a critical biophysical feature in the ability of this system to adjust appropriately, both transiently and persistently, to temperature perturbations. After blocking Ih in the pyloric network with cesium, the network was unable to reliably produce its characteristic rapid and smooth increase in the frequency of the triphasic rhythmic motor pattern in response to increasing temperature or its typical steady-state increase in frequency over this Q10 temperature range.

      Strengths:

      (1) The authors addressed this problem by technically rigorous experiments in the crab Cancer borealis stomatogastric ganglion (STG) in vitro, which readily allows for neuronal activity recording in a behaviorally and architecturally defined rhythmic neural circuit in conjunction with the application of blockers of Ih and synaptic receptors to disrupt circuit interactions. This approach is an effective way to experimentally investigate how complex rhythmic networks, at least in poikilotherms, mechanistically adjust to environmental perturbations such as temperature.

      (2) While previous work demonstrated that Ih increases in pyloric neurons as temperature increases, the authors here establish that this increase is necessary for normal responses of STG neural activity to temperature, which consist of a smooth monotonic increase in the frequency of rhythmic activity with increasing temperature.

      (3) The data shows that blocking Ih with cesium causes the frequency to transiently decrease ("jags") when the temperature increases and then increases after the temperature stabilizes at a steady state, revealing a non-monotonic frequency response to temperature perturbations.

      (4) The authors dissect some of the underlying neuronal and circuit dynamics, presenting evidence that after blocking Ih, the non-monotonic jags in the frequency response are mediated by intrinsic properties of pacemaker neurons, while in the steady state, Ih determined the overall frequency change (i.e., temperature sensitivity) through network interactions.

      (5) The authors' results highlight more complex dynamic responses to increasing temperature for the first time, suggesting a longer timescale process than previously recognized that may result from interactions between multiple channels and/or ion channel kinetics.

      Weaknesses:

      (1) The involvement of Ih in achieving the frequency and phase adjustments as conditions change and allowing smooth transitions to avoid operational instabilities in other complex rhythmic motor networks, for example, in homeotherms, is not established, so the present results may have limited general extrapolations.

    3. Reviewer #2 (Public review):

      Summary:

      Using the crustacean stomatogastric nervous system (STNS), the authors present an interesting study wherein the contribution of the Ih current to temperature-induced changes in the frequency of a rhythmically active neural circuit is evaluated. Ih is a hyperpolarization-activated cation current that depolarizes neurons. Under normal conditions, increasing the temperature of the STNS increases the frequency of the spontaneously active pyloric rhythm. Notably, under normal conditions, as temperature systematically increases, the concomitant increase in pyloric frequency is smooth (i.e., monotonic). By contrast, blocking Ih with extracellular cesium produces temperature-induced pyloric frequency changes that follow a characteristic sawtooth response (i.e., non-monotonic). That is, in cesium, increasing temperature initially results in a transient drop in pyloric frequency that then stabilizes at a higher frequency. Thus, the authors conclude that Ih establishes a mechanism that ensures smooth changes in neural network frequency during environmental disturbances, a feature that likely bestows advantages to the animal's function.

      The study describes several surprising and interesting findings. In general, the study's primary observation of the cesium-induced sawtooth response is remarkable. To my knowledge, this type of response has not yet been described in neurobiological systems, and I suspect that the unexpected response will be of interest to many readers.

      At first glance, I had some concerns regarding the use of extracellular cesium to understand network phenomena. Yes, extracellular cesium blocks Ih. But extracellular cesium has also been shown to block astrocytic potassium channels, at least in mammalian systems (i.e., K-IR, PMID: 10601465), and such a blockade can elevate extracellular potassium. I was heartened to see that the authors acknowledge the non-specificity of cesium (lines 320-325) and I agree with the authors' contention that "a first approximation most of the effects seen here can likely be attributed to Cs+ block of Ih". Upon reflecting on the potential confound, I was also reassured to see that extracellular cesium alone does not in fact increase pyloric frequency, an effect that might be expected if cesium indirectly raises [K+]outside. If the authors agree, then I suggest including that point in their discussion.

      In summary, the authors present a solid investigation of a surprising biological phenomenon. In general, my comments are fairly minor. Thanks for contributing an interesting study.

      Strengths:

      A major strength of the study is the identification of an ionic conductance that mediates stable, monotonic changes in oscillatory frequency that accompany changes in the environment (i.e., temperature).

      Weaknesses:

      A potential experimental concern stems from the use of extracellular cesium to attribute network effects specifically to Ih. Previous work has shown that extracellular cesium also blocks inward-rectifier potassium channels expressed by astrocytes, and that such blockade may also elevate extracellular potassium, an action that generally depolarizes neurons. Notably, the authors address this potential concern in the discussion.

    4. Reviewer #3 (Public review):

      Summary:

      This paper presents a systematic analysis of the role of the hyperpolarization-activated inward current (the h current) in the response of the pyloric rhythm of the stomatogastric ganglion (STG) of the crab. In a detailed set of experiments, they analyze the effect of blocking h current with bath infusion of the h current blocker cesium (perfused as CsCl). They show interesting and reproducible effects that blockade of h current results in a period of frequency decrease after an upward step in temperature, followed by a slow increase in frequency. This contrasts with the normal temperature response that shows an increase in frequency with an increase in temperature without a downward "jag" in the frequency response. This is an important paper for showing the role of h current in stabilizing network dynamics in response to perturbations such as a temperature change.

      Strengths of the paper:

      The major effects are shown very clearly and convincingly in a range of experiments with combined intracellular recording from neurons during changes in temperature.

      Weaknesses

      The Marder lab has detailed models of the pyloric rhythm. These temperature effects have not yet been modeled and could be the focus of future modeling studies.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Response to Public Reviews:

      We thank the reviewers for their kind comments and have implemented many of their suggestions. Our paper has greatly benefited from their advice. Like Reviewer 1, we acknowledge that while the exact involvement of Ih in allowing smooth transitions is likely not universal across all systems, our demonstration of the ways in which such currents can affect the dynamics of the response of complex rhythmic motor networks provides valuable insight. To address the concerns of Reviewer 2, we included a sentence in the discussion to highlight the fact that cesium neither increased the pyloric frequency nor caused consistent depolarization in intracellular recordings. We also highlighted that these observations suggest that cesium is not indirectly raising [K+]outside and support the conclusion that the effects of cesium are primarily through blockade of Ih rather than other potassium channels.

      Reviewer 3 raised some important points about modeling. While the lab has models that explore the effects of temperature on artificial triphasic rhythms, these models do not account for all the biophysical nuances of the full biological system. We have limited data about the exact nature of temperature-induced parameter changes and the extent to which these changes are mediated by intrinsic effects of temperature on protein structure versus protein interactions/modification by processes such as phosphorylation. With respect to the A current, Tang et al., 2010 reported that the activation and inactivation rates are differentially temperature sensitive, but we do not have the data to suggest whether or not the time courses of such sensitivities are different. As such, we focus our discussion on the properties we know are modulated by temperature, i.e. activation rates. Within the discussion we now include the suggestion that future, more comprehensive modeling may be appropriate to further elucidate the ways in which reducing Ih may produce the experimentally observed effects reported here.

      Reviewer #1 (Recommendations For The Authors):

      Suggested revisions:

      A figure showing examples of the voltage-clamp traces for the critical measurements of the extent of Ih block by 5 mM CsCl in PD and LP neurons at the temperature extremes in these preparations is not shown, and the authors should consider including such a figure, perhaps as a supplemental figure.

      We have added Supplemental Figure 1 containing voltage-clamp traces demonstrating the extent of Ih block by 5mM CsCl in PD and LP neurons at 11 and 21°C.  Due to technical concerns, different preparations were used in the measurements at 11°C and 21°C, but the point that the H-current is reduced is demonstrated in all cases.

      Reviewer #2 (Recommendations for The Authors):

      Specific (Minor) Comments:

      (1) Line 83: In Cs+ "at 11°C, the pyloric frequency was significantly decreased compared to control conditions (Saline: 1.2± 0.2 Hz; Cs+ 0.9± 0.2 Hz)".

      As above, the authors often report that cesium generally reduces pyloric frequency. Figure 5A demonstrates this action quite nicely. However, cesium's effect on pyloric frequency at 11°C seems less robust in Figure 1C. Why the discrepancy?

      There is variability in the effects of Cs+ on the pyloric frequency.  As noted, the standard deviation in frequency in both conditions is 0.2Hz.  As such, there are some cases in which the initial frequency drop in Cs+ compared to control was relatively small.  1C is one such case, but was selected as an example because of its clear reduction in temperature sensitivity. 

      (2) I don't understand what the arrows/dashed lines are trying to convey in Figure 3C.

      The arrows/dashed lines represent the criteria used to define a cycle as “decreasing in frequency” (Temperature Increasing) or “increasing in frequency” (Temperature Stable).  We have amended lines 130 and 137 in the text to hopefully clarify this point, as well as the figure legend.

      (3) Lines 118/168. The description of cesium's specific action on the depolarizing portion of PD activity is a bit confusing. In my mind, "depolarization phase" refers to the point at which PD is most depolarized. Perhaps restating the phrase to "elongation of the depolarizing trajectory" is less confusing. The authors may also want to consider labeling this trajectory in Figure 2C.

      We have changed “depolarization phase” to “depolarizing phase” to highlight that this is the period during which the cell is depolarizing, rather than at its most depolarized.  We consider the plateau of the slow wave and spiking (the point at which PD is most depolarized) to be the “bursting phase”.  We have labeled these phases in Figure 2C as suggested.

      (4) Figure 3C legend: a few words seem to be missing. I suggest "the change in mean frequency was more likely TO decrease IN Cs+ than in saline".

      Thank you for catching this typo, it has been corrected.

      (5) Line 165: Awkward phrasing. “In one experiment, the decrease in frequency while temperature increased and subsequent increase in frequency after temperature stabilized was particularly apparent in Cs+ PTX”.

      How about: “One Cs+ PTX experiment wherein elevating the temperature transiently decreased pyloric frequency is shown in Figure 4F.”

      We have amended this sentence to read, “One Cs++PTX experiment in which elevating the temperature produced a particularly pronounced transient decrease in frequency is shown in Figure 4F.”

      (6) Line 186: Awkward phrasing. "LP OFF was also significantly advanced in Cs+, although duty cycle (percent of the period a neuron is firing) was preserved".

      The use of the word "although" seems a bit strange. If both LP onset and LP offset phase advance by the same amount, then isn't an unchanged duty cycle expected?

      “Although” has been changed to “and subsequently”.

      Reviewer #3 (Recommendations For The Authors):

      Major comments:

      (1) I know the Marder lab has detailed models of the pyloric rhythm. I am not saying they have to add modeling to this already extensive and detailed paper, but it would be useful to know how much of these temperature effects have been modeled, for example in the following locations.

      (2) Line 259 - "Mathematically..." - Is there a computational model of H current that has shown this decrease in frequency in pyloric neurons? If you are working on one for the future, you could mention this.

      There is not currently a model in which the reduction of the H-current results in the non-minimum phase dynamics in the frequency response to temperature seen experimentally. It should be noted that our existing models of pyloric activity responses to temperature are not well suited to investigate such dynamics in their current iterations.  Further work is necessary to demonstrate the principles observed experimentally in computational modeling, and we have added a sentence to the paper to reflect this point (Line 268).

      (3) Line 318 - "therefore it remains unclear" - I thought they had models of the circuit rhythmicity. Do these models include temperature effects? Can they comment on whether their models of the circuit show an opposite effect to what they see in the experiment? I'm not saying they have to model these new effects as that is probably an entirely different paper, but it would be interesting to know whether current models show a different effect.

      We have some models of the pyloric response to temperature, but these models were specifically selected to maintain phase across the range of temperature. When Ih was reduced in these models, a variety of effects on phase and duty cycle were seen. These models were selected to have the same key features of behavior as the pyloric rhythm, but do not capture all the biophysical nuances of the complete system, and therefore should not necessarily be expected to reflect the experimental findings in their current iterations. Furthermore, these models are meant to have temperature as a static, rather than dynamic, input, and thus are ill-suited to examine the conditions of our experiments. In their current state, the models are not sufficiently relevant to these experimental findings to illuminate the present paper.

      (4) "If deinactivation is more accelerated or altered by temperature than inactivation...While temperature continued to change, the difference in parameters would continue to grow" - This is described as a difference in temperature sensitivity, but it seems like it is also a function of the time course of the response to change in temperature (i.e. the different components could have the same final effect of temperature but show a different time course of the change).

      We know from Tang et al., 2010, that activation and inactivation rates of the A current are differentially temperature sensitive. We have no evidence to suggest that the time courses of the responses of various parameters to temperature differ. The physical actions of temperature on proteins are likely to be extremely rapid, making a time course difference on the order of tens of seconds less likely, though not impossible. Modeling of the biophysics might illuminate the relative plausibility of these different mechanisms of action, but we feel that our current suggested explanation is reasonable based on existing information.

      (5) Is it known how temperature is altering these channel kinetics? Is it via an intrinsic rearrangement of the protein structure, or is it a process that involves phosphorylation (that could explain differences in time course?). Some mention of the mechanism of temperature changes would be useful to readers outside this field.

      It is not known exactly how temperature alters channel parameters.  Invariably some, if not all, of it is due to an intrinsic rearrangement of protein structure, and our current models treat all parameter changes as an instantaneous consequence.  However, it is possible that some effects of temperature are due to longer timescale processes such as phosphorylation or cAMP interactions.  Current work in the lab is actively exploring these questions, but there is no definitive answer. Given that this paper focuses on the phenomenon and plausible biomolecular explanations based on existing data, we have not altered the paper to include more exhaustive  coverage of all the possible avenues by which temperature may alter channel properties.

      Specific comments:

      Title: misspelling of "Cancer" ?

      We are unsure how that extra “w” got into the earliest version of the manuscript and have removed it.

      Line 66 "We used 5mM CsCl" - might mention right up front that this was a bath application of the substance.

      We have altered this line to read “used bath application of 5mM CsCl”.  

      Figure 4 - "The only feedback synapse to the pacemaker kernel neurons, LP to PD, and is blocked by picrotoxin" - I think the word "and" should be removed from this phrase in the figure legend.

      Fixed

      Figure 4 legend - "Reds denote temperature...yellows denote..." - I think it should be "Red dots denote temperature...yellow dots denote...".

      Done

      Figure 4B - Why does the change in frequency in cesium look so different in Figure 4B compared to Figure 1C or Figure 3B? In the earlier figures, the increase of frequency is smaller but still present in cesium, whereas, in Figure 4B, cesium seems to completely block the increase in frequency. I'm not sure why this is different, but I guess it's because 3B and 4B are just mean traces from single experiments. Presumably, 4B is showing an experiment in which the cesium was subsequently combined with picrotoxin?

      Figures 1C, 3B, and 4B are indeed all from different single experiments. As acknowledged in our concluding paragraph, there was substantial variability in the exact response of the pyloric rhythm to temperature while in cesium.  The most consistent effect was that the difference in frequency between cesium and saline at a particular temperature increased, as demonstrated across 21 preparations in Figure 1D. It may be noted in Figure 1E that the Q10 was not infrequently <1, meaning that there was a net decrease in frequency as temperature increased in some experiments such as seen in the example of Figure 4B.  The “fold over” (initial increase in steady-state frequency with temperature, then decrease at higher temperatures) has been observed at higher temperatures (typically around 23-30 degrees C) even under control conditions but has not been highlighted in previous publications.  The example in 4B was chosen because it demonstrated both the similarity in jags between Cs+ and Cs++PTX and an overall decrease in temperature sensitivity, even though in this instance the steady-state change in frequency with temperature was not monotonic. 
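      The relationship between Q10 and the direction of the frequency change mentioned above can be illustrated with a minimal sketch. It assumes the standard two-point definition of Q10; the authors may instead have fit Q10 across the full temperature range, and the frequencies used below are hypothetical values chosen only for illustration.

```python
def q10(rate_low, rate_high, temp_low, temp_high):
    """Two-point Q10 temperature coefficient.

    Q10 = (rate_high / rate_low) ** (10 / (temp_high - temp_low)),
    with temperatures in degrees C. A value below 1 means the rate
    fell as temperature rose.
    """
    if temp_high == temp_low:
        raise ValueError("temperatures must differ")
    return (rate_high / rate_low) ** (10.0 / (temp_high - temp_low))


# Hypothetical pyloric frequencies (Hz) at 11 and 21 degrees C.
q10_increase = q10(1.2, 2.4, 11.0, 21.0)  # frequency doubles over 10 degrees
q10_decrease = q10(0.9, 0.8, 11.0, 21.0)  # frequency falls over 10 degrees

print(f"Q10 when frequency doubles: {q10_increase:.2f}")
print(f"Q10 when frequency falls:   {q10_decrease:.2f}")
```

      Under this definition, any preparation whose frequency at the higher temperature is below its frequency at the lower temperature yields Q10 < 1, which is the situation described for the example in Figure 4B.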

      Figure 6A - "Phase 0 to 1.0" - The y-axis should provide units of phase. Presumably, these are units of radians so 1.0=2*pi radians (or 360 degrees, but probably best to avoid using degrees of phase due to confusion with degrees of temperature).

      Phase, with respect to pyloric rhythm cycles, does not traditionally have units as it is a proportion rather than an angle. As such, we have not changed the figure.
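      As a rough sketch of why phase is unitless, the snippet below treats phase as event latency divided by cycle period, and duty cycle as the difference between OFF and ON phases. The function names and the timing values are hypothetical and are not taken from the paper's analysis code.

```python
def phase(event_time, cycle_start, period):
    """Phase of an event as a fraction of the cycle (0 to 1, unitless)."""
    return (event_time - cycle_start) / period


def duty_cycle(on_time, off_time, cycle_start, period):
    """Fraction of the cycle during which the neuron is firing."""
    return phase(off_time, cycle_start, period) - phase(on_time, cycle_start, period)


# Hypothetical cycle: period of 1.0 s starting at t = 0 s.
period = 1.0
control = duty_cycle(on_time=0.40, off_time=0.70, cycle_start=0.0, period=period)

# Advance both LP ON and LP OFF by the same 0.05 s: phases shift earlier,
# but the duty cycle is unchanged.
advanced = duty_cycle(on_time=0.35, off_time=0.65, cycle_start=0.0, period=period)

print(f"control duty cycle:  {control:.2f}")
print(f"advanced duty cycle: {advanced:.2f}")
```

      Because both quantities are ratios of times, the seconds cancel, which is consistent with reporting phase on a 0 to 1.0 axis without angular units.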

      Line 275 - "the pacemaker neuron can increase" - Does this indicate that the main effects of H current are in the follower neurons (i.e. LP and PY versus the driver neuron PD)?

      Not necessarily.  We posit in the next paragraph that the effect of the H current on the temperature sensitivity could be due to its phase advance of LP, but that phase advance of LP is not particularly expected to increase frequency.  We favor the possibility that temperature increases Ih in the pacemaker, which in turn advances the PRC of the rhythm, allowing the frequency increase seen under normal conditions.  In Cs+, this advance does not occur, resulting in the lower temperature sensitivity.  In Cs++PTX, the lack of inhibition from LP means compensatory advance of the pacemaker PRC by Ih is unnecessary to allow increased frequency.

      Line 285 - "either increase frequency have no effect" - Is there a missing "or" in this phrase?

      Thank you, we have added the “or”.

    1. eLife assessment

      This important study highlights cell types preserving long-lived proteins and lays a foundation for identifying exceptionally long-lived proteins in the ovary. Convincing evidence describes helpful data about protein turnover and identifies long-lived macromolecules in oocytes and somatic cells during mouse ovarian aging. This work will be of interest to researchers working on aging and reproductive health.

    2. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer 2:

      In addition, it is still unacceptable for me that the number of ovulated oocytes in mice at 6 months of age is only one third of young mice (10 vs 30; Fig. S1E). The most of published literature show that mice at 12 months of age still have ~10 ovulated oocytes.

      We disagree with the reviewer’s comment, and the concerns raised were not shared by the other reviewers.  We have reported our data with full transparency (each data point is plotted). In the current study, we observed an intermediate phenotype in gamete number (assessed by both ovarian follicle counts and ovulated eggs) when comparing 6 month old mice to 6 week or 10 month old mice; this is as expected. It is well accepted that follicle counts are highly mouse strain dependent.  Although the reviewer mentions that mice at 12 months have ~10 ovulated oocytes, no actual references are provided nor are the mouse strain or other relevant experimental details mentioned.  Therefore, we do not know how these quoted metrics relate to the female FVB mice used in our current study.   As clearly explained and justified in our manuscript, we used mice at 6 months and 10 months to represent a physiologic aging continuum. 

      Moreover, based on the follicle counting method used in the present study (Fig. S1D), there are no antral follicles observed in mice at 6 months and 10 months of age, which is not reasonable.

      This statement is incorrect. Antral follicles were present at 6 and 10 months of age, but due to the scale of the y-axis and the normalization of follicle number/area in Fig. S1D, the values are small.  The absolute number of antral follicles per ovary (counted in every 5th section) was 31.3 ± 3.8 follicles for 6-week old mice, 9.3 ± 2.3 follicles for 6-month old mice, and 5.3 ± 1.8 follicles for 10-month old mice.  Moreover, it is important to note that these ovaries were not collected in a specific stage of the estrous cycle, so the number of antral follicles may not be maximal.  In addition, as described in the Materials and Methods, antral follicles were only counted when the oocyte nucleus was present in a section to avoid double counting.  Therefore, this approach (which was applied consistently across samples) could potentially underestimate the total number.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This manuscript by Bomba-Warczak describes a comprehensive evaluation of long-lived proteins in the ovary using transgenerational radioactive labelled 15N pulse-chase in mice. The transgenerational labeling of proteins (and nucleic acids) with 15N allowed the authors to identify regions enriched in long-lived macromolecules at the 6 and 10-month chase time points. The authors also identify the retained proteins in the ovary and oocyte using MS. Key findings include the relative enrichment in long-lived macromolecules in oocytes, pregranulosa cells, CL, stroma, and surprisingly OSE. Gene ontology analysis of these proteins revealed enrichment for nucleosome, myosin complex, mitochondria, and other matrix-type protein functions. Interestingly, compared to other post-mitotic tissues where such analyses have been previously performed such as the brain and heart, they find a higher fractional abundance of labeled proteins related to the mitochondria and myosin respectively.

      Response: We thank the reviewer for this thoughtful summary of our work.  We want to clarify that our pulse-chase strategy relied on a two-generation stable isotope-based metabolic labelling of mice using 15N from spirulina algae (for reference, please see (Fornasiero & Savas, 2023; Hark & Savas, 2021; Savas et al., 2012; Toyama et al., 2013)).  We did not utilize any radioactive isotopes.

      Strengths:

      A major strength of the study is the combined spatial analyses of LLPs using histological sections with MS analysis to identify retained proteins.

      Another major strength is the use of two chase time points allowing assessment of temporal changes in LLPs associated with aging.

      The major claims such as an enrichment of LLPs in pregranulosa cells, GCs of primary follicles, CL, stroma, and OSE are soundly supported by the analyses, and the caveat that nucleic acids might differentially contribute to this signal is well presented.

      The claims that nucleosomes, myosin complex, and mitochondrial proteins are enriched for LLPs are well supported by GO enrichment analysis and well described within the known body of evidence that these proteins are generally long-lived in other tissues.

      Weaknesses:

      Comment 1: One small potential weakness is the lack of a mechanistic explanation of if/why turnover may be accelerating at the 6-10 month interval compared to 1-6.

      Response 1: At the 6-month time point, we detected more long-lived proteins than at the 10-month time point in both the ovary and the oocyte. We anticipated this because proteins are degraded over time, and substantially more time has elapsed at the later time point. Moreover, at the 6–10-month time point, age-related tissue dysfunction is already evident in the ovary. For example, in 6-9 month old mice, there is already a deterioration of chromosome cohesion in the egg which results in increased interkinetochore distances (Chiang et al., 2010), and by 10 months, there are multinucleated giant cells present in the ovarian stroma, which is consistent with chronic inflammation (Briley et al., 2016). Thus, the observed changes in protein dynamics may be another early feature of aging progression in the ovary.

      Comment 2: A mild weakness is the open-ended explanation of OSE label retention. This is a very interesting finding, and the claims in the paper are nuanced and perfectly reflect the current understanding of OSE repair. However, if the sections are available and one could look at the spatial distribution of OSE signal across the ovarian surface, it would be interesting to note if label retention varied by regions such as the CLs or hilum where more/less OSE division may be expected.

      Response 2: We agree that the enrichment of long-lived molecules in the OSE is interesting. To make interpretable conclusions about the dynamics of long-lived molecules in the OSE, we would need to generate a series of samples at precise stages of the estrous cycle or ideally across a timecourse of ovulation to capture follicular rupture and repair.  These samples do not currently exist and are beyond the scope of this study. However, this idea is an important future direction and it has been added to the discussion (lines 221-223). Furthermore, from a practical standpoint, MIMS imaging is resource and time intensive. Thus, we are not able to readily image entire ovarian sections.  Instead, we focused on structures within the ovary and took select images of follicles, stroma, and OSE.  We, therefore, do not have a comprehensive series of images of the OSE from the entire ovarian section for each mouse analyzed.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Bomba-Warczak et al. applied multi-isotope imaging mass spectrometry (MIMS) analysis to identify the long-lived proteins in mouse ovaries during reproductive aging, and found some proteins related to cytoskeletal and mitochondrial dynamics persisting for 10 months.

      Response: We thank the reviewer for their summary and feedback.

      Strengths:

      The manuscript provides a useful dataset about protein turnover during ovarian aging in mice.

      Weaknesses:

      Comment 1: The study is pretty descriptive and short of further new findings based on the dataset. In addition, some results such as the numbers of follicles and ovulated oocytes in aged mice are not consistent with the published literature, and the method for follicle counting is not accurate. The conclusions are not fully supported by the presented evidence.

      Response 1: We agree with the reviewer that this study is descriptive. Our goal, as stated, was to use a discovery-based approach to define the long-lived proteome of the ovary and oocyte across a reproductive aging continuum. As the prominent aging researcher, Dr. James Kirkland, stated: “although ‘descriptive’ is sometimes used as a pejorative term…descriptive or discovery research leading to hypothesis generation has become highly sophisticated and of great relevance to the aging field (Kirkland, 2013).” We respectfully disagree with the reviewer that our study is short of new findings. In fact, this is the first time that two-generation stable isotope-based metabolic labelling of mice, in combination with two different state-of-the-art mass spectrometry methods, has been used to identify and localize long-lived molecules in the ovary and oocyte along this particular reproductive aging continuum in an unbiased manner. We have identified protein groups that were previously not known to be long-lived in the ovary and oocyte. Our hope is that this long-lived proteome will become an important hypothesis-generating resource for the field of reproductive aging.

      The age-dependent decline in number of follicles and eggs ovulated in mice has been well established by our group as well as others (Duncan et al., 2017; Mara et al., 2020).  Thus, we are unclear about the reviewer’s comments that our results are not consistent with the published literature.  The absolute numbers of follicles and eggs ovulated as well as the rate of decline with age are highly strain dependent.  Moreover, mice can have a very small ovarian reserve and still maintain fertility (Kerr et al., 2012).  In our study, we saw a consistent age-dependent decrease in the ovarian reserve (Figure 1 – figure supplement 1 D), the number of oocytes collected from large antral follicles following hyperstimulation with PMSG (used for LC-MS/MS), and the number of eggs collected from the oviduct following hyperstimulation and superovulation with PMSG and hCG (Figure 1 – figure supplement 1 E and F).  In all cases, the decline was greater in 10 month old compared to 6 month old mice demonstrating a relative reproductive aging continuum even at these time points.

      Our research team has significant expertise in follicle classification and counting, as evidenced by our publication record (Duncan et al., 2017; Kimler et al., 2018; Perrone et al., 2023; Quan et al., 2020). We used our established methods, which we have further clarified in the manuscript text (lines 395-397). Follicle counts were performed on every 5th tissue section of serially sectioned ovaries, and one ovary from each of three mice per timepoint was counted. Follicle counts were therefore performed on an average of 48-62 total sections per ovary. The number of follicles was then normalized to the total area (mm²) of the tissue section, and the counts were averaged. Figure 1 – figure supplement 1 C and D represent data averaged from all ovarian sections counted per mouse. It is important to note that the same criteria were applied consistently to all ovaries across the study, and thus, regardless of the technique used, the relative number of follicles or oocytes across ages can be compared.
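
      The counting scheme described above (every 5th serial section, follicles normalized to section area, densities averaged per ovary) can be sketched as follows. This is only an illustration, not the authors' analysis code: the function name, the choice of the first sampled section, and the example numbers are all assumptions.

```python
from statistics import mean


def follicle_density_per_ovary(sections, step=5):
    """Average follicle density (follicles per mm^2) for one ovary.

    `sections` is a list of (follicle_count, area_mm2) tuples, one per
    serial section, in cutting order. Only every `step`-th section is
    counted. Starting at the first section is an assumption.
    """
    sampled = sections[::step]
    if not sampled:
        raise ValueError("no sections to count")
    # Normalize each counted section to its own tissue area, then average.
    densities = [count / area for count, area in sampled]
    return mean(densities)


# Hypothetical ovary with 10 serial sections; sections 0 and 5 are counted.
example_sections = [(4, 2.0)] + [(0, 1.0)] * 4 + [(3, 3.0)] + [(0, 1.0)] * 4
example_density = follicle_density_per_ovary(example_sections)
```

      Because the same sampling and normalization are applied to every ovary, the resulting densities are comparable across ages even though they are not absolute follicle counts.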

      Reviewer #3 (Public Review):

      Summary:

      In this study, Bomba-Warczak et al. focused on reproductive aging, and they presented a map of long-lived proteins that were stable during the reproductive lifespan. The authors used MIMS to examine and show distinct molecules in different cell types and tissue regions of the ovary in 6-month-old mice, and they also used proteomic analysis to present different LLPs in ovaries at the two timepoints (6-month and 10-month mice). The authors also examined the LLPs in oocytes from 6-month-old mice and indicated that these were nuclear, cytoskeletal, and mitochondrial proteins.

      Response: We thank the reviewer for their summary and feedback.

      Strengths:

      Overall, this study provided basic information or a 'map' of the pattern of long-lived proteins during aging, which will contribute to the understanding of the defects caused by reproductive aging.

      Weaknesses:

      Comment 1: The 6-month mice were used as an aged model; no validation experiments were performed with proteomics analysis only.  

      Response 1:  We did not select the 6-month time point to be representative of the “aged model” but rather one of two timepoints on the reproductive aging continuum – 6 and 10 months.  In the manuscript (Figure 1 – figure supplement 1) we have demonstrated the relevance of the two timepoints by illustrating a decrease in follicle counts, number of fully grown oocytes collected, and number of eggs ovulated as well as a tendency towards increased stromal fibrosis (highlighted in the main text lines 78-85).  Inclusion of the 6-month timepoint ultimately turned out to be informative and essential as many long-lived proteins were absent by the 10 month timepoint. These results suggest that important shifts in the proteome occur during mid to advanced reproductive age.  The relevance of these timepoints is mentioned in the discussion (lines 247-270).

      Two independent mass spectrometry approaches (MIMS and LC-MS/MS) were used to validate the presence of long-lived macromolecules in the ovary and oocyte. Studies focused on the role of specific long-lived proteins in oocyte and ovarian biology as well as how they change with age in terms of function, turnover, and modification are beyond the scope of the current study but are ongoing.  We have acknowledged these important next steps in the manuscript text (lines 286-288, 311-312).

      It is important to note that oocytes are biomass-limited cells, and their numbers decrease with age.  Thus, we had to select ages where we could still collect enough oocytes from the available mice to perform LC-MS/MS.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Comment 1: The writing and figures are beautiful - it would be hard to improve this manuscript.

      Response 1: We greatly appreciate this enthusiastic evaluation of our work.

      Comment 2: In Fig S1E/F it would help to list the N number here. Why are there 2 groups at 6-12 wk?

      Response 2:  We did not have 6-month-old and 10-month-old mice available at the same time to be able to run the hyperstimulation and superovulation experiment in parallel.  Therefore, we performed independent experiments comparing the number of eggs collected from either 6-month-old or 10-month-old mice relative to 6-12-week-old controls.  In each trial, eggs were collected from pooled oviducts from 3-4 mice per age group, and the average total number of eggs per mouse was reported.  Each point on the graph corresponds to the data from an individual trial, and two trials were performed.  This has been clarified in the figure legend (lines 395-397).  Of note, while addressing this reviewer’s comments, we noticed that we were missing Materials and Methods regarding the collection of eggs from the oviduct following hyperstimulation and superovulation with PMSG and hCG.  This information has now been added to the Methods section, lines 477-481.
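
      As a small illustration of how each plotted point is derived (eggs pooled across the oviducts of 3-4 mice, then divided by the number of mice in that trial), consider the sketch below. The function name and the egg counts are hypothetical and are not taken from the study.

```python
def eggs_per_mouse(total_eggs_pooled, n_mice):
    """Average eggs per mouse for one trial with pooled oviducts."""
    if n_mice <= 0:
        raise ValueError("n_mice must be positive")
    return total_eggs_pooled / n_mice


# Two hypothetical trials for one age group; each yields one point on the graph.
trials = [
    {"total_eggs_pooled": 36, "n_mice": 3},
    {"total_eggs_pooled": 40, "n_mice": 4},
]
points = [eggs_per_mouse(**trial) for trial in trials]
```

      Pooling means that variation between individual mice within a trial is not captured; only the variation between trials is visible.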

      Comment 3: The manuscript would benefit from an explanation of why the pups were kept on a 1-month N15 diet after birth, since the oocytes are already labeled before birth, and granulosa at most by day 3-4. Would ZP3 have not been identified otherwise?

      Response 3:   The pups used in this study were obtained from fully labeled female dams that were maintained on a 15N diet.  These pups had to be kept with their mothers through weaning.  To limit the pulse period to birth, the pups would have had to be transferred to unlabeled foster mothers.  However, this would have risked pup loss, which would have significantly impacted our ability to conduct the studies given that we only had 19 labeled female pups from three breeding pairs.  We have clarified this in the manuscript text in lines 78-80.  It is hard to know, without doing the experiment, whether we would have detected ZP3 if we had only labeled through birth.  The expression of ZP3 in primordial follicles, albeit in humans, would suggest that this protein is expressed quite early in development.

      Comment 4: What is happening to the mitochondria at 6-10 months? Does their number change in the oocyte? Is there a change in the rate of fission? Any chance to take a stab at it with these or other age-matched slides?

      Response 4:  The reviewer raises an excellent point.  As mentioned previously in the Discussion (lines 290-301), there are well documented changes in mitochondrial structure and function in the oocyte in mice of advanced reproductive age.  However, there is a paucity of data on the changes that may happen at earlier mid-reproductive age time points.  From the oocyte mitochondrial proteome perspective, our data demonstrate a prominent decline in the persistence of long-lived proteins between 6 and 10 months, and this occurs in the absence of a change in the total pool of mitochondrial proteins (both long and short lived populations) as assessed by spectral counts or protein IDs (figure below).  These data, which we have added into Figure 3 – figure supplement 1 and in the manuscript text (lines 164-170) are suggestive of similar numbers of mitochondria at these two timepoints. It would be informative to do a detailed characterization of oocyte mitochondrial structure and function within this window to see if there is a correlation with this shift in long lived mitochondrial proteins.  Although this analysis is beyond the scope of the current manuscript, it is an important next line of inquiry which we have highlighted in the manuscript text (lines 255-257 and 311-312).

      Reviewer #2 (Recommendations For The Authors):

      Several concerns are raised as shown below.

      Comment 1: In Fig. 2F, it is surprising that ZP3 disappeared in the ovary from mice at the age of 10 months by MIMS analysis, because quite a few oocytes with intact zona pellucida can still be obtained from mice at this age. Notably, ZP would not be renewed once formed.

      Response 1: To clarify, Figure 2F shows LC-MS/MS data and not MIMS data.  As mentioned in the Discussion, the detection of long-lived pools of ZP3 at 6 months cannot be derived from newly synthesized zona pellucidae in growing follicles because they would not have been present during the pulse period.  The only way we could detect ZP3 at 6 months is if it forms a primitive zona scaffold in the primordial follicle or if ZPs from atretic follicles of the first couple of waves of folliculogenesis incorporate into the extracellular matrix of the ovary.  The lack of persistence of ZP3 at 10 months could be due to protein degradation. Should ZP3 indeed form a primitive zona, its loss at 10 months would be predicted to result in poor formation of a bona fide zona pellucida upon follicle growth.  Interestingly, aging has been associated with alterations in zona pellucida structure and function.   These data open novel hypotheses regarding the zona pellucida (e.g. a primitive zona scaffold and part of the extracellular matrix) and will require significant further investigation to test. These points are highlighted in the Discussion lines 227-245.

      Comment 2: To determine whether those proteins that can not be identified by MIMS at the time point of 10 months are degraded or renewed, the authors should randomly select some of them to examine their protein expression levels in the ovary by immunoblotting analysis.

      Response 2: To clarify, proteins were identified by LC-MS/MS and not by MIMS, which was used to visualize long-lived macromolecules.   Each protein will comprise old pools (15N-containing) and newly synthesized pools (14N-containing).  Degradation of the old pool of protein does not mean that there will be a loss of total protein.  Moreover, immunoblotting cannot distinguish old and newly synthesized pools of protein. Overall peptide counts are listed for each protein identified at both time points in the table provided with the manuscript.  As peptides derive from proteins, this table reflects what immunoblotting would show, but on a larger and more precise scale.

      Comment 3: I think those proteins that can be identified by MIMS at the time point of 6 months but not 10 months deserve more analyses as they might be the key molecules that drive ovarian aging.

      Response 3:  This comment conflicts with comment 2 from Reviewer #3 (Recommendations For The Authors).  This underscores that different researchers will prioritize the value and follow-up of such rich datasets differently.  We agree that the LLPs identified at 6 months are of particular interest to reproductive aging, and we are planning to follow up on these in future studies.

      Comment 4:  Figure 1 – figure supplement 1 C-F, compared with the published literature, the numbers of follicles at different developmental stages and ovulated oocytes at both ages of 6 months and 10 months were dramatically low in this study. For 6-month-old female mice, the reproductive aging just begins, thus these numbers should not be expected to decrease too much. In addition, follicle counting was carried out only in an area of a single section, which is an inaccurate way, because the numbers and types of follicles in various sections differ greatly. Also, the data from a single section could not represent the changes in total follicle counts.

      Response 4: We have addressed these points in response to Comment 1 in the Reviewer #2 Public Review, and corresponding changes in the text have been noted.    

      Comment 5:  The study lacks follow-up verification experiments to validate their MIMS data.

      Response 5: Two independent mass spectrometry approaches (MIMS and LC-MS/MS) were used to validate the presence of long-lived macromolecules in the ovary and oocyte. Studies focused on the role of specific long-lived proteins in oocyte and ovarian biology as well as how they change with age in terms of function, turnover, and modification are beyond the scope of the current study but are ongoing.  We have acknowledged these important next steps in the manuscript text (lines 286-288 and 311-312).

      Reviewer #3 (Recommendations For The Authors):

      Comment 1: The authors used the 6-month mice group to represent the aged model, and examined the LLPs from 1 month to 6 months. Indeed, 6-month-old mice start to show age-related changes; however, for the reproductive aging model, the most widely accepted model is that 10-month-old age mice start to show reproductive-related changes and 12-month-old mice (corresponding to 35-40 year-old women) exhibit the representative reproductive aging phenotypes. Therefore, the data may not present the typical situation of LLPs during reproductive aging.

      Response 1: As described in the response to Comment 1 in the Reviewer #3 Public Review, there were clear logistical and technical feasibility reasons why the 6 month and 10-month timepoints were selected for this study.  Importantly, however, these timepoints do represent a reproductive aging continuum as evidenced by age-related changes in multiple parameters.  Furthermore, there were ultimately very few LLPs that remained at 10 months in both the oocyte and ovary, so inclusion of the 6-month time point was an important intermediate.  Whether the LLPs at the 6-month timepoint serve as a protective mechanism in maintaining gamete quality or whether they contribute to decreased quality associated with reproductive aging is an intriguing dichotomy which will require further investigation.  This has been added to the discussion (lines 247-257).

      Comment 2:  Following the point above, the authors examined the ovaries of 6-month and 10-month mice by proteomics and found that the LLPs at 6 months were not identical to those at 10 months, whereas Tubb5, Tubb4a/b, Tubb2a/b, and Hist2h2 were expressed at both time points (Fig 2B). Why did the authors not explore these proteins, which are more interesting since they were expressed from 1 month to 10 months?

      Response 2:  The objective of this study was to profile the long-lived proteome in the ovary and oocyte as a resource for the field rather than delving into specific LLPs at a mechanistic level.  That being said, we wholeheartedly agree with the reviewer that the proteins that were identified at both 6 month and 10 months are the most robust and long lived and worthy of prioritizing for further study.  Interestingly, Tubb5 and Tubb4a have high homology to primate-specific Tubb8, and Tubb8 mutations in women are associated with meiosis I arrest in oocytes and infertility (Dong et al., 2023; Feng et al., 2016).  Thus, perturbation of these specific proteins by virtue of their long-lived nature may be associated with impaired function and poor reproductive outcomes.  We have highlighted the importance of these LLPs which are present at both timepoints and persist to at least 10 months in the manuscript text (lines 259-270).

      Comment 3:  The authors also need to provide a hypothesis or explanation as to why LLDs from 6 months LLPs were not identical compared with 10 months.

      Response 3:  We agree that LLDs identified at 10 months should also be identified as long-lived at 6 months. This is a common limitation of mass spectrometry-based proteomics, where each sample is prepared and run individually, which introduces variability between biological replicates, especially for low-abundance proteins. It is key to note that failing to identify a protein does not mean the protein is absent; it merely means that we were not able to detect it in this particular experiment, and low levels of the protein may still be present. To compensate for this known and inherent variability, we applied stringent filtering criteria in which we required long-lived peptides to be identified in an independent MS scan (the alternative is to identify the peptide in either the heavy or light scan and use modeling to infer the FA value based on the m/z shift), which gave us peptides of the highest confidence. Ideally, these experiments would be done using a TMT (tandem mass tag) approach. However, TMT-based experiments typically require a substantial amount of input (80-100 µg per sample), which unfortunately is not feasible with oocytes obtained from a limited number of pulse-chased animals.  We have added this explanation to the discussion (lines 265-270).

      Comment 4:  The reviewer thinks that LLPs from 6 months to 10 months may more closely represent the long-lived proteins during reproductive aging.

      Response 4:  We fully agree that understanding the identity of LLPs between the 6-month and 10-month timepoints will be quite informative, given that this is a dynamic period when many of the LLPs are degraded and thus might be key to the observed reproductive decline with age. This is a very important point that we hope to explore in future follow-up studies.

      Comment 5: The authors used proteomics for the detection of ovaries and oocytes, however, there are no validation experiments at all. Since proteomics is mainly for screening and prediction, the authors should examine at least some typical proteins to confirm the validity of proteomics. For example, the authors specifically emphasized the finding of ZP3, a protein that is critical for fertilization.

      Response 5:  Thank you. We agree that closer examination of proteins that are relevant and critical for fertilization is important.  However, a detailed analysis of specific proteins fell outside the scope of this study, which aimed at unbiased identification of long-lived macromolecules in ovaries and oocytes. We hope to continue this important work in the near future.

      Comment 6: For the oocytes, the authors indicated that cytoskeleton, mitochondria-related proteins were the main LLPs, however, previous studies reported the changes of the expression of many cytoskeleton and mitochondria-related proteins during oocyte aging. How do the authors explain this contrary finding?   

      Response 6:  Our findings are not contrary to the studies reporting changes in protein expression levels during oocyte aging; the two concepts are not mutually exclusive. The average FA value at the 6-month chase for oocyte proteins is 41.3%, which means that while 41.3% of the long-lived protein pool persisted for 6 months, the other 58.7% has in fact been renewed. With the exception of a few mitochondrial proteins (Cmkt2 and Apt5l) and myosins (Myl2 and Myh7), which had FA values close to 100% (no turnover), most of the LLPs had a portion of their protein pools that was indeed turned over. Moreover, we included a new data analysis illustrating that we identify a comparable number of mitochondrial proteins between the two time points, indicating that while the long-lived pools are changing over time, the total content remains stable (Figure 3 – figure supplement 1E-G).
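
      The arithmetic behind this reading of the FA value can be made explicit with a minimal sketch. It assumes, as the response states, that FA is the percentage of a protein pool that still carries the 15N label at the chase timepoint, so the remainder has been renewed. The function name is hypothetical.

```python
def renewed_percent(fa_percent):
    """Percentage of a protein pool renewed since the pulse.

    Assumes `fa_percent` (fractional abundance) is the percentage of the
    pool that is still 15N-labeled, so the renewed part is the remainder.
    """
    if not 0.0 <= fa_percent <= 100.0:
        raise ValueError("FA must be between 0 and 100 percent")
    return 100.0 - fa_percent


# Average FA reported for oocyte proteins at the 6-month chase.
average_fa = 41.3
average_renewed = round(renewed_percent(average_fa), 1)

# An FA close to 100% corresponds to essentially no turnover.
no_turnover = renewed_percent(100.0)
```

      Under this assumption, a protein can be classed as long-lived while most of its pool has still been replaced, which is why persistence of a labeled pool and changes in total expression are compatible.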

      Comment 7:  The authors also should provide in-depth discussion about the findings of the current study for long-lived proteins. In this study, the authors reported the relationship between these "long-lived" proteins with aging, a process with multiple "changes". Do long-lived proteins (which are related to the cytoskeleton and mitochondria) contribute to the aging defects of reproduction? or protect against aging?

      Response 7: This is a very important comment and one that needs further exploration. The fact is – we do not know at this moment whether these proteins are protective or deleterious, and such a statement would be speculative at this stage of research into LLPs in ovaries and oocytes. Future work is needed to address this question in detail.

      Briley, S. M., Jasti, S., McCracken, J. M., Hornick, J. E., Fegley, B., Pritchard, M. T., & Duncan, F. E. (2016). Reproductive age-associated fibrosis in the stroma of the mammalian ovary. Reproduction, 152(3), 245-260. https://doi.org/10.1530/REP-16-0129

      Chiang, T., Duncan, F. E., Schindler, K., Schultz, R. M., & Lampson, M. A. (2010). Evidence that Weakened Centromere Cohesion Is a Leading Cause of Age-Related Aneuploidy in Oocytes. Current Biology, 20(17), 1522-1528. https://doi.org/10.1016/j.cub.2010.06.069

      Dong, J., Jin, L., Bao, S., Chen, B., Zeng, Y., Luo, Y., Du, X., Sang, Q., Wu, T., & Wang, L. (2023). Ectopic expression of human TUBB8 leads to increased aneuploidy in mouse oocytes. Cell Discov, 9(1), 105. https://doi.org/10.1038/s41421-023-00599-z

      Duncan, F. E., Jasti, S., Paulson, A., Kelsh, J. M., Fegley, B., & Gerton, J. L. (2017). Age-associated dysregulation of protein metabolism in the mammalian oocyte. Aging Cell, 16(6), 1381-1393. https://doi.org/10.1111/acel.12676

      Feng, R., Sang, Q., Kuang, Y., Sun, X., Yan, Z., Zhang, S., Shi, J., Tian, G., Luchniak, A., Fukuda, Y., Li, B., Yu, M., Chen, J., Xu, Y., Guo, L., Qu, R., Wang, X., Sun, Z., Liu, M., . . . Wang, L. (2016). Mutations in TUBB8 and Human Oocyte Meiotic Arrest. N Engl J Med, 374(3), 223-232. https://doi.org/10.1056/NEJMoa1510791

      Fornasiero, E. F., & Savas, J. N. (2023). Determining and interpreting protein lifetimes in mammalian tissues. Trends Biochem Sci, 48(2), 106-118. https://doi.org/10.1016/j.tibs.2022.08.011

      Hark, T. J., & Savas, J. N. (2021). Using stable isotope labeling to advance our understanding of Alzheimer's disease etiology and pathology. J Neurochem, 159(2), 318-329. https://doi.org/10.1111/jnc.15298

      Kerr, J. B., Hutt, K. J., Michalak, E. M., Cook, M., Vandenberg, C. J., Liew, S. H., Bouillet, P., Mills, A., Scott, C. L., Findlay, J. K., & Strasser, A. (2012). DNA damage-induced primordial follicle oocyte apoptosis and loss of fertility require TAp63-mediated induction of Puma and Noxa. Mol Cell, 48(3), 343-352. https://doi.org/10.1016/j.molcel.2012.08.017

      Kimler, B. F., Briley, S. M., Johnson, B. W., Armstrong, A. G., Jasti, S., & Duncan, F. E. (2018). Radiation-induced ovarian follicle loss occurs without overt stromal changes. Reproduction, 155(6), 553-562. https://doi.org/10.1530/REP-18-0089

      Kirkland, J. L. (2013). Translating advances from the basic biology of aging into clinical application. Exp Gerontol, 48(1), 1-5. https://doi.org/10.1016/j.exger.2012.11.014

      Mara, J. N., Zhou, L. T., Larmore, M., Johnson, B., Ayiku, R., Amargant, F., Pritchard, M. T., & Duncan, F. E. (2020). Ovulation and ovarian wound healing are impaired with advanced reproductive age. Aging (Albany NY), 12(10), 9686-9713. https://doi.org/10.18632/aging.103237

      Perrone, R., Ashok Kumaar, P. V., Haky, L., Hahn, C., Riley, R., Balough, J., Zaza, G., Soygur, B., Hung, K., Prado, L., Kasler, H. G., Tiwari, R., Matsui, H., Hormazabal, G. V., Heckenbach, I., Scheibye-Knudsen, M., Duncan, F. E., & Verdin, E. (2023). CD38 regulates ovarian function and fecundity via NAD(+) metabolism. iScience, 26(10), 107949. https://doi.org/10.1016/j.isci.2023.107949

      Quan, N., Harris, L. R., Halder, R., Trinidad, C. V., Johnson, B. W., Horton, S., Kimler, B. F., Pritchard, M. T., & Duncan, F. E. (2020). Differential sensitivity of inbred mouse strains to ovarian damage in response to low-dose total body irradiation. Biol Reprod, 102(1), 133-144. https://doi.org/10.1093/biolre/ioz164

      Savas, J. N., Toyama, B. H., Xu, T., Yates, J. R., 3rd, & Hetzer, M. W. (2012). Extremely long-lived nuclear pore proteins in the rat brain. Science, 335(6071), 942. https://doi.org/10.1126/science.1217421

      Toyama, B. H., Savas, J. N., Park, S. K., Harris, M. S., Ingolia, N. T., Yates, J. R., 3rd, & Hetzer, M. W. (2013). Identification of long-lived proteins reveals exceptional stability of essential cellular structures. Cell, 154(5), 971-982. https://doi.org/10.1016/j.cell.2013.07.037

    3. Reviewer #1 (Public Review):

      Summary:

      This manuscript by Bomba-Warczak et al. describes a comprehensive evaluation of long-lived proteins in the ovary using transgenerational diet-derived 15N labelling in pulse-chased mice. The transgenerational labelling of proteins (and nucleic acids) with 15N allowed the authors to identify regions enriched in long-lived macromolecules at the 6- and 10-month chase time points. The authors also identified the retained proteins in the ovary and oocyte using MS. Key findings include the relative enrichment of long-lived macromolecules in oocytes, pregranulosa cells, CL, stroma, and, surprisingly, OSE. Gene ontology analysis of these proteins revealed an enrichment for nucleosome, myosin complex, mitochondria, and other matrix-type protein functions. Interestingly, compared to other post-mitotic tissues where such analyses have previously been performed, such as the brain and heart, they find a higher fractional abundance of labeled proteins related to the mitochondria and myosin, respectively.

      Strengths:

      A major strength of the study is the combined spatial analyses of LLPs using histological sections with MS analysis to identify retained proteins.

      Another major strength is the use of two chase time points allowing assessment of temporal changes in LLPs associated with aging.

      The major claims such as an enrichment of LLPs in pregranulosa cells, GCs of primary follicles, CL, stroma, and OSE are soundly supported by the analyses and the caveat that nucleic acids might differentially contribute to this signal is well presented.

      The claims that nucleosomes, myosin complex, and mitochondrial proteins are enriched for LLPs are well supported by GO enrichment analysis and well described within the known body of evidence that these proteins are generally long-lived in other tissues.

      Weaknesses:

      All weaknesses were addressed in the revised manuscript.

      Impact of the work:

      This work represents the first study addressing the turnover and retention of long-lived protein in the ovary and will be an invaluable resource for the research community, particularly for those studying ovarian aging. This work also raises important unanswered questions worthy of follow-up including interesting findings regarding the timing of turnover of cell types such as the OSE, organelles such as mitochondria, and ECM proteins such as ZP3 and Tubb family proteins. Most striking are the differences between the two timepoints used (6 and 10 months) which lead the authors to infer trajectories and kinetics of replacement of proteins potentially contributing to ovarian longevity or decline. As such I expect the work will contribute to hypothesis generation and stand to have an important impact on the field.

    4. Reviewer #2 (Public Review):

      Summary:

      The manuscript by Bomba-Warczak et al. applied multi-isotope imaging mass spectrometry (MIMS) analysis to identify the long-lived proteins in mouse ovaries during reproductive aging, and found some proteins related to cytoskeletal and mitochondrial dynamics persisting for 10 months.

      Strengths:

      The manuscript provides a useful dataset about protein turnover during ovarian aging in mice.

      Weaknesses:

      The study is pretty descriptive and short of further new findings based on the dataset. In addition, some results such as the numbers of follicles and ovulated oocytes in aged mice are not consistent with the published literature.

      Comments on revised version:

      The authors did not fully address my previous concerns, especially regarding the verification of the identified proteins and follow-up functional experiments. In addition, it is still unacceptable to me that the number of ovulated oocytes in mice at 6 months of age is only one third of that in young mice (10 vs 30; Fig. S1E). Most of the published literature shows that mice at 12 months of age still have ~10 ovulated oocytes. Moreover, based on the follicle counting method used in the present study (Fig. S1D), no antral follicles were observed in mice at 6 months and 10 months of age, which is not reasonable.

    5. Reviewer #3 (Public Review):

      Summary:

      In this study, Bomba-Warczak et al. focused on reproductive aging, and they presented a map of long-lived proteins that were stable during the reproductive lifespan. The authors used MIMS to examine and show distinct molecules in different cell types and tissue regions of the ovary in 6-month-old mice, and they also used proteomic analysis to present different LLPs in ovaries at the two timepoints (6-month and 10-month mice). In addition, the authors examined the LLPs in oocytes from 6-month-old mice and indicated that these were nuclear, cytoskeletal, and mitochondrial proteins.

      Strengths:

      Overall, this study provided important information about the pattern of long-lived proteins during aging, which will contribute to the understanding of the defects caused by reproductive aging.

      Weaknesses:

      12-month-old mice, the typical aged model, were not examined.

      Comments on revised version:

      The authors responded to my comments and suggestions. Due to the limitations of the manuscript type, most of the suggestions from my first-round comments could be considered by the authors for future studies.

    1. eLife assessment

      This potentially valuable study examines the role of IL17-producing Ly6G PMNs as a reservoir for Mycobacterium tuberculosis to evade host killing activated by BCG immunisation. The authors report that IL17-producing polymorphonuclear neutrophils harbour a significant bacterial load in both wild-type and IFNg-/- mice and that targeting IL17 and Cox2 improved disease outcomes whilst enhancing BCG efficacy. Although the authors suggest that targeting these pathways may improve disease outcomes in humans, the evidence as it stands is incomplete and requires additional experimentation for the study to realise its full impact.

    2. Reviewer #1 (Public review):

      Summary:

      Recruitment of neutrophils to the lungs is known to drive susceptibility to infection with M. tuberculosis. In this study, the authors present data in support of the hypothesis that neutrophil production of the cytokine IL-17 underlies the detrimental effect of neutrophils on disease. They claim that neutrophils harbor a large fraction of Mtb during infection, and are a major source of IL-17. To explore the effects of blocking IL-17 signaling during primary infection, they use IL-17 blocking antibodies, SR221 (an inverse agonist of TH17 differentiation), and celecoxib, which they claim blocks Th17 differentiation, and observe modest improvements in bacterial burdens in both WT and IFN-γ deficient mice using the combination of IL-17 blockade with celecoxib during primary infection. Celecoxib enhances control of infection after BCG vaccination.

      Strengths:

      The most novel finding in the paper is that treatment with celecoxib significantly enhances control of infection in BCG-vaccinated mice that have been challenged with Mtb. It was already known that NSAID treatments can improve primary infection with Mtb.

      Weaknesses:

      The major claim of the manuscript - that neutrophils produce IL-17 that is detrimental to the host - is not strongly supported by the data. Data demonstrating neutrophil production of IL-17 lacks rigor. The experiments examining the effects of inhibitors of IL-17 on the outcome of infection are very difficult to interpret. First, treatment with IL-17 inhibitors alone has no impact on bacterial burdens in the lung, either in WT or IFN-γ KO mice. This suggests that IL-17 does not play a detrimental role during infection. Modest effects are observed using the combination of IL-17 blocking drugs and celecoxib, however, the interpretation of these results mechanistically is complicated. Celecoxib is not a specific inhibitor of Th17. Indeed, it affects levels of PGE2, which is known to have numerous impacts on Mtb infection separate from any effect on IL-17 production, as well as other eicosanoids. Finally, the human data simply demonstrates that neutrophils and IL-17 both are higher in patients who experience relapse after treatment for TB, which is expected and does not support their specific hypothesis. The use of genetic ablation of IL-17 production specifically in neutrophils and/or IL-17R in mice would greatly enhance the rigor of this study. The authors do not address the fact that numerous studies have shown that IL-17 has a protective effect in the mouse model of TB in the context of vaccination. Finally, whether and how many times each animal experiment was repeated is unclear.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Sharma et al. demonstrated that Ly6G+ granulocytes (Gra cells) serve as the primary reservoirs for intracellular Mtb in infected wild-type mice and that excessive infiltration of these cells is associated with severe bacteremia in genetically susceptible IFNγ-/- mice. Notably, neutralizing IL-17 or inhibiting COX2 reversed the excessive infiltration of Ly6G+Gra cells, mitigated the associated pathology, and improved survival in these susceptible mice. Additionally, Ly6G+Gra cells were identified as a major source of IL-17 in both wild-type and IFNγ-/- mice. Inhibition of RORγt or COX2 further reduced the intracellular bacterial burden in Ly6G+Gra cells and improved lung pathology.

      Of particular interest, COX2 inhibition in wild-type mice also enhanced the efficacy of the BCG vaccine by targeting the Ly6G+Gra-resident Mtb population.

      Strengths:

      The experimental results showing improved BCG-mediated protective immunity through targeting IL-17-producing Ly6G+ cells and COX2 are compelling and will likely generate significant interest in the field. Overall, this study presents important findings, suggesting that the IL-17-COX2 axis could be a critical target for designing innovative vaccination strategies for TB.

      Weaknesses:

      However, I have the following concerns regarding some of the conclusions drawn from the experiments, which require additional experimental evidence to support and strengthen the overall study.

      Major Concerns:

      (1) Ly6G+ Granulocytes as a Source of IL-17: The authors assert that Ly6G+ granulocytes are the major source of IL-17 in wild-type and IFN-γ KO mice based on colocalization studies of Ly6G and IL-17. In Figure 3D, they report approximately 500 Ly6G+ cells expressing IL-17 in the Mtb-infected WT lung. Are these low numbers sufficient to drive inflammatory pathology? Additionally, have the authors evaluated these numbers in IFN-γ KO mice?

      (2) Role of IL-17-Producing Ly6G Granulocytes in Pathology: The authors suggest that IL-17-producing Ly6G granulocytes drive pathology in WT and IFN-γ KO mice. However, the data presented only demonstrate an association between IL-17+ Ly6G cells and disease pathology. To strengthen their conclusion, the authors should deplete neutrophils in these mice to show that IL-17 expression, and consequently the pathology, is reduced.

      (3) IL-17 Secretion by Mtb-Infected Neutrophils: Do Mtb-infected neutrophils secrete IL-17 into the supernatants? This would serve as confirmation of neutrophil-derived IL-17. Additionally, are Ly6G+ cells producing IL-17 and serving as pathogenic agents exclusively in vivo? The authors should provide comments on this.

      (4) Characterization of IL-17-Producing Ly6G+ Granulocytes: Are the IL-17-producing Ly6G+ granulocytes a mixed population of neutrophils and eosinophils, or are they exclusively neutrophils? Sorting these cells followed by Giemsa or eosin staining could clarify this.

    4. Reviewer #3 (Public review):

      Summary:

      The authors examine how distinct cellular environments differentially control Mtb following BCG vaccination. The key findings are that IL17-producing PMNs harbor a significant Mtb load in both wild-type and IFNg-/- mice. Targeting IL17 and Cox2 improved disease and enhanced BCG efficacy over 12 weeks and neutrophils/IL17 are associated with treatment failure in humans. The authors suggest that targeting these pathways, especially in MSMD patients may improve disease outcomes.

      Strengths:

      The experimental approach is generally sound and consists of low-dose aerosol infections with distinct readouts including cell sorting followed by CFU, histopathology, and RNA sequencing analysis. By combining genetic approaches and chemical/antibody treatments, the authors can probe these pathways effectively.

      Understanding how distinct inflammatory pathways contribute to control or worsen Mtb disease is important and thus, the results will be of great interest to the Mtb field.

      Weaknesses:

      A major limitation of the current study is overlooking the role of non-hematopoietic cells in the IFNg/IL17/neutrophil response. Chimera studies from Ernst and colleagues (PMCID: PMC2807991) previously described this IDO-dependent pathway following the loss of IFNg through an increased IL17 response. This study is not cited nor discussed even though it may alter the interpretation of several experiments.

      Several of the key findings in mice have previously been shown (albeit with less sophisticated experimentation) and human disease and neutrophils are well described - thus the real new finding is how intracellular Mtb in neutrophils are more refractory to BCG-mediated control. However, given there are already high levels of Mtb in PMNs compared to other cell types, and there is a decrease in intracellular Mtb in PMNs following BCG immunization the strength of this finding is a bit limited.

    5. Author response:

      eLife assessment

      This potentially valuable study examines the role of IL17-producing Ly6G PMNs as a reservoir for Mycobacterium tuberculosis to evade host killing activated by BCG immunisation. The authors report that IL17-producing polymorphonuclear neutrophils harbour a significant bacterial load in both wild-type and IFNg-/- mice and that targeting IL17 and Cox2 improved disease outcomes whilst enhancing BCG efficacy. Although the authors suggest that targeting these pathways may improve disease outcomes in humans, the evidence as it stands is incomplete and requires additional experimentation for the study to realise its full impact.

      Thank you for evaluating our manuscript. We understand the concern related to the direct role of Ly6G+Gra-derived IL17 in TB pathogenesis. For the revised manuscript, we will provide additional experimental evidence through direct regulation of IL-17 production in Mtb-infected mice and its impact on improving BCG efficacy.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Recruitment of neutrophils to the lungs is known to drive susceptibility to infection with M. tuberculosis. In this study, the authors present data in support of the hypothesis that neutrophil production of the cytokine IL-17 underlies the detrimental effect of neutrophils on disease. They claim that neutrophils harbor a large fraction of Mtb during infection, and are a major source of IL-17. To explore the effects of blocking IL-17 signaling during primary infection, they use IL-17 blocking antibodies, SR2211 (an inverse agonist of TH17 differentiation), and celecoxib, which they claim blocks Th17 differentiation, and observe modest improvements in bacterial burdens in both WT and IFN-γ deficient mice using the combination of IL-17 blockade with celecoxib during primary infection. Celecoxib enhances control of infection after BCG vaccination.

      Thank you for the summary.

      Strengths:

      The most novel finding in the paper is that treatment with celecoxib significantly enhances control of infection in BCG-vaccinated mice that have been challenged with Mtb. It was already known that NSAID treatments can improve primary infection with Mtb.

      Thank you.

      Weaknesses:

      The major claim of the manuscript - that neutrophils produce IL-17 that is detrimental to the host - is not strongly supported by the data. Data demonstrating neutrophil production of IL-17 lacks rigor.

      Our response: Neutrophil production of IL-17 is supported by two independent methods/techniques in the current version:

      (1) Flow cytometry: a large fraction of Ly6G+CD11b+ cells from the lungs of Mtb-infected mice were also positive for IL-17 (Fig. 3C).

      (2) IFA co-staining of Ly6G+ cells with IL-17 in the lung sections from Mtb-infected mice (Fig. 3E-G, Fig. 4H and Fig. 5I).

      However, to further strengthen this observation, we plan to analyse sorted Ly6G+Gra from the lungs of infected mice using an IL-17 ELISpot assay. This will unequivocally prove the Ly6G+Gra production of IL-17. Several publications support the production of IL-17 by neutrophils (Li et al. 2010; Katayama et al. 2013; Lin et al. 2011). For example, neutrophils have been identified as a source of IL-17 in human psoriatic lesions (Lin et al. 2011), in neuroinflammation induced by traumatic brain injury (Xu et al. 2023) and in several mouse models of infectious and autoimmune inflammation (Ferretti et al. 2003; Hoshino et al. 2008; Li et al. 2010). However, ours is the first study reporting neutrophil IL-17 production during Mtb pathology.

      The experiments examining the effects of inhibitors of IL-17 on the outcome of infection are very difficult to interpret. First, treatment with IL-17 inhibitors alone has no impact on bacterial burdens in the lung, either in WT or IFN-γ KO mice. This suggests that IL-17 does not play a detrimental role during infection. Modest effects are observed using the combination of IL-17 blocking drugs and celecoxib, however, the interpretation of these results mechanistically is complicated. Celecoxib is not a specific inhibitor of Th17. Indeed, it affects levels of PGE2, which is known to have numerous impacts on Mtb infection separate from any effect on IL-17 production, as well as other eicosanoids. 

      The reviewer correctly says that Celecoxib is not a specific inhibitor of Th17. However, COX-2 inhibition does have an effect on IL-17 levels, and numerous reports support this observation (Paulissen et al. 2013; Napolitani et al. 2009; Lemos et al. 2009). We elaborate on the results below for better clarity.

      Firstly, in the WT mice, Celecoxib treatment led to a complete loss of IL-17 production in the lungs of Mtb-infected mice (Fig. 5D). Interestingly, IL-17 production independent of IL-23 is known to require PGE2 (Paulissen et al. 2013; Polese et al. 2021). In the WT or IFNγ KO mice, we rather noted a decline in IL-23 levels post-infection, suggesting a possible role of PGE2 in IL-17 production. However, in the Mtb-infected IFNγ KO mice, Celecoxib had no effect on IL-17 levels in the lung homogenates. Thus, celecoxib controls IL-17 levels only in the Mtb-infected WT mice. Including celecoxib with anti-IL17 in the IFNγ KO mice controls pathology and extends their survival.

      Second, the reviewer’s observation that IL-17 inhibition has a modest effect on the outcome of infection is only partially correct. While IL-17 neutralization and inhibition alone in the IFNγ KO mice and WT mice, respectively, did not bring down the lung CFU burden significantly, in both these cases, there was an improvement in the lung pathology. The reduced pathology coincided with reduced neutrophil recruitment and a reduced Ly6G+Gra-resident Mtb population in the WT mice. IL-17 neutralization alone improved IFNγ KO mice survival by ~10 days (Fig. 4F-G).

      Third, regarding the SR2211 and Celecoxib combination study, we agree with the reviewer that Celecoxib has roles independent of IL-17 regulation. However, in the results presented in this study, there are three key aspects- 1) neutrophil-derived IL-17-dependent neutrophil recruitment, 2) the presence of a large proportion of intracellular Mtb in the neutrophils and 3) dissemination of Mtb to the spleen. Celecoxib treatment alone helps reduce lung Mtb burden in the WT mice. However, SR2211 fails to do so. It is evident that celecoxib is doing more than just inhibiting IL-17 production. The result shows that celecoxib blocks neutrophil recruitment (which could be an IL-17-dependent mechanism) and also controls the intraneutrophil bacterial population. Finally, either SR2211 or celecoxib could block dissemination to the spleen. The role of neutrophils in TB dissemination is only beginning to emerge (Hult et al. 2021). We will revise the description in the results and discussion section for this data to make it easier to understand.

      Finally, we have also done experiments with SR2211 in BCG-vaccinated animals, which shows the direct impact of IL-17 inhibition on the BCG vaccine efficacy. We will add this result in the revised version.

      Finally, the human data simply demonstrates that neutrophils and IL-17 both are higher in patients who experience relapse after treatment for TB, which is expected and does not support their specific hypothesis. 

      We disagree with the above statement. Why would a higher IL-17 level be expected in patients who show relapse, death or failed treatment outcomes? Classically, IL-17 is believed to be protective against TB, and the reviewer also points to that in the comments below. A very limited set of studies support the non-protective/pathological role of IL-17 in tuberculosis (Cruz et al. 2010). High IL-17 and neutrophilia at the baseline in the human subjects (i.e. at the time of recruitment in the study) highlight severe pathology in those subjects, which could have contributed to the failed treatment outcome. This observation in the human cohort strongly supports the overall theme and central observation in this study.

      The use of genetic ablation of IL-17 production specifically in neutrophils and/or IL-17R in mice would greatly enhance the rigor of this study. 

      The reviewer’s point is well-taken. Having a genetic ablation of IL-17 production, specifically in the neutrophils, would be excellent. At present, however, we lack this resource, and therefore, it is not feasible to do this experiment within a defined timeline. Instead, for the revised manuscript, we will present the data with SR2211, a direct inhibitor of RORgt and, therefore, IL-17, in BCG-vaccinated mice.

      The authors do not address the fact that numerous studies have shown that IL-17 has a protective effect in the mouse model of TB in the context of vaccination.

      Yes, there are a few articles that talk about the protective effect of IL-17 in the mouse model of TB in the context of vaccination (Khader et al. 2007; Desel et al. 2011; Choi et al. 2020). This part was discussed in the original manuscript (in the Introduction section). For the revised manuscript, we will also provide results from the experiment where we blocked IL-17 production by inhibiting RORgt using SR2211 in BCG-vaccinated mice. The results clearly show IL-17 as a negative regulator of BCG-mediated protective immunity. We believe some of the reasons for the observed differences could be 1) in our study, we analysed IL-17 levels in the lung homogenates at late phases of infection, and 2) most published studies rely on ex vivo stimulation of immune cells to measure cytokine production, whereas we actually measured the cytokine levels in the lung homogenates. We will elaborate on these points in the revised version.

      Finally, whether and how many times each animal experiment was repeated is unclear.

      We will provide the details of the number of experiments in the revised version. Briefly, the BCG vaccination experiment (Figure 1) and BCG vaccination with Celecoxib treatment experiment (Figure 6) were performed twice and thrice, respectively. The IL-17 neutralization experiment (Figure 4) and the SR2211 treatment experiment (Figure 5) were done once. We will add data from another SR2211 experiment in the revised version.

      Reviewer #2 (Public review):

      Summary:

      In this study, Sharma et al. demonstrated that Ly6G+ granulocytes (Gra cells) serve as the primary reservoirs for intracellular Mtb in infected wild-type mice and that excessive infiltration of these cells is associated with severe bacteremia in genetically susceptible IFNγ-/- mice. Notably, neutralizing IL-17 or inhibiting COX2 reversed the excessive infiltration of Ly6G+Gra cells, mitigated the associated pathology, and improved survival in these susceptible mice. Additionally, Ly6G+Gra cells were identified as a major source of IL-17 in both wild-type and IFNγ-/- mice. Inhibition of RORγt or COX2 further reduced the intracellular bacterial burden in Ly6G+Gra cells and improved lung pathology.

      Of particular interest, COX2 inhibition in wild-type mice also enhanced the efficacy of the BCG vaccine by targeting the Ly6G+Gra-resident Mtb population.

      Thank you for the summary.

      Strengths:

      The experimental results showing improved BCG-mediated protective immunity through targeting IL-17-producing Ly6G+ cells and COX2 are compelling and will likely generate significant interest in the field. Overall, this study presents important findings, suggesting that the IL-17-COX2 axis could be a critical target for designing innovative vaccination strategies for TB.

      Thank you for highlighting the overall strengths of the study.

      Weaknesses:

      However, I have the following concerns regarding some of the conclusions drawn from the experiments, which require additional experimental evidence to support and strengthen the overall study.

      Major Concerns:

      (1) Ly6G+ Granulocytes as a Source of IL-17: The authors assert that Ly6G+ granulocytes are the major source of IL-17 in wild-type and IFN-γ KO mice based on colocalization studies of Ly6G and IL-17. In Figure 3D, they report approximately 500 Ly6G+ cells expressing IL-17 in the Mtb-infected WT lung. Are these low numbers sufficient to drive inflammatory pathology? Additionally, have the authors evaluated these numbers in IFN-γ KO mice? 

      Thank you for pointing out the numbers in Fig. 3D. It was our oversight to label the axis as No. of IL17+Ly6G+Gra/lung. For this data, only a part of the lung was used. For the revised manuscript, we will provide the number of these cells at the whole lung level from Mtb-infected WT mice. Unfortunately, we did not evaluate these numbers in IFN-γ KO mice through FACS.

      For the assertion that Ly6G+Gra are the major source of IL-17 in TB, we have used two separate strategies- a) IFA and b) FACS. 

      However, as described above in response to the first reviewer, for the revision, we propose to perform an IL-17 ELISpot assay on the sorted Ly6G+Gra from the lungs of Mtb-infected WT mice.

      (2) Role of IL-17-Producing Ly6G Granulocytes in Pathology: The authors suggest that IL-17-producing Ly6G granulocytes drive pathology in WT and IFN-γ KO mice. However, the data presented only demonstrate an association between IL-17+ Ly6G cells and disease pathology. To strengthen their conclusion, the authors should deplete neutrophils in these mice to show that IL-17 expression, and consequently the pathology, is reduced.

      Thank you for this suggestion. Others have done neutrophil depletion studies in TB, and so far, the outcomes remain inconclusive. In some studies, neutrophil depletion helps the pathogen (Rankin et al. 2022; Pedrosa et al. 2000; Appelberg et al. 1995), and in others, it helps the host (Lovewell et al. 2021; Mishra et al. 2017). One reason for this variability is the stage of infection when neutrophil depletion was done. However, another crucial factor is the heterogeneity in the neutrophil population. There are reports that suggest neutrophil subtypes with protective versus pathological trajectories (Nwongbouwoh Muefong et al. 2022; Lyadova 2017; Hellebrekers, Vrisekoop, and Koenderman 2018; Leliefeld et al. 2018). Depleting the entire population using anti-Ly6G could impact this heterogeneity and may impact the inferences drawn. A better approach would be to characterise this heterogeneous population, efforts towards which could be part of a separate study.

      For the revised manuscript, we will provide results from the SR2211 experiment in BCG-vaccinated mice and other results to show the role of IL-17-producing Ly6G+Gra in TB pathology.   

      (3) IL-17 Secretion by Mtb-Infected Neutrophils: Do Mtb-infected neutrophils secrete IL-17 into the supernatants? This would serve as confirmation of neutrophil-derived IL-17. Additionally, are Ly6G+ cells producing IL-17 and serving as pathogenic agents exclusively in vivo? The authors should provide comments on this.

      We have not directly measured IL-17 secretion by neutrophils in our experiments. However, Hu et al. have reported IL-17 secretion by Mtb-infected neutrophils in vitro (Hu et al. 2017). Whether some neutrophil roles are seen exclusively under in vivo conditions is an interesting proposition. We do have some observations that suggest the in vitro phenotype of Mtb-infected neutrophils is different from that in vivo.

      (4) Characterization of IL-17-Producing Ly6G+ Granulocytes: Are the IL-17-producing Ly6G+ granulocytes a mixed population of neutrophils and eosinophils, or are they exclusively neutrophils? Sorting these cells followed by Giemsa or eosin staining could clarify this.

      This is a very important point. While usually eosinophils do not express Ly6G markers in laboratory mice, under specific contexts, including infections, eosinophils can express Ly6G. Since we have not characterized these potential Ly6G+ sub-populations, that is one of the reasons we refer to the cell types as Ly6G+ granulocytes, which do not exclude Ly6G+ eosinophils. A detailed characterization of these subsets could be taken up as a separate study.

      Reviewer #3 (Public review):

      Summary:

      The authors examine how distinct cellular environments differentially control Mtb following BCG vaccination. The key findings are that IL17-producing PMNs harbor a significant Mtb load in both wild-type and IFNg-/- mice. Targeting IL17 and Cox2 improved disease and enhanced BCG efficacy over 12 weeks and neutrophils/IL17 are associated with treatment failure in humans. The authors suggest that targeting these pathways, especially in MSMD patients may improve disease outcomes.

      Thank you.

      Strengths:

      The experimental approach is generally sound and consists of low-dose aerosol infections with distinct readouts including cell sorting followed by CFU, histopathology, and RNA sequencing analysis. By combining genetic approaches and chemical/antibody treatments, the authors can probe these pathways effectively.

      Understanding how distinct inflammatory pathways contribute to control or worsen Mtb disease is important and thus, the results will be of great interest to the Mtb field.

      Thank you.

      Weaknesses:

      A major limitation of the current study is overlooking the role of non-hematopoietic cells in the IFNg/IL17/neutrophil response. Chimera studies from Ernst and colleagues (PMCID: PMC2807991) previously described this IDO-dependent pathway following the loss of IFNg through an increased IL17 response. This study is not cited nor discussed even though it may alter the interpretation of several experiments.

      Thank you for pointing out this earlier study, which we concede we missed discussing. We disagree on the point that results from that study may alter the interpretation of several experiments in our study. On the contrary, the main observation that loss of IFNγ leads to elevated IL-17 levels is aligned in both studies.

      IDO1 is known to alter Th cell differentiation towards Tregs and away from Th17 (Baban et al. 2009). It is absolutely feasible for the non-hematopoietic cells to regulate these events. However, that does not rule out the neutrophil production of IL-17 and the downstream pathological effect shown in this study. We will discuss and cite this study in the revised manuscript.

      Several of the key findings in mice have previously been shown (albeit with less sophisticated experimentation) and human disease and neutrophils are well described - thus the real new finding is how intracellular Mtb in neutrophils are more refractory to BCG-mediated control. However, given there are already high levels of Mtb in PMNs compared to other cell types, and there is a decrease in intracellular Mtb in PMNs following BCG immunization the strength of this finding is a bit limited.

      The reviewer’s interpretation of the BCG-refractory Mtb population in the neutrophil is interesting. The reviewer is right that neutrophils had a higher intracellular Mtb burden, which decreased in the BCG-vaccinated animals. Thus, on that account, the reviewer rightly mentions that BCG is able to control Mtb even in neutrophils. However, BCG almost clears intracellular burden from other cell types analysed, and therefore, the remnant pool of intracellular Mtb in the lungs of BCG-vaccinated animals could be mostly those present in the neutrophils. This is a substantial novel development in the field and attracts focus towards innate immune cells for vaccine efficacy. 

      References:

      Appelberg, R., A. G. Castro, S. Gomes, J. Pedrosa, and M. T. Silva. 1995. 'Susceptibility of beige mice to Mycobacterium avium: role of neutrophils', Infect Immun, 63: 3381-7.

      Baban, B., P. R. Chandler, M. D. Sharma, J. Pihkala, P. A. Koni, D. H. Munn, and A. L. Mellor. 2009. 'IDO activates regulatory T cells and blocks their conversion into Th17-like T cells', J Immunol, 183: 2475-83.

      Choi, H. G., K. W. Kwon, S. Choi, Y. W. Back, H. S. Park, S. M. Kang, E. Choi, S. J. Shin, and H. J. Kim. 2020. 'Antigen-Specific IFN-gamma/IL-17-Co-Producing CD4(+) T-Cells Are the Determinants for Protective Efficacy of Tuberculosis Subunit Vaccine', Vaccines (Basel), 8.

      Cruz, A., A. G. Fraga, J. J. Fountain, J. Rangel-Moreno, E. Torrado, M. Saraiva, D. R. Pereira, T. D. Randall, J. Pedrosa, A. M. Cooper, and A. G. Castro. 2010. 'Pathological role of interleukin 17 in mice subjected to repeated BCG vaccination after infection with Mycobacterium tuberculosis', J Exp Med, 207: 1609-16.

      Desel, C., A. Dorhoi, S. Bandermann, L. Grode, B. Eisele, and S. H. Kaufmann. 2011. 'Recombinant BCG DeltaureC hly+ induces superior protection over parental BCG by simulating a balanced combination of type 1 and type 17 cytokine responses', J Infect Dis, 204: 1573-84.

      Ferretti, S., O. Bonneau, G. R. Dubois, C. E. Jones, and A. Trifilieff. 2003. 'IL-17, produced by lymphocytes and neutrophils, is necessary for lipopolysaccharide-induced airway neutrophilia: IL-15 as a possible trigger', J Immunol, 170: 2106-12.

      Hellebrekers, P., N. Vrisekoop, and L. Koenderman. 2018. 'Neutrophil phenotypes in health and disease', Eur J Clin Invest, 48 Suppl 2: e12943.

      Hoshino, A., T. Nagao, N. Nagi-Miura, N. Ohno, M. Yasuhara, K. Yamamoto, T. Nakayama, and K. Suzuki. 2008. 'MPO-ANCA induces IL-17 production by activated neutrophils in vitro via classical complement pathway-dependent manner', J Autoimmun, 31: 79-89.

      Hu, S., W. He, X. Du, J. Yang, Q. Wen, X. P. Zhong, and L. Ma. 2017. 'IL-17 Production of Neutrophils Enhances Antibacteria Ability but Promotes Arthritis Development During Mycobacterium tuberculosis Infection', EBioMedicine, 23: 88-99.

      Hult, C., J. T. Mattila, H. P. Gideon, J. J. Linderman, and D. E. Kirschner. 2021. 'Neutrophil Dynamics Affect Mycobacterium tuberculosis Granuloma Outcomes and Dissemination', Front Immunol, 12: 712457.

      Katayama, M., K. Ohmura, N. Yukawa, C. Terao, M. Hashimoto, H. Yoshifuji, D. Kawabata, T. Fujii, Y. Iwakura, and T. Mimori. 2013. 'Neutrophils are essential as a source of IL-17 in the effector phase of arthritis', PLoS One, 8: e62231.

      Khader, S. A., G. K. Bell, J. E. Pearl, J. J. Fountain, J. Rangel-Moreno, G. E. Cilley, F. Shen, S. M. Eaton, S. L. Gaffen, S. L. Swain, R. M. Locksley, L. Haynes, T. D. Randall, and A. M. Cooper. 2007. 'IL-23 and IL-17 in the establishment of protective pulmonary CD4+ T cell responses after vaccination and during Mycobacterium tuberculosis challenge', Nat Immunol, 8: 369-77.

      Leliefeld, P. H. C., J. Pillay, N. Vrisekoop, M. Heeres, T. Tak, M. Kox, S. H. M. Rooijakkers, T. W. Kuijpers, P. Pickkers, L. P. H. Leenen, and L. Koenderman. 2018. 'Differential antibacterial control by neutrophil subsets', Blood Adv, 2: 1344-55.

      Lemos, H. P., R. Grespan, S. M. Vieira, T. M. Cunha, W. A. Verri, Jr., K. S. Fernandes, F. O. Souto, I. B. McInnes, S. H. Ferreira, F. Y. Liew, and F. Q. Cunha. 2009. 'Prostaglandin mediates IL-23/IL-17-induced neutrophil migration in inflammation by inhibiting IL-12 and IFNgamma production', Proc Natl Acad Sci U S A, 106: 5954-9.

      Li, L., L. Huang, A. L. Vergis, H. Ye, A. Bajwa, V. Narayan, R. M. Strieter, D. L. Rosin, and M. D. Okusa. 2010. 'IL-17 produced by neutrophils regulates IFN-gamma-mediated neutrophil migration in mouse kidney ischemia-reperfusion injury', J Clin Invest, 120: 331-42.

      Lin, A. M., C. J. Rubin, R. Khandpur, J. Y. Wang, M. Riblett, S. Yalavarthi, E. C. Villanueva, P. Shah, M. J. Kaplan, and A. T. Bruce. 2011. 'Mast cells and neutrophils release IL-17 through extracellular trap formation in psoriasis', J Immunol, 187: 490-500.

      Lovewell, R. R., C. E. Baer, B. B. Mishra, C. M. Smith, and C. M. Sassetti. 2021. 'Granulocytes act as a niche for Mycobacterium tuberculosis growth', Mucosal Immunol, 14: 229-41.

      Lyadova, I. V. 2017. 'Neutrophils in Tuberculosis: Heterogeneity Shapes the Way?', Mediators Inflamm, 2017: 8619307.

      Mishra, B. B., R. R. Lovewell, A. J. Olive, G. Zhang, W. Wang, E. Eugenin, C. M. Smith, J. Y. Phuah, J. E. Long, M. L. Dubuke, S. G. Palace, J. D. Goguen, R. E. Baker, S. Nambi, R. Mishra, M. G. Booty, C. E. Baer, S. A. Shaffer, V. Dartois, B. A. McCormick, X. Chen, and C. M. Sassetti. 2017. 'Nitric oxide prevents a pathogen-permissive granulocytic inflammation during tuberculosis', Nat Microbiol, 2: 17072.

      Napolitani, G., E. V. Acosta-Rodriguez, A. Lanzavecchia, and F. Sallusto. 2009. 'Prostaglandin E2 enhances Th17 responses via modulation of IL-17 and IFN-gamma production by memory CD4+ T cells', Eur J Immunol, 39: 1301-12.

      Nwongbouwoh Muefong, C., O. Owolabi, S. Donkor, S. Charalambous, A. Bakuli, A. Rachow, C. Geldmacher, and J. S. Sutherland. 2022. 'Neutrophils Contribute to Severity of Tuberculosis Pathology and Recovery From Lung Damage Pre- and Posttreatment', Clin Infect Dis, 74: 1757-66.

      Paulissen, S. M., J. P. van Hamburg, N. Davelaar, P. S. Asmawidjaja, J. M. Hazes, and E. Lubberts. 2013. 'Synovial fibroblasts directly induce Th17 pathogenicity via the cyclooxygenase/prostaglandin E2 pathway, independent of IL-23', J Immunol, 191: 1364-72.

      Pedrosa, J., B. M. Saunders, R. Appelberg, I. M. Orme, M. T. Silva, and A. M. Cooper. 2000. 'Neutrophils play a protective nonphagocytic role in systemic Mycobacterium tuberculosis infection of mice', Infect Immun, 68: 577-83.

      Polese, B., B. Thurairajah, H. Zhang, C. L. Soo, C. A. McMahon, G. Fontes, S. N. A. Hussain, V. Abadie, and I. L. King. 2021. 'Prostaglandin E(2) amplifies IL-17 production by gamma-delta T cells during barrier inflammation', Cell Rep, 36: 109456.

      Rankin, A. N., S. V. Hendrix, S. K. Naik, and C. L. Stallings. 2022. 'Exploring the Role of Low-Density Neutrophils During Mycobacterium tuberculosis Infection', Front Cell Infect Microbiol, 12: 901590.

      Xu, X. J., Q. Q. Ge, M. S. Yang, Y. Zhuang, B. Zhang, J. Q. Dong, F. Niu, H. Li, and B. Y. Liu. 2023. 'Neutrophil-derived interleukin-17A participates in neuroinflammation induced by traumatic brain injury', Neural Regen Res, 18: 1046-51.

    1. eLife assessment

      This useful study investigates the role of Complement 3a Receptor 1 (C3aR) in the pathogenesis of Metabolic Dysfunction-Associated Steatotic Liver Disease (MASLD) using mouse models with specific target deletions in various cell types. While the relevance of C3aR in inflammatory contexts has been established, the authors provide helpful but incomplete evidence that C3aR does not contribute significantly to MASLD pathogenesis in their models, a claim that would require additional experiments for support.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper, Homan et al. used mouse models of Metabolic Dysfunction-Associated Steatotic Liver Disease and different specific target deletions in cells to rule out the role of Complement 3a Receptor 1 in the pathogenesis of disease. They provided limited evidence and only descriptive results that, although C3aR is relevant in different contexts of inflammation, these tenets did not hold true in their models.

      Weaknesses:

      (1) The results are based on readouts showing that C3aR is not involved in the pathogenesis of liver metabolic disease.

      (2) The description of the mouse models they used to validate their findings is not clear. Lysm-cre mice - which are claimed to delete C3aR in (?) macrophages are not specific for these cells, and the genetic strategy to delete C3aR in Kupffer cells is not clear.

      (3) Taking this into account, it is very challenging to determine the validity of these data, also considering that they are merely descriptive and correlative.

    3. Reviewer #2 (Public review):

      Summary:

      Homan et al. examined the effect of macrophage- or Kupffer cell-specific C3aR1 KO on MASLD/MASH-related metabolic or liver phenotypes.

      Strengths:

      Established macrophage- or Kupffer cell-specific C3aR1 KO mice.

      Weaknesses:

      Lack of in-depth study; flaws in comparisons between KC-specific C3aR1KO and WT in the context of MASLD/MASH, because MASLD/MASH WT mice likely have a low abundance of C3aR1 on KCs.

      Homan et al. reported a set of observation data from macrophage or Kupffer cell-specific C3aR1KO mice. Several questions and concerns as follows could challenge the conclusions of this study:

      (1) As C3aR1 is robustly repressed in MASLD or MASH liver, GAN feeding likely reduced C3aR1 abundance in the liver of WT mice. Thus, it is not surprising that there were no significant differences in liver phenotypes between WT and C3aR1KO mice after prolonged GAN diet feeding. The study would be more significant if C3aR1 abundance in KCs were restored in the context of MASLD/MASH.

      (2) Would C3aR1KO mice develop liver abnormalities after a short period of GAN diet feeding?

      (3) What would be the liver macrophage phenotypes in WT vs C3aR1KO mice after GAN feeding?

      (4) In Fig 1D, >25wks GAN feeding had minimal effects on female body weight gain. Do these GAN-fed female mice also develop MASLD/MASH liver abnormalities?

      (5) Would C3aR1KO result in differences in liver phenotypes, including macrophage population/activation, liver inflammation, lipogenesis, in lean mice?

      (6) The authors should provide more information regarding the generation of KC-specific C3aR1KO. Which Cre mice were used to breed with C3aR1 flox mice?

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary: 

      In this paper, Homan et al. used mouse models of Metabolic Dysfunction-Associated Steatotic Liver Disease and different cell-specific target deletions to rule out a role for Complement 3a Receptor 1 in the pathogenesis of the disease. They provided limited evidence and only descriptive results suggesting that, although C3aR is relevant in different contexts of inflammation, this did not hold true in their models.

      Weaknesses: 

      (1) The results are based on readouts showing that C3aR is not involved in the pathogenesis of liver metabolic disease. 

      (2) The description of the mouse models they used to validate their findings is not clear. Lysm-cre mice - which are claimed to delete C3aR in (?) macrophages are not specific for these cells, and the genetic strategy to delete C3aR in Kupffer cells is not clear. 

      (3) Taking this into account, it is very challenging to determine the validity of these data, also considering that they are merely descriptive and correlative. 

      We generated two different cohorts of mice, using LysM-Cre (Jackson Strain #004781) to drive deletion in all macrophages and Clec4f-Cre (Jackson Strain #033296) to specifically ablate C3ar1 in Kupffer cells. We will ensure that the experimental models are clearly defined in the revised manuscript. The reviewer’s point is well taken that the LysM-Cre transgene can also be active in granulocytes and some dendritic cells. Even so, despite deletion of C3ar1 in macrophages and granulocytes, we do not see a major effect on hepatic steatosis and fibrosis in this GAN diet-induced model of MASLD/MASH. This was a somewhat surprising finding. We do not agree that our findings are correlative. We specifically ablated C3aR1 in macrophages or Kupffer cells and found no significant differences in the major readouts of steatosis and fibrosis for MASLD/MASH between control and knockout mice. It is possible that in other models of liver injury that we did not test (e.g., short-term treatment with a hepatotoxin such as carbon tetrachloride), there may be differences in liver injury in mice lacking C3ar1 in macrophages, but the GAN diet model has been shown to better parallel the gene expression changes in human MAFLD/MASH.

      Reviewer #2 (Public review):

      Summary:

      Homan et al. examined the effect of macrophage- or Kupffer cell-specific C3aR1 KO on MASLD/MASH-related metabolic or liver phenotypes.

      Strengths:

      Established macrophage- or Kupffer cell-specific C3aR1 KO mice. 

      Weaknesses:

      Lack of in-depth study; flaws in comparisons between KC-specific C3aR1KO and WT in the context of MASLD/MASH, because MASLD/MASH WT mice likely have a low abundance of C3aR1 on KCs. 

      Homan et al. reported a set of observation data from macrophage or Kupffer cell-specific C3aR1KO mice. Several questions and concerns as follows could challenge the conclusions of this study: 

      (1) As C3aR1 is robustly repressed in MASLD or MASH liver, GAN feeding likely reduced C3aR1 abundance in the liver of WT mice. Thus, it is not surprising that there were no significant differences in liver phenotypes between WT and C3aR1KO mice after prolonged GAN diet feeding. The study would be more significant if C3aR1 abundance in KCs were restored in the context of MASLD/MASH.

      GAN diet feeding resulted in higher liver C3ar1 compared to regular diet (Figure 1H). This thus became an impetus for studying the effects of C3ar1 deletion in macrophages or Kupffer cells, which are responsible for the majority of liver C3ar1 expression, in MASLD/MASH (Figures 2B and 3H).  

      (2) Would C3aR1KO mice develop liver abnormalities after a short period of GAN diet feeding?  

      We did not assess whether short-term GAN diet feeding resulted in significant differences in liver abnormalities in the C3ar1 macrophage or Kupffer cell knockout mice. Perhaps the reviewer’s point is that a phenotype might emerge in the KO mice with shorter periods of GAN diet feeding. We agree that this is entirely possible, though with shorter feeding timeframes what is typically seen is hepatic steatosis without fibrosis. Nevertheless, in our opinion the most important test for a disease-preventing or disease-modifying model lies in longer-term GAN diet feeding. With long-term GAN diet feeding, which has previously been shown to model human MASLD/MASH, we did not observe significant differences in liver abnormalities in the KO mice.

      (3) What would be the liver macrophage phenotypes in WT vs C3aR1KO mice after GAN feeding? 

      As with the above point, given the lack of a major MASLD/MASH phenotype in hepatic steatosis and fibrosis, we did not further characterize the liver macrophages of the macrophage or Kupffer cell C3ar1 KO mice after GAN feeding.

      (4) In Fig 1D, >25wks GAN feeding had minimal effects on female body weight gain. Do these GAN-fed female mice also develop MASLD/MASH liver abnormalities?

      We thank the reviewer for this question. In general, female GAN-fed mice develop milder MASLD/MASH abnormalities. We will include additional data in the revised manuscript.

      (5) Would C3aR1KO result in differences in liver phenotypes, including macrophage population/activation, liver inflammation, lipogenesis, in lean mice? 

      Likewise, we will include data further characterizing liver inflammation, lipogenesis and macrophages in macrophage C3ar1 KO mice under lean/regular diet conditions.

      (6) The authors should provide more information regarding the generation of KC-specific C3aR1KO. Which Cre mice were used to breed with C3aR1 flox mice? 

      Clec4f-Cre transgenic mice were used to generate Kupffer cell specific KO of C3ar1. This will be clarified and explicitly stated in the revised manuscript.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study puts forth the model that under IFN-B stimulation, liquid-phase WTAP coordinates with the transcription factor STAT1 to recruit MTC to the promoter region of interferon-stimulated genes (ISGs), mediating the installation of m6A on newly synthesized ISG mRNAs. This model is supported by strong evidence that the phosphorylation state of WTAP, regulated by PPP4, is regulated by IFN-B stimulation, and that this results in interactions between WTAP, the m6A methyltransferase complex, and STAT1, a transcription factor that mediates activation of ISGs. This was demonstrated via a combination of microscopy, immunoprecipitations, m6A sequencing, and ChIP. These experiments converge on a set of experiments that nicely demonstrate that IFN-B stimulation increases the interaction between WTAP, METTL3, and STAT1, that this interaction is lost with the knockdown of WTAP (even in the presence of IFN-B), and that this IFN-B stimulation also induces METTL3-ISG interactions.

      Strengths:

      The evidence for the IFN-B stimulated interaction between METTL3 and STAT1, mediated by WTAP, is quite strong. Removal of WTAP in this system seems to be sufficient to reduce these interactions and the concomitant m6A methylation of ISGs. The conclusion that the phosphorylation state of WTAP is important in this process is also quite well supported.

      Weaknesses:

      The evidence that the above mechanism is fundamentally driven by different phase-separated pools of WTAP (regulated by its phosphorylation state) is weaker. These experiments rely relatively heavily on the treatment of cells with 1,6-hexanediol, which has been shown to have some off-target effects on phosphatases and kinases (PMID 33814344).

      Given that the model invoked in this study depends on the phosphorylation (or lack thereof) of WTAP, this is a particularly relevant concern.

      Related to this point, it is also interesting (and potentially concerning for the proposed model) that the initial region of WTAP that was predicted to be disordered is in fact not the region that the authors demonstrate is important for the different phase-separated states. Taking all the data together, it is also not clear to me that one has to invoke phase separation in the proposed mechanism.

      We are grateful for the reviewer’s positive comments and constructive feedback. In this article, we describe a novel and important mechanism in which a dephosphorylation-driven solid-to-liquid phase transition of WTAP mediates its co-transcriptional m6A modification function. We first observed that WTAP underwent a phase transition during virus infection and IFN-β stimulation, and confirmed the driving force of this phase transition through multiple experiments. Besides 1,6-hexanediol (1,6-hex) treatment, we also introduced S/T-to-D/A mutations to mimic phosphorylated and dephosphorylated WTAP in vitro and in cells, identifying the 5ST-D mutant as the SLPS mutant and the 5ST-A mutant as the LLPS mutant. We then performed 1,6-hex experiments to confirm the importance of phase separation for WTAP function, and found that the 5ST-D SLPS mutant and the 5ST-A LLPS mutant had different effects on the WTAP-promoter region interaction and on co-transcriptional m6A modification. Following the reviewer’s suggestion, we will further clarify the role of phosphorylation in WTAP phase separation. We plan to repeat the experiments using the potent PP4 inhibitor fostriecin, and to perform further experiments to explore the effect of the WTAP IDR, which is reported to play a critical role in its phase separation.

      1,6-hex was initially regarded as an inhibitor of hydrophobic interactions, which are involved in many kinds of protein-protein interaction, so off-target effects of 1,6-hex are inevitable. It has been reported that 1,6-hex at a 5% concentration impairs RNA pol II CTD-specific phosphatase and kinase activity (3). However, 1,6-hex is still widely used in LLPS-associated functional studies despite its off-target effects. Relevant to this article, 10% 1,6-hex was reported to dissolve WTAP phase-separated droplets (2). Besides WTAP, 1,6-hex (5%-10% w/v) has also been used to explore the phase separation characteristics and function of phosphorylated proteins or even kinases, including p-tau441, TAZ, HSF1 and so on (4-6). 10% 1,6-hex inhibited the crucial role of phosphorylation-driven HSF1 LLPS in chromatin binding and transcription, as shown by an RNA-seq dataset (6), indicating that the effect of 1,6-hex on kinases or phosphatases might not be global. To avoid relying on 1,6-hex alone, given its kinase/phosphatase impairment, we introduced the WTAP SLPS and LLPS mutations in addition to 1,6-hex treatment to explore the m6A modification function of the WTAP phase transition. We plan to repeat the experiments with a lower 1,6-hex concentration, check the WTAP phosphorylation status after 1,6-hex treatment, and address these points in the Discussion.

      A considerable number of proteins undergo phase separation via interactions between intrinsically disordered regions (IDRs). IDRs contain more charged and polar amino acids, which present multiple weakly interacting elements, and fewer hydrophobic amino acids, which allows flexible conformations (7). In our article, we used the PLAAC website (http://plaac.wi.mit.edu/) to predict the IDR of WTAP, and a fragment (amino acids 234-249) was predicted to be a prion-like domain. However, deletion of this fragment failed to abolish the phase separation properties of WTAP, which may be the main source of confusion for the reviewer. To address this issue, we checked the WTAP structure (as part of the MTC) in the Protein Data Bank (https://www.rcsb.org/structure/7VF2) and found that the IDR prediction has been revised following algorithm updates. The IDR of WTAP has expanded to amino acids 245-396, containing the whole CTD region. According to our results, lack of the CTD inhibited WTAP liquid-liquid phase separation both in vitro and in cells, while the phosphorylation status of the CTD had a dramatic impact on the WTAP phase transition, consistent with the LLPS-regulating function of an IDR. Therefore, we will revise our description of the WTAP IDR and perform further experiments to test its function.

      Taken together, given the strong association of WTAP phosphorylation with its phase separation status and its function during IFN-β stimulation, it is necessary to include WTAP phase separation in our mechanism. We will perform further experiments to provide more convincing evidence and strengthen our study.

      Reviewer #2 (Public review):

      In this study, Cai and colleagues investigate how one component of the m6A methyltransferase complex, the WTAP protein, responds to IFNb stimulation. They find that viral infection or IFNb stimulation induces the transition of WTAP from aggregates to liquid droplets through dephosphorylation by PPP4. This process affects the m6A modification levels of ISG mRNAs and modulates their stability. In addition, the WTAP droplets interact with the transcription factor STAT1 to recruit the methyltransferase complex to ISG promoters and enhance m6A modification during transcription. The investigation dives into a previously unexplored area of how viral infection or IFNb stimulation affects m6A modification on ISGs. The observation that WTAP undergoes a phase transition is significant in our understanding of the mechanisms underlying m6A's function in immunity. However, there are still key gaps that should be addressed to fully accept the model presented.

      Major points:

      (1) More detailed analyses on the effects of WTAP sgRNA on the m6A modification of ISGs:

      a. A comprehensive summary of the ISGs, including the percentage of ISGs that are m6A-modified.

      b. The distribution of m6A modification across the ISGs.

      c. A comparison of the m6A modification distribution in ISGs with non-ISGs.

      In addition, since the authors propose a novel mechanism where the interaction between phosphorylated STAT1 and WTAP directs the MTC to the promoter regions of ISGs to facilitate co-transcriptional m6A modification, it is critical to analyze whether the m6A modification distribution holds true in the data.

      We appreciate the reviewer’s summary of our manuscript and the constructive assessment. We plan to perform the related analyses accordingly to present the m6A modification of ISGs in our model.

      (2) Since a key part of the model includes the cytosol-localized STAT1 protein undergoing phosphorylation to translocate to the nucleus to mediate gene expression, the authors should focus on the interaction between phosphorylated STAT1 and WTAP in Figure 4, rather than the unphosphorylated STAT1. Only phosphorylated STAT1 localizes to the nucleus, so the presence of pSTAT1 in the immunoprecipitate is critical for establishing a functional link between STAT1 activation and its interaction with WTAP.

      We plan to repeat the immunoprecipitation experiments to clarify the function of pSTAT1 in WTAP interaction and m6A modification as the reviewer suggested.

      (3) The authors should include pSTAT1 ChIP-seq and WTAP ChIP-seq on IFNb-treated samples in Figure 5 to allow for a comprehensive and unbiased genomic analysis for comparing the overlaps of peaks from both ChIP-seq datasets. These results should further support their hypothesis that WTAP interacts with pSTAT1 to enhance m6A modifications on ISGs.

      We first performed MeRIP-seq and RNA-seq and explored the critical role of WTAP in ISG m6A modification and expression. Through immunoprecipitation and immunofluorescence experiments, we found that the phase transition of WTAP enhanced its interaction with pSTAT1. These results indicate that WTAP mediates ISG m6A modification and expression through its enhanced interaction with pSTAT1 during virus infection and IFN-β stimulation. However, we were still not sure how WTAP-mediated m6A modification related to pSTAT1-mediated transcription. By analyzing METTL3 ChIP-seq or caPAR-CLIP-seq data, several studies have revealed recruitment of the m6A methylation complex (MTC) to transcription start sites (TSS) of coding genes and to R-loop structures through interaction with the transcription factor STAT5B or the DNA helicase DDX21, indicating that the MTC engages in m6A modification of nascent transcripts at the very beginning of transcription (8-10). Thus, we proposed that phase-transitioned WTAP could be recruited to ISG promoter regions by pSTAT1, and verified this hypothesis by pSTAT1/WTAP ChIP-qPCR. We believe a ChIP-seq experiment is a good idea for exploring the mechanism in depth, but the results currently in this article are sufficient to support our mechanism. We will continue to focus on the genome-wide chromatin distribution of WTAP and explore further functional effects of the transcription factor-dependent WTAP-promoter region interaction in t.

      Minor points:

      (1) Since IFNb is primarily known for modulating biological processes through gene transcription, it would be informative if the authors discussed the mechanism of how IFNb would induce the interaction between WTAP and PPP4.

      (2) The authors should include mCherry alone controls in Figure 1D to demonstrate that mCherry does not contribute to the phase separation of WTAP. Does mCherry have or lack a PLD?

      (3) The authors should clarify the immunoprecipitation assays in the methods. For example, the labeling in Figure 2A suggests that antibodies against WTAP and pan-p were used for two immunoprecipitations. Is that accurate?

      (4) The authors should include overall m6A modification levels quantified of GFPsgRNA and WTAPsgRNA cells, either by mass spectrometry (preferably) or dot blot.

      We thank the reviewer for raising these useful suggestions. We will perform the related experiments and revise the manuscript carefully as the reviewer suggested.

      Reviewer #3 (Public review):

      Summary:

      This study presents a valuable finding on the mechanism used by WTAP to modulate the IFN-β stimulation. It describes the phase transition of WTAP driven by IFN-β-induced dephosphorylation. The evidence supporting the claims of the authors is solid, although major analysis and controls would strengthen the impact of the findings. Additionally, more attention to the figure design and to the text would help the reader to understand the major findings.

      Strength:

      The key finding is the revelation that WTAP undergoes phase separation during virus infection or IFN-β treatment. The authors conducted a series of precise experiments to uncover the mechanism behind WTAP phase separation and identified the regulatory role of 5 phosphorylation sites. They also succeeded in pinpointing the phosphatase involved.

      Weaknesses:

      However, as the authors acknowledge, it is already widely known in the field that IFN and viral infection regulate m6A mRNAs and ISGs. Therefore, a more detailed discussion could help the reader interpret the obtained findings in light of previous research.

      It is well-known that protein concentration drives phase separation events. Similarly, previous studies and some of the figures presented by the authors show an increase in WTAP expression upon IFN treatment. The authors do not discuss the contribution of WTAP expression levels to the phase separation event observed upon IFN treatment. Similarly, METTL3 and METTL14, as well as other proteins of the MTC are upregulated upon IFN treatment. How does the MTC protein concentration contribute to the observed phase separation event?

      How is PP4 related to the IFN signaling cascade?

      In general, it is very confusing to talk about WTAP KO as WTAPgRNA.

      We are grateful for the reviewer’s positive comments and unbiased advice. To interpret our findings in light of previous research, we will revise the manuscript carefully and provide a more detailed discussion of ISG m6A modification during virus infection or IFN stimulation. As previously reported, the WTAP protein level is induced by long-term IFN-β stimulation or LPS stimulation, and LPS-induced WTAP expression promotes its phase separation ability (2, 11). Although there was no significant upregulation of WTAP expression in our short-term treatment, we hypothesize that WTAP phase separation will be promoted by the higher protein concentration after long-term IFN stimulation, enhancing m6A deposition on ISG mRNAs and revealing a feedback loop between WTAP phase separation and m6A modification during specific stimulation. To address the effect of MTC protein concentration in our proposed model, we will perform immunoblotting of MTC proteins and examine phase separation at different WTAP concentrations.

      Protein phosphatase 4 (PP4) is a multi-subunit Ser/Thr phosphatase complex that participates in diverse cellular pathways, including the DNA damage response (DDR), cell cycle progression, and apoptosis (12). Protein phosphatase 4 catalytic subunit 4C (PPP4C) is one of the components of the PP4 complex. Previous research showed that knockout of PPP4C enhanced IFN-β downstream signaling and gene expression, which is consistent with our finding that knockdown of PPP4C impaired WTAP-mediated m6A modification and enhanced ISG expression. Since there was no significant increase in PPP4C expression during IFN-β stimulation in our results, we will consider exploring post-translational modifications, such as ubiquitination, that may influence the protein-protein interaction.

      In this project, all the WTAP-deficient THP-1 cells were bulk cells treated with WTAPsgRNA, not monoclonal knockout cells. We confirmed that WTAP expression was efficiently knocked down in WTAPsgRNA THP-1 cells and that the m6A modification level was impaired, avoiding compensatory effects on m6A modification by other proteins. Thus, we prefer to call them WTAPsgRNA THP-1 cells rather than WTAP KO THP-1 cells.

      References

      (1) Raja, R., Wu, C., Bassoy, E.Y., Rubino, T.E., Jr., Utagawa, E.C., Magtibay, P.M., Butler, K.A., and Curtis, M. (2022). PP4 inhibition sensitizes ovarian cancer to NK cell-mediated cytotoxicity via STAT1 activation and inflammatory signaling. J Immunother Cancer 10. 10.1136/jitc-2022-005026.

      (2) Ge, Y., Chen, R., Ling, T., Liu, B., Huang, J., Cheng, Y., Lin, Y., Chen, H., Xie, X., Xia, G., et al. (2024). Elevated WTAP promotes hyperinflammation by increasing m6A modification in inflammatory disease models. J Clin Invest 134. 10.1172/JCI177932.

      (3) Duster, R., Kaltheuner, I.H., Schmitz, M., and Geyer, M. (2021). 1,6-Hexanediol, commonly used to dissolve liquid-liquid phase separated condensates, directly impairs kinase and phosphatase activities. J Biol Chem 296, 100260. 10.1016/j.jbc.2021.100260.

      (4) Wegmann, S., Eftekharzadeh, B., Tepper, K., Zoltowska, K.M., Bennett, R.E., Dujardin, S., Laskowski, P.R., MacKenzie, D., Kamath, T., Commins, C., et al. (2018). Tau protein liquid-liquid phase separation can initiate tau aggregation. The EMBO journal 37. 10.15252/embj.201798049.

      (5) Lu, Y., Wu, T., Gutman, O., Lu, H., Zhou, Q., Henis, Y.I., and Luo, K. (2020). Phase separation of TAZ compartmentalizes the transcription machinery to promote gene expression. Nat Cell Biol 22, 453-464. 10.1038/s41556-020-0485-0.

      (6) Zhang, H., Shao, S., Zeng, Y., Wang, X., Qin, Y., Ren, Q., Xiang, S., Wang, Y., Xiao, J., and Sun, Y. (2022). Reversible phase separation of HSF1 is required for an acute transcriptional response during heat shock. Nat Cell Biol 24, 340-352. 10.1038/s41556-022-00846-7.

      (7) Hou, S., Hu, J., Yu, Z., Li, D., Liu, C., and Zhang, Y. (2024). Machine learning predictor PSPire screens for phase-separating proteins lacking intrinsically disordered regions. Nat Commun 15, 2147. 10.1038/s41467-024-46445-y.

      (8) Hao, J.D., Liu, Q.L., Liu, M.X., Yang, X., Wang, L.M., Su, S.Y., Xiao, W., Zhang, M.Q., Zhang, Y.C., Zhang, L., et al. (2024). DDX21 mediates co-transcriptional RNA m(6)A modification to promote transcription termination and genome stability. Mol Cell 84, 1711-1726 e1711. 10.1016/j.molcel.2024.03.006.

      (9) Barbieri, I., Tzelepis, K., Pandolfini, L., Shi, J., Millan-Zambrano, G., Robson, S.C., Aspris, D., Migliori, V., Bannister, A.J., Han, N., et al. (2017). Promoter-bound METTL3 maintains myeloid leukaemia by m(6)A-dependent translation control. Nature 552, 126-131. 10.1038/nature24678.

      (10) Bhattarai, P.Y., Kim, G., Lim, S.C., and Choi, H.S. (2024). METTL3-STAT5B interaction facilitates the co-transcriptional m(6)A modification of mRNA to promote breast tumorigenesis. Cancer Lett 603, 217215. 10.1016/j.canlet.2024.217215.

      (11) Ge, Y., Ling, T., Wang, Y., Jia, X., Xie, X., Chen, R., Chen, S., Yuan, S., and Xu, A. (2021). Degradation of WTAP blocks antiviral responses by reducing the m(6) A levels of IRF3 and IFNAR1 mRNA. EMBO Rep 22, e52101. 10.15252/embr.202052101.

      (12) Dong, M.Z., Ouyang, Y.C., Gao, S.C., Ma, X.S., Hou, Y., Schatten, H., Wang, Z.B., and Sun, Q.Y. (2022). PPP4C facilitates homologous recombination DNA repair by dephosphorylating PLK1 during early embryo development. Development 149. 10.1242/dev.200351.

    2. eLife assessment

      This important study demonstrates that interferon beta stimulation induces WTAP transition from aggregates to liquid droplets, coordinating m6A modification of a subset of mRNAs that encode interferon-stimulated genes and restricting their expression. The evidence presented is solid, supported by microscopy, immunoprecipitations, m6A sequencing, and ChIP, to show that WTAP phosphorylation controls phase transition and its interaction with STAT1 and the methyltransferase complex.

    3. Reviewer #1 (Public review):

      Summary:

      This study puts forth the model that under IFN-B stimulation, liquid-phase WTAP coordinates with the transcription factor STAT1 to recruit MTC to the promoter region of interferon-stimulated genes (ISGs), mediating the installation of m6A on newly synthesized ISG mRNAs. This model is supported by strong evidence that the phosphorylation state of WTAP, regulated by PPP4, is regulated by IFN-B stimulation, and that this results in interactions between WTAP, the m6A methyltransferase complex, and STAT1, a transcription factor that mediates activation of ISGs. This was demonstrated via a combination of microscopy, immunoprecipitations, m6A sequencing, and ChIP. These experiments converge on a set of experiments that nicely demonstrate that IFN-B stimulation increases the interaction between WTAP, METTL3, and STAT1, that this interaction is lost with the knockdown of WTAP (even in the presence of IFN-B), and that this IFN-B stimulation also induces METTL3-ISG interactions.

      Strengths:

      The evidence for the IFN-B stimulated interaction between METTL3 and STAT1, mediated by WTAP, is quite strong. Removal of WTAP in this system seems to be sufficient to reduce these interactions and the concomitant m6A methylation of ISGs. The conclusion that the phosphorylation state of WTAP is important in this process is also quite well supported.

      Weaknesses:

      The evidence that the above mechanism is fundamentally driven by different phase-separated pools of WTAP (regulated by its phosphorylation state) is weaker. These experiments rely relatively heavily on the treatment of cells with 1,6-hexanediol, which has been shown to have some off-target effects on phosphatases and kinases (PMID 33814344). Given that the model invoked in this study depends on the phosphorylation (or lack thereof) of WTAP, this is a particularly relevant concern. Related to this point, it is also interesting (and potentially concerning for the proposed model) that the initial region of WTAP that was predicted to be disordered is in fact not the region that the authors demonstrate is important for the different phase-separated states. Taking all the data together, it is also not clear to me that one has to invoke phase separation in the proposed mechanism.

    4. Reviewer #2 (Public review):

      In this study, Cai and colleagues investigate how one component of the m6A methyltransferase complex, the WTAP protein, responds to IFNb stimulation. They find that viral infection or IFNb stimulation induces the transition of WTAP from aggregates to liquid droplets through dephosphorylation by PPP4. This process affects the m6A modification levels of ISG mRNAs and modulates their stability. In addition, the WTAP droplets interact with the transcription factor STAT1 to recruit the methyltransferase complex to ISG promoters and enhance m6A modification during transcription. The investigation dives into a previously unexplored area of how viral infection or IFNb stimulation affects m6A modification on ISGs. The observation that WTAP undergoes a phase transition is significant in our understanding of the mechanisms underlying m6A's function in immunity. However, there are still key gaps that should be addressed to fully accept the model presented.

      Major points:

      (1) More detailed analyses on the effects of WTAP sgRNA on the m6A modification of ISGs:

      a. A comprehensive summary of the ISGs, including the percentage of ISGs that are m6A-modified.

      b. The distribution of m6A modification across the ISGs.

      c. A comparison of the m6A modification distribution in ISGs with non-ISGs.

      In addition, since the authors propose a novel mechanism where the interaction between phosphorylated STAT1 and WTAP directs the MTC to the promoter regions of ISGs to facilitate co-transcriptional m6A modification, it is critical to analyze whether the m6A modification distribution holds true in the data.

      (2) Since a key part of the model includes the cytosol-localized STAT1 protein undergoing phosphorylation to translocate to the nucleus to mediate gene expression, the authors should focus on the interaction between phosphorylated STAT1 and WTAP in Figure 4, rather than the unphosphorylated STAT1. Only phosphorylated STAT1 localizes to the nucleus, so the presence of pSTAT1 in the immunoprecipitate is critical for establishing a functional link between STAT1 activation and its interaction with WTAP.

      (3) The authors should include pSTAT1 ChIP-seq and WTAP ChIP-seq on IFNb-treated samples in Figure 5 to allow for a comprehensive and unbiased genomic analysis for comparing the overlaps of peaks from both ChIP-seq datasets. These results should further support their hypothesis that WTAP interacts with pSTAT1 to enhance m6A modifications on ISGs.

      Minor points:

      (1) Since IFNb is primarily known for modulating biological processes through gene transcription, it would be informative if the authors discussed the mechanism of how IFNb would induce the interaction between WTAP and PPP4.

      (2) The authors should include mCherry alone controls in Figure 1D to demonstrate that mCherry does not contribute to the phase separation of WTAP. Does mCherry have or lack a PLD?

      (3) The authors should clarify the immunoprecipitation assays in the methods. For example, the labeling in Figure 2A suggests that antibodies against WTAP and pan-p were used for two immunoprecipitations. Is that accurate?

      (4) The authors should include quantification of overall m6A modification levels in GFPsgRNA and WTAPsgRNA cells, either by mass spectrometry (preferably) or dot blot.

    5. Reviewer #3 (Public review):

      Summary:

      This study presents a valuable finding on the mechanism used by WTAP to modulate the response to IFN-β stimulation. It describes the phase transition of WTAP driven by IFN-β-induced dephosphorylation. The evidence supporting the claims of the authors is solid, although additional analyses and controls would strengthen the impact of the findings. Additionally, more attention to the figure design and to the text would help the reader to understand the major findings.

      Strength:

      The key finding is the revelation that WTAP undergoes phase separation during virus infection or IFN-β treatment. The authors conducted a series of precise experiments to uncover the mechanism behind WTAP phase separation and identified the regulatory role of 5 phosphorylation sites. They also succeeded in pinpointing the phosphatase involved.

      Weaknesses:

      However, as the authors acknowledge, it is already widely known in the field that IFN and viral infection regulate m6A mRNAs and ISGs. Therefore, a more detailed discussion could help the reader interpret the obtained findings in light of previous research.

      It is well-known that protein concentration drives phase separation events. Similarly, previous studies and some of the figures presented by the authors show an increase in WTAP expression upon IFN treatment. The authors do not discuss the contribution of WTAP expression levels to the phase separation event observed upon IFN treatment. Similarly, METTL3 and METTL14, as well as other proteins of the MTC are upregulated upon IFN treatment. How does the MTC protein concentration contribute to the observed phase separation event?

      How is PP4 related to the IFN signaling cascade?

      In general, it is very confusing to talk about WTAP KO as WTAPgRNA.

    1. eLife assessment

      This valuable study confirms the association between the human leukocyte antigen (HLA)-II region and tuberculosis (TB) susceptibility in genetically admixed South African populations, specifically identifying a near-genome-wide significant association in the HLA-DPB1 gene, which originates from KhoeSan ancestry. Whilst some of the evidence supporting the association between the HLA-II region and TB susceptibility is solid, the analysis is incomplete and requires further work for the study to achieve its full value. The work will be of interest to those studying the genetic basis of tuberculosis susceptibility/infection resistance.

    2. Reviewer #2 (Public review):

      Summary:

      This manuscript is about using different analytical approaches to allow ancestry adjustments to GWAS analyses amongst admixed populations. This work is a follow-on from the recently published ITHGC multi-population GWAS (https://doi.org/10.7554/eLife.84394), with a focus on the admixed South African populations. Ancestry adjustment models detected a peak of SNPs in the class II HLA DPB1, distinct from the class II HLA DQA1 loci significant in the ITHGC analysis.

      Strengths:

      Excellent demonstration of GWAS analytical pipelines in highly admixed populations. Further confirmation of the importance of the HLA class II locus in genetic susceptibility to TB.

      Weaknesses:

      Limited novelty compared to the group's previous publications and the body of work linking HLA class II alleles with TB susceptibility in South Africa or other African populations. This work includes only ~100 new cases and controls from what has already been published. High-resolution HLA typing has detected significant signals in both the DQA1 and DPB1 regions identified by the larger ITHGC and in this GWAS analysis respectively (Chihab L et al. HLA. 2023 Feb; 101(2): 124-137).

      Despite the availability of strong methods for imputing HLA from GWAS data (Karnes J et al., PLoS One 2017), the authors did not confirm with HLA typing the importance of their SNP peak in the class II region. This would have supported the importance of this ancestry adjustment versus prior ITHGC analysis.

      The study populations comprise active TB cases and healthy controls (from high-burden, presumed-exposed communities), and the authors do not provide QFT or other data to identify latent TB infection.

      Important methodological points for clarification and for readers to be aware of when reading this paper:

      (1) One of the reasons cited for the lack of African ancestry-specific associations or suggestive peaks in the ITHGC study was the small African sample size. The current association test includes a larger African cohort and yields a near-genome-wide significant association in the HLA-DPB1 gene originating from the KhoeSan ancestry. Investigation is needed as to whether the increase in power is due to the increased number of African samples and not necessarily to the use of the LAAA model, as stated on lines 295 and 296.

      (2) In line 256, the number of SNPs included in the LAAA analysis was 784,557 autosomal markers; the number of SNPs after quality control of the imputed dataset was 7,510,051 SNPs (line 142). It is not clear how or why ~90% of the SNPs were removed. This needs clarification.

      (3) The authors have used the significance threshold estimated by STEAM (p-value < 2.5 × 10⁻⁶) in the LAAA analysis. Grinde et al. (2019) implemented their significance threshold estimation approach tailored to admixture mapping (local ancestry (LA) model), where there is a reduction in testing burden. The authors should justify why this threshold would apply to the LAAA model (a joint genotype and ancestry approach).

      (4) Batch effect screening and correction (line 174) is a quality control check. This section is discussed after global and local ancestry inference in the methods. Was this QC step conducted after the inference? If so, the authors should justify how the SNPs removed due to batch effects did not affect the global and local ancestry inferences, or should reorder the methods section to avoid confusion.
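
      The "~90%" figure in point (2) can be checked directly from the two SNP counts quoted there. The following is a minimal sketch of that arithmetic only; the variable and function names are illustrative and are not taken from the manuscript or its pipeline.

```python
# SNP counts as quoted in point (2) of this review.
snps_after_qc = 7_510_051  # imputed dataset after quality control (line 142)
snps_in_laaa = 784_557     # autosomal markers in the LAAA analysis (line 256)


def fraction_removed(before: int, after: int) -> float:
    """Return the fraction of markers dropped between two counts."""
    return (before - after) / before


removed = fraction_removed(snps_after_qc, snps_in_laaa)
print(f"{removed:.1%} of SNPs removed between QC and the LAAA analysis")
```

      The result is just under 90%, which is consistent with the reviewer's approximation; it does not explain why the markers were removed.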

    3. Reviewer #1 (Public review):

      Summary:

      The authors aimed to confirm the association between the human leukocyte antigen (HLA)-II region and tuberculosis (TB) susceptibility within admixed African populations. Building upon previous findings from the International Tuberculosis Host Genetics Consortium (ITHGC), this study sought to address the limitations of small sample size and the inclusion of admixed samples by employing the Local Ancestry Allelic Adjusted (LAAA) model, as well as identify TB susceptibility loci in an admixed South African cohort.

      Strengths:

      The major strengths of this study include the use of six TB case-control datasets collected over 30 years from diverse South African populations and the use of ADMIXTURE for global ancestry inference. The former provides a comprehensive dataset, and the latter ensures accurate determination of ancestral contributions. In addition, the identified association in the HLA-DPB1 gene shows near-genome-wide significance, enhancing the credibility of the findings.

      Weaknesses:

      The major weaknesses of this study include insufficient significant discoveries and reliance on cross-validation. This study only identified one variant significantly associated with TB status, located in an intergenic region with an unclear link to TB susceptibility. Despite identifying multiple lead SNPs, no other variants reached the genome-wide significance threshold, limiting the overall impact of the findings. The absence of an independent validation cohort, with the study relying solely on cross-validation, is also a major limitation. This approach restricts the ability to independently confirm the findings and evaluate their robustness across different population samples.

      Appraisal:

      The authors successfully achieved their aims of confirming the association between the HLA-II region and TB susceptibility in admixed African populations. However, the limited number of significant discoveries, reliance on cross-validation, and insufficient discussion of model performance and SNP significance weaken the overall strength of the findings. Despite these limitations, the results support the conclusion that considering local ancestry is crucial in genetic studies of admixed populations.

      Impact:

      The innovative use of the LAAA model and the comprehensive dataset in this study make substantial contributions to the field of genetic epidemiology.

    1. eLife assessment

      This valuable study presents compelling evidence that a single member of the Ly49 gene family (Ly49a) provides sufficient inhibitory signaling to license NK cell activity when its H-2Dd ligand is present. There is also convincing evidence of the effect of Ly49a expression on in vitro killing and IFNgamma production. The use of the authors' system to investigate additional Ly49 receptors, such as Ly49c/i on the H2b background, could provide information on their relative contribution to NK cell licensing. Improvements to the presentation with respect to figure clarity and terminology would allow a better understanding of this complex system by non-experts.

    2. Reviewer #1 (Public review):

      Summary:

      The article by Piersma et al. aims to reduce the complex process of NK cell licensing to the action of a single inhibitory receptor for MHC class I. This is achieved using a mouse strain lacking all of the Ly49 receptors expressed by NK cells and inserting the Ly49a gene into the Ncr1 locus, leading to expression on the majority of NK cells.

      Strengths:

      The mouse model used represents a precise deletion of all NK-expressed genes within the Ly49 cluster. The re-introduction of the Ly49a gene into the Ncr1 locus allows expression by most NK cells. Convincing effects of Ly49a expression on in vitro activation and in vivo killing assay are shown.

      Weaknesses:

      The choice of Ly49a provides a clear picture of H-2Dd recognition by this Ly49. It would be valuable to perform additional studies investigating Ly49c and Ly49i receptors for H-2b. This is of interest because there are reports indicating that Ly49c may not be a functional receptor in B6 mice due to strong cis interactions.

      This work generates an excellent mouse model for the study of NK cell licensing by inhibitory Ly49s that will be useful for the community. It provides a platform whereby the functional activity of a single Ly49 can be assessed.

    3. Reviewer #2 (Public review):

      Piersma et al. continue to work on deciphering the role and function of Ly49 NK cell receptors. This manuscript shows that a single inhibitory Ly49 receptor is sufficient to license NK cells and eliminate MHC-I-deficient target cells in mice. In short, they refined the mouse model ∆Ly49-1 (Parikh et al., 2020) into the Ly49KO model in which all Ly49 genes are disrupted. Using this model, they confirmed that NK cells from Ly49KO mice cannot be licensed, produce lower levels of IFN-gamma, and cannot reject MHC-I-deficient cells. To study the effect of a single Ly49 receptor in the function of NK cells, the authors backcrossed Ly49KO mice to H-2Dd transgenic KODO (D8-KODO) Ly49A knock-in mice in which a single inhibitory Ly49A receptor that recognizes H-2Dd ligands is expressed. By doing so, they demonstrate that a single inhibitory Ly49 receptor expressed by all NK cells is sufficient for licensing and missing-self killing.

      While the results of the study are largely consistent with the conclusions, it is important to address some discrepancies. For instance, in the title of Figure 1, the authors state that NK cells in Ly49KO mice compared to WT mice have a less mature phenotype, which is not consistent with the corresponding text in the Results section (lines 170-171) that states there is no difference in maturation. These differences are not evident in Figure 1, panel D. It is crucial to acknowledge these inconsistencies to ensure a comprehensive understanding of the research findings.

      In the legend of Figure 2, the text related to panel C indicates the use of dyes to label the splenocytes, and CFSE, CTV, and CTFR were mentioned. However, only CTV and CTFR are shown on the plots and mentioned in the corresponding text in the Results section. Similarly, in the legend of Figure 4, which is related to panel C, the authors write that splenocytes were differentially labeled with CFSE and CTV as indicated; however, in Figure 4C and the Results section text, there is no mention of CFSE.

      The authors should clarify why they assume that KLRG1 expression is influenced by the expression of inhibitory Ly49 receptors and not by manipulations on chromosome 6, where the genes for both KLRG1 and Ly49 receptors are located. In addition, a better explanation for the possible influence of other inhibitory NK cell receptors still needs to be included. In the study by Zhang et al. (doi: 10.1038/s41467-019-13032-5), the authors showed the synergistic regulation of NK cell education by the NKG2A receptor and specific Ly49 family members. Although in this study Piersma and colleagues show the control of MHC-I-deficient cells by Ly49A+ NKG2A- NK cells in Figure 4, this receptor is not mentioned in the Results or in the Discussion section, so its role in this story needs to be clarified. Therefore, the reader would benefit from more information regarding the NKG2A receptor and NKG2A+/- populations in their results.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, Piersma et al. successfully generated a mouse model with all Ly49 genes knocked out, resulting in the complete absence of Ly49 receptor expression on the cell surface. The absence of Ly49 expression led to the loss of NK cell education/licensing and consequently, a failure in responsiveness against missing-self target cells. The experimental work and findings are partially overlapping with the previous work by Zhang et al. (2019), who also performed knockout of the entire Ly49 locus in mice and demonstrated that loss of NK responsiveness was due to the removal of inhibitory, and not activating Ly49 genes. The authors demonstrate the restoration of NK cell licensing by knocking in a single Ly49 gene, Ly49A, in a mouse expressing the H-2Dd ligand for this receptor, which is a novel and important finding.

      Strengths:

      The authors established a novel mouse model enabling them to conduct a clean and thorough study of the function of Ly49 in NK cell licensing. Also, by knocking in a single Ly49, they were able to investigate the function of a given Ly49 receptor excluding the "contamination" of co-expression of any other Ly49 genes. Their idea and method were novel, though the mouse model was somewhat genetically similar to that of a previous study. The experimental design and data interpretation were logically clear and the evidence was solid.

      Weaknesses:

      The paper is very poorly written and confusing. The authors should be more accurate in the usage of terminology, provide more details on experimental procedures, and revise much of the text to improve clarity and coherence. A thorough revision aiming to clarify the paper would be helpful.

    1. eLife assessment

      This paper reports the synthesis of covalent inhibitors bearing a unique fragment as a protected covalent warhead for irreversible binding to histidine in carbonic anhydrase (CA) enzymes. These findings are important due to the broad utility of the approach for covalent drug discovery applications and could have long-term impacts on related covalent targeting approaches. The data convincingly support the main conclusions of the paper.

    2. Reviewer #1 (Public review):

      Summary:

      This paper describes the covalent interactions of small molecule inhibitors of carbonic anhydrase IX, utilizing a precursor molecule capable of undergoing beta-elimination to form the vinyl sulfone and covalent warhead.

      Strengths:

      The use of a novel covalent precursor molecule that undergoes beta-elimination to form the vinyl sulfone in situ. Sufficient structure-activity relationships across a number of leaving groups, as well as binding moieties that impact binding and dissociation constants.

      Overall, the paper is clearly written and provides sufficient data to support the hypothesis and observations. The findings and outcomes are significant for covalent drug discovery applications and could have long-term impacts on related covalent targeting approaches.

      Weaknesses:

      No major weaknesses were noted by this reviewer.

    3. Reviewer #2 (Public review):

      Summary:

      The authors utilized a "ligand-first" targeted covalent inhibition approach to design potent inhibitors of carbonic anhydrase IX (CAIX) based on a known non-covalent primary sulfonamide scaffold. The novelty of their approach lies in their use of a protected pre(pro?)-vinylsulfone as a precursor to the common vinylsulfone covalent warhead to target a nonstandard His residue in the active site of CAIX. In addition to a biochemical assessment of their inhibitors, they showed that their compounds compete with a known probe on the surface of HeLa cells.

      Strengths:

      The authors use a protected warhead for what would typically be considered an "especially hot" or even "undevelopable" vinylsulfone electrophile. This would be the first report of doing so, making it a novel targeted covalent inhibition approach specifically with vinylsulfones.

      The authors used a number of orthogonal biochemical and biophysical methods including intact MS, 2D NMR, x-ray crystallography, and an enzymatic stopped-flow setup to confirm the covalency of their compounds and even demonstrate that this novel pre-vinylsulfone is activated in the presence of CAIX. In addition, they included a number of compelling analogs of their inhibitors as negative controls that address hypotheses specific to the mechanism of activation and inhibition.

      The authors employed an assay that allows them to assess target engagement of their compounds with the target on the surface of cells and a fluorescent probe which is generally a critical tool to be used in tandem with phenotypic cellular assays.

      Weaknesses:

      While the authors show biochemically that the pre-vinyl moiety is transformed into the vinylsulfone, they do not show what the fate of this -SO2CH2CH2OCOR group is in a cellular context. Does the pre-vinylsulfone in fact need to be in the active site of CAIX on the surface of the cell to be activated or is the vinylsulfone revealed prior to target engagement?

      I appreciate the authors acknowledging the limitations of using an assay such as thermal shift to derive an apparent binding affinity, however, it is not entirely convincing and leaves a gap in our understanding of what is happening biochemically with these inhibitors, especially given the two-step inhibitory mechanism. It is very difficult to properly understand the activity of these inhibitors without a more comprehensive evaluation of kinact and Ki parameters. This can then bring into question how selective these compounds actually are for CAIX over other carbonic anhydrases.

      The authors did not provide any cellular data beyond target engagement with a previously characterized competitive fluorescent probe. It would be critical to know the cytotoxicity profile of these compounds or even how they affect the biology of interest regarding CAIX activity if the intention is to use these compounds in the future as chemical probes to assess CAIX activity in the context of tumor metastasis.

    4. Reviewer #3 (Public review):

      Summary:

      Targeted covalent inhibition of therapeutically relevant proteins is an attractive approach in drug development. This manuscript now reports a series of covalent inhibitors for human carbonic anhydrase (CA) isozymes (CAI, CAII, CAIX, and CAXIII) for irreversible binding to a critical histidine amino acid in the active site pocket. To support their findings, the authors included co-crystal structures of CAI, CAII, and CAIX in the presence of three such inhibitors. Mass spectrometry and enzymatic recovery assays validate these findings, and the results and cellular activity data are convincing.

      Strengths:

      The authors designed a series of covalent inhibitors and carefully selected non-covalent counterparts to make their findings about the selectivity of covalent inhibitors for CA isozymes quite convincing. The supportive X-ray crystallography and MS data are significant strengths. Their approach of targeted binding of the covalent inhibitors to histidine in CA isozyme may have broad utility for developing covalent inhibitors.

      Weaknesses:

      This reviewer did not find any significant weaknesses. However, I suggest several points in the Recommendations for the authors section for the authors to consider.

    1. eLife assessment

      This study presents an important platform for mapping mutation effects onto higher-level protein structural information, addressing a significant gap in current research. While the work is ambitious and incorporates often-overlooked aspects of higher-order structure, the strength of the evidence supporting some results seems incomplete. The quaternary structure modeling appears to underestimate oligomeric proteins compared to previous studies, and the mutation analysis lacks crucial baseline information. Despite these limitations, the method has potential for broader applications and generalization to additional organisms, warranting further development and refinement.

    2. Reviewer #1 (Public review):

      Summary:

      This work presents a computational platform that integrates currently available experimental or precomputed datasets and/or state-of-the-art modeling methods to assemble a proteome structure from a given list of genes (representing a whole proteome of an organism, or some specific subset of interest). The main advancement is that the proteome structure contains not only the tertiary structure information (such as is provided by precomputed AlphaFold predicted proteomes) but also information about the quaternary structure. Adding quaternary structure information on the whole proteomes is a challenging problem (and the manuscript would benefit from a more comprehensive introduction section presenting these challenges). Importantly, this addition of quaternary structure information is likely to significantly improve any downstream modelling or prediction. This is because most proteins form either stable or transient complexes, and a significant proportion of proteins interacts with cellular structures such as the different biological membranes. These interactions provide important context for interpreting residue-level information, such as for example the fitness/functional effects of point mutations.

      Strengths:

      The main strength of this work is that it approaches the question of protein quaternary structure in a comprehensive way. Namely, in addition to oligomeric state, it also includes membrane and cellular localization. It also demonstrates how to use and combine the available experimental and precomputed modelling to achieve the same for any set of genes.

      Weaknesses:

      The feasibility of obtaining a similar dataset (of useful/informative size) for a more complex organism is not clear.

    3. Reviewer #2 (Public review):

      In this study, a methodology called QSPACE is developed and presented. It integrates structural information for a specific organism, here E. coli. The process entails the gathering of individual structures, including oligomeric information/stoichiometry, the incorporation of data on transmembrane regions, and the utilization of the resulting dataset for the analysis of mutation effects and the allocation of proteomes.

      This work aims high, setting an ambitious goal of modeling the quaternary structure of a proteome. The method could be applied to other organisms in the future and has value in that respect. At the same time, the work tries to cover (too?) much ground and some of the results/analyses don't measure up. There are indeed a number of shortcomings and/or inconsistencies in the results presented. The comments below will help improve the work and its usefulness.

      (1) It is described that "QSPACE then finds the 3D coordinate file (i.e. "structure") that best reflects the user-defined (input #2) multi-subunit protein assembly". What is meant by "best reflects"? What if two different structures with the same stoichiometry are available? Which one is picked?

      (2) There appears to be a significant under-estimation of oligomer formation: it is reported that "31% (1,334/4,309) of E. coli genes participate in 1,047 oligomeric complexes, 667 genes are annotated as monomers, and 2,308 genes are not included". However, it is generally observed that ~50% of E. coli genes form homo-oligomers (see PMID 10940245 or more recently 38325366), and adding hetero-oligomers on top of that should increase the fraction of oligomers further. In that respect, the estimate forming the basis of this work (31% of genes participating in oligomeric complexes) seems incorrect. It is unclear why the authors did not identify more proteins as adopting a quaternary structure. It is generally hard to grasp details of the dataset, for example, the simple statistic of how many genes participate in homo- versus hetero-oligomers. Such information is partially presented in panels 2c & 2d, but it is very small and hard to see (I would suggest removing the structures of the ABC transporters to make space to present this with more detail).

      (3) There are a number of misleading statements/overstatements that I encourage the authors to revise. For example (not exhaustive):
      "to our knowledge this result is the most advanced genome-scale structural representation of the E. coli proteome and de facto represents a major advancement in genome annotation."
      "angstrom-level subcellular compartmentalization" - Can we really talk about sub-atomic precision when even side chains can move by several angstroms?
      "we provide a global accounting of all functionally important regions" - "all" is not justified
      "Incorporated into genome-scale models that compute protein expression" - what does that mean? There are gene expression & protein abundance datasets, why is the "compute" necessary?
      "Likewise, sequence-based prediction software (e.g., DeepTMHMM49) and structure-based prediction software (e.g., OPM50) are agnostic to membrane orientation and can also generate erroneous results" - what does "erroneous results" mean in this context? Those tools are not supposed to predict orientation.

      (4) What was the benchmark used to estimate the accuracy of orientation assignments?

      (5) It is not clear why structural information is required to calculate the volume taken up by different proteins across the proteome. For each protein, the expression level (copy number) is expected to have a significant effect, but I'm unsure of why oligomerization is considered key here. It will modulate the volume exclusion associated with interface contact areas, but isn't this negligible compared to other factors, in particular expression?

      (6) Models aiming at predicting deleterious effects of mutations typically use sequence conservation, but I do not see such information used in Figure 4. Assessing the added value of structural information should include such evolutionary information (residue-level sequence conservation) in the baseline.

      (7) The "proteome allocation" analysis is presented as an important result, but I did not find details of equations used to conduct this analysis. I assume that "proteome allocation" is based solely on expression, and that "cell volume" uses structural information on top of it. There is a significant difference between "proteome allocation" and "cell volume" as reflected in the proteomaps shown in panels 4e & 4f, but there is no explanation for it. Are the proteins' identities the same in these two panels? Were only proteins counted or was RNA considered as well? Clarifications are needed for RNA, for example, how were volumes calculated in structures containing RNAs? Datasets used to derive these maps should also be provided to enable reproducing them.

      (8) I did not see that the structures generated are available - they should be deposited on a permanent repository with a DOI.
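
      The gene counts quoted in point (2) can be tallied to make the denominators explicit. The sketch below uses only the three numbers quoted from the manuscript; the dictionary keys and variable names are illustrative assumptions, and the second ratio is offered only as one possible way to read the figures, not as the authors' intended statistic.

```python
# Gene counts as quoted from the manuscript in point (2) of this review.
genes = {
    "oligomeric": 1_334,    # genes participating in oligomeric complexes
    "monomeric": 667,       # genes annotated as monomers
    "not_included": 2_308,  # genes without an assigned structure
}

total = sum(genes.values())
annotated = genes["oligomeric"] + genes["monomeric"]

# Share of all genes that are oligomeric (the 31% quoted in the manuscript).
share_of_all = genes["oligomeric"] / total
# Share of oligomeric genes among only those with a structural annotation.
share_of_annotated = genes["oligomeric"] / annotated

print(f"total genes: {total}")
print(f"oligomeric / all genes: {share_of_all:.1%}")
print(f"oligomeric / annotated genes: {share_of_annotated:.1%}")
```

      The three categories sum to the 4,309 genes quoted, so the 31% figure is computed over all genes, including those not covered by the dataset. Whether that is the appropriate denominator for comparison with published homo-oligomer estimates is for the authors to clarify.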

    1. eLife assessment

      This study focuses on the role of a T-cell-specific receptor, ctla-4, in a new zebrafish model with an IBD-like phenotype. Although implicated in IBD, the function of ctla-4 has been hard to study in mice as the KO is lethal. Ctla-4 mutant zebrafish exhibited significant intestinal inflammation and dysbiosis, mirroring the pathology of inflammatory bowel disease (IBD) in mammals, providing a valuable new model to the field of IBD research. However, although many of the results are solid, the methods as provided are incomplete, without information on methods for many data panels.

    2. Reviewer #1 (Public review):

      "Unraveling the Role of Ctla-4 in Intestinal Immune Homeostasis: Insights from a novel Zebrafish Model of Inflammatory Bowel Disease" suggests the identification of the zebrafish homolog of ctla-4 and generates a 14bp deletion/early stop codon mutation that is viable. This mutant exhibits an IBD-like phenotype, including decreased intestinal length, abnormal intestinal folds, decreased goblet cells, abnormal cell junctions between epithelial cells, increased inflammation, and alterations in microbial diversity. Bulk and single-cell RNA-seq show upregulation of immune and inflammatory response genes in this mutant (especially in neutrophils, B cells, and macrophages) and downregulation of genes involved in adhesion and tight junctions in mutant enterocytes. The work suggests that the makeup of immune cells within the intestine is altered in these mutants, potentially due to changes in lymphocyte proliferation. Introduction of recombinant soluble Ctla-4-Ig to mutant zebrafish rescued body weight, histological phenotypes, and gene expression of several pro-inflammatory genes, suggesting a potential future therapeutic route.

      Strengths:

      - Generation of a useful new mutant.

      - The demonstration of an IBD-like phenotype in this mutant is extremely comprehensive.

      - Demonstrated gene expression differences provide mechanistic insight into how this mutation leads to IBD-like symptoms.

      - Demonstration of rescue with a soluble protein suggests exciting future therapeutic potential.

      - The manuscript is mostly well organized and well written.

      Weaknesses:

      - Given the sequence similarity between CTLA-4 and its related receptor CD28, and the difference in subcellular localization of this protein vs. human CTLA-4, some confusion remains about which gene is mutated in this manuscript (CD28 or CTLA-4/CD152).

      - Some conclusions made from scRNAseq data (e.g. increased apoptosis, changes in immune cell numbers) could potentially result from dissociation artifacts and would be stronger with validation staining.

      - The Methods section is woefully incomplete and describes fewer than half of the experiments performed in this manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      The authors aimed to elucidate the role of Ctla-4 in maintaining intestinal immune homeostasis by using a novel Ctla-4-deficient zebrafish model. This study addresses the challenge of linking CTLA-4 to inflammatory bowel disease (IBD) due to the early lethality of CTLA-4 knockout mice. Four lines of evidence were presented to show that Ctla-4-deficient zebrafish exhibited hallmarks of IBD in mammals:
      (1) impaired epithelial integrity and infiltration of inflammatory cells;
      (2) enrichment of inflammation-related pathways and the imbalance between pro- and anti-inflammatory cytokines;
      (3) abnormal composition of immune cell populations; and
      (4) reduced diversity and altered microbiota composition.
      By employing various molecular and cellular analyses, the authors established ctla-4-deficient zebrafish as a convincing model of human IBD.

      Strengths:

      The characterization of the mutant phenotype is very thorough, from anatomical to histological and molecular levels. The finding effectively established ctla-4 mutants as a novel zebrafish model for investigating human IBD. Evidence from the histopathological and transcriptome analysis was very strong and supported a severe interruption of immune system homeostasis in the zebrafish intestine. Additional characterization using sCtla-4-Ig further probed the molecular mechanism of the inflammatory response and provided a potential treatment plan for targeting Ctla-4 in IBD models.

      Weaknesses:

      Since CTLA-4 is one of the most well-established immune checkpoint molecules, it is not clear whether the ctla-4 mutant zebrafish exhibits inflammatory phenotypes in tissues other than the intestine. Although the evidence for intestinal phenotypes is clear and similar to human IBD, it remains ambiguous whether the mutant is a specific model for IBD or for abnormal immune responses in general.

      To probe the molecular mechanism of Ctla-4, the authors used a spectrum of antibodies that target Ctla-4 or its receptors. The phenotype assayed was lymphocyte proliferation, whereas it was the composition rather than the number of immune cells that was observed to differ in the scRNASeq assay. Although sCtla-4 alleviates the IBD-like phenotypes, I found this explanation a bit oversimplified.

    4. Reviewer #3 (Public review):

      Summary:

      The current study, which uses mutant zebrafish for IBD modeling, is a worthwhile effort. The authors provided extensive evidence, including histopathological observations, gut microflora analysis, and transcriptomic data from intestinal tissue and mucosal cells. The multi-omic study demonstrated enteritis pathology at multiple levels in the zebrafish model. However, poorly written methods and insufficient discussion of the current findings were the main defects.

      Strengths:

      An important immune checkpoint of Treg cells was knocked out in zebrafish, and enteritis was subsequently observed. This model could substitute for the mouse knockout model in investigating the molecular mechanisms of gut disease.

      Weaknesses:

      (1) The use of the English language requires further editing.

      (2) The background of this study has not been introduced sufficiently.

      (3) The medical concepts were overstated for immune cell populations.

      (4) A lot of methods were not provided.

      (5) The age of fish varied a lot in this study.

      (6) The pathological index can't reflect the detailed changes in intestinal mucosa.

      (7) Many findings of the current study were not discussed.

      (8) The structuring of the text is poor and lacks good logic.

    1. eLife assessment

      Leafhoppers coat their body surface with nanoparticles, called brochosomes, which are an evolutionary innovation in this insect clade. This important paper adds significant evidence for the biological role of these structures, which consists of an effect on the reflection of UV light that acts as a defense against predatory spiders. Convincing support is provided for a new functional aspect of brochosomes, elucidating the emergence of the underlying genes and the principles of self-assembly of these biological nanoparticles.

    2. Reviewer #1 (Public review):

      Summary:

      Evading predation is of utmost importance for most animals and camouflage is one of the predominant mechanisms. Wu et al. set out to test the hypothesis of a unique camouflage system in leafhoppers. These animals coat themselves with brochosomes, which are spherical nanostructures that are produced in the Malpighian tubules and are distributed on the cuticle after eclosion. Based on previous findings on the reflectivity properties of brochosomes, the authors provide very good evidence that these nanostructures indeed reduce the reflectivity of the animals thereby reducing predation by jumping spiders. Further, they identify four proteins, which are essential for the proper development and function of brochosomes. In RNAi experiments, the regular brochosome structure is lost, the reflectivity reduced and the respective animals are prone to increased predation. Finally, the authors provide some phylogenetic sequence analyses and speculate about the evolution of these essential genes.

      Strengths:

      The study is very comprehensive, including careful optical measurements, EM and TM analysis of the nanoparticles and their production line in the Malpighian tubules, in vivo predation tests, and knock-down experiments to identify essential proteins. Indeed, the results are very convincingly in line with the starting hypothesis such that the study robustly assigns a new biological function to the brochosome coating system.

      A key strength of the study is that the biological relevance of the brochosome coating is convincingly shown by an in vivo predation test using a known predator from the same habitat.

      Another major step forward is an RNAi screen, which identified four proteins that are essential for the brochosome structure (BSMs). After the respective RNAi knock-downs, the brochosomes show curious malformations that are interesting in terms of the self-assembly of these nanostructures. The optical and in vivo predation tests provide excellent support for the model that the RNAi knock-down leads to a change in brochosome structure, which reduces reflectivity, which in turn leads to a decrease in the antipredatory effect.

      Weaknesses:

      The reduction of reflectivity by aberrant brochosomes or after ageing is only around 10%. This may seem little to have an effect in real life. On the other hand, the in vivo predation tests confirm an influence. Hence, this is not a real weakness of the study - just a note to reconsider the wording for describing the degree of reflectivity.

      The single gene knockdowns seemed to lead to a very low penetrance of malformed brochosomes (Figure Supplement 3). Judging from the overview slides, less than 1% of brochosomes may have been affected. A quantification of regular versus abnormal particles in both wildtype and RNAi treatments would have helped to exclude that the shown aberrant brochosomes just reflected a putative level of "normal" background defects. Of note, the quadruple knock-down of all BSMs seemed to lead to a high penetrance (Figure 4), which was already reflected in the microtubule production line. While the data shown are convincing, a quantification might strengthen the argument.
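
      The quantification suggested above could be sketched as follows. This is a minimal illustration only: it assumes that counts of abnormal and total brochosomes are available from the overview images, the function names and the example counts are hypothetical, and the one-sided Fisher exact test is an assumed choice for illustration rather than a method taken from the manuscript.

```python
from math import comb


def penetrance(abnormal, total):
    """Fraction of counted particles that were scored as abnormal."""
    return abnormal / total


def fisher_exact_one_sided(abn_rnai, n_rnai, abn_wt, n_wt):
    """One-sided Fisher exact test on a 2x2 table of particle counts.

    Returns the probability, under the null hypothesis of equal defect
    rates, of observing at least `abn_rnai` abnormal particles among the
    `n_rnai` RNAi particles, given the pooled number of abnormal particles.
    """
    total_abn = abn_rnai + abn_wt      # pooled abnormal particles
    total = n_rnai + n_wt              # all particles counted
    denom = comb(total, n_rnai)        # ways to pick the RNAi sample
    upper = min(total_abn, n_rnai)
    tail = 0
    for k in range(abn_rnai, upper + 1):
        # Hypergeometric term: k abnormal and (n_rnai - k) regular particles
        tail += comb(total_abn, k) * comb(total - total_abn, n_rnai - k)
    return tail / denom


if __name__ == "__main__":
    # Hypothetical counts, NOT data from the manuscript.
    rnai_abnormal, rnai_total = 9, 1000
    wt_abnormal, wt_total = 1, 1000
    print("RNAi penetrance:", penetrance(rnai_abnormal, rnai_total))
    print("wildtype penetrance:", penetrance(wt_abnormal, wt_total))
    print("one-sided p:", fisher_exact_one_sided(
        rnai_abnormal, rnai_total, wt_abnormal, wt_total))
```

      Reporting the penetrance for each treatment together with such a test would show whether the aberrant particles exceed the background level of defects in wildtype animals.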

      While the RNAi effects appeared to be restricted to brochosomes and are therefore very likely specific, an off-target control for RNAi was still missing. Finding the same/similar phenotype with a non-overlapping dsRNA fragment in one off-target experiment is usually considered required and sufficient. Further, the details of the targeted sequence will help future workers on the topic.

      The main weakness in the current manuscript may be the phylogenetic analysis and the model of how the genes evolved. Several aspects were not clearly or consistently stated such that I felt unsure about what the authors actually think. For instance: Are all four BSMs related to each other or only BSM2 and 3? If so, not only BSM2 and 3 would be called "paralogs" but also the other BSMs. If they were all related, then a phylogenetic tree including all BSMs should be shown to visualize the relatedness (including the putative ancestral gene if that is the model of the authors). Actually, I was not sure about how the authors think about the emergence of the BSMs. Are they real orphan genes (i.e. not present outside the respective clade) or was there an ancestral gene that was duplicated and diverged to form the BSMs? Where in the phylogeny does the first of the BSMs or ancestral proteins emerge (is the gene found in Clastoptera arizonana the most ancestral one?)? Maybe, the evolution of the BSMs would have to be discussed individually for each gene as they show somewhat different patterns of emergence and loss (BSM4 present in all species, the others with different degrees of phylogenetic restriction). Related to these questions I remained unsure about some details in Figure 5. On what kind of analysis is the phylogeny based? Why are some species not colored, although they are located on the same branch as colored ones? What is the measure for homology values - % identity/similarity? The homology labels for Nephotettix cincticeps and N. virescens seem to be flipped: the latter is displayed with 100% identity for all genes with all proteins while the former should actually show this. As a consequence of these uncertainties, I could not fully follow the respective discussion and model for gene evolution.

      Conclusion:

      The authors successfully tested their hypothesis in a multidisciplinary approach and convincingly assigned a new biological function to the brochosome system. The results fully support their claims - only the quantification of the penetrance in the RNAi experiments would be helpful to strengthen the point. The authors' analysis of the evolution of BSM genes remained a bit vague and I remained unsure about their respective conclusions.

      The work is a very interesting case study of the evolutionary emergence of a new system to evade predators. Based on this study, the function of the BSM genes could now be studied in other species to provide insights into putative ancestral functions. Further, studying the self-assembly of such highly regular complex nano-structures will be strongly fostered by the identification of the four key structural genes.

    3. Reviewer #3 (Public review):

      Summary:

      In this manuscript, the authors investigate the optical properties of brochosomes produced by leafhoppers. They hypothesize that brochosomes reduce light reflection on the leafhopper's body surface, aiding in predator avoidance. Their hypothesis is supported by experiments involving jumping spiders. Additionally, the authors employ a variety of techniques including micro-UV-Vis spectroscopy, electron microscopy, transcriptome and proteome analysis, and bioassays. This study is highly interesting, and the experimental data is well-organized and logically presented.

      Strengths:

      The use of brochosomes as a camouflage coating has been hypothesized since 1936 (R.B. Swain, Entomol. News 47, 264-266, 1936) with evidence demonstrated by similar synthetic brochosome systems in a number of recent studies (S. Yang, et al. Nat. Commun. 8:1285, 2017; L. Wang, et al., PNAS. 121: e2312700121, 2024). However, direct biological evidence or relevant field studies have been lacking to directly support the hypothesis that brochosomes are used for camouflage. This work provides the first biological evidence demonstrating that natural brochosomes can be used as a camouflage coating to reduce the leafhoppers' observability to their predators. The design of the experiments is novel.

      Weaknesses:

      (1) The observation that brochosome coatings become sparse after 25 days in both male and female leafhoppers, resulting in increased predation by jumping spiders, is intriguing. However, since leafhoppers consistently secrete and groom brochosomes, it would be beneficial to explore why brochosomes become significantly less dense after 25 days.

      (2) The authors demonstrate that brochosome coatings reduce UV (specular) reflection compared to surfaces without brochosomes, which can be attributed to the rough geometry of brochosomes as discussed in the literature. However, it would be valuable to investigate whether the proteins forming the brochosomes are also UV absorbing.

      (3) The experiments with jumping spiders show that brochosomes help leafhoppers avoid predators to some extent. It would be beneficial for the authors to elaborate on the exact mechanism behind this camouflage effect. Specifically, why does reduced UV reflection aid in predator avoidance? If predators are sensitive to UV light, how does the reduced UV reflectance specifically contribute to evasion?

      (4) An important reference regarding the moth-eye effect is missing. Please consider including the following paper: Clapham, P. B., and M. C. Hutley. "Reduction of lens reflection by the 'Moth Eye' principle." Nature 244: 281-282 (1973).

      (5) The introduction should be revised to accurately reflect the related contributions in the literature. Specifically, the novelty of this work lies in the demonstration of the camouflage effect of brochosomes using jumping spiders, which is verified for the first time in leafhoppers. However, the proposed use of brochosome powder for camouflage was first described by R.B. Swain (R.B. Swain, Notes on the oviposition and life history of the leafhopper Oncometopia undata Fabr. (Homoptera: Cicadellidae), Entomol. News. 47: 264-266 (1936)). Recently, the antireflective and potential camouflage functions of brochosomes were further studied by Yang et al. based on synthetic brochosomes and simulated vision techniques (S. Yang, et al. "Ultra-antireflective synthetic brochosomes." Nature Communications 8: 1285 (2017)). Later, Lei et al. demonstrated the antireflective properties of natural brochosomes in 2020 (C.-W. Lei, et al., "Leafhopper wing-inspired broadband omnidirectional antireflective embroidered ball-like structure arrays using a nonlithography-based methodology." Langmuir 36: 5296-5302 (2020)). Very recently, Wang et al. successfully fabricated synthetic brochosomes with precise geometry akin to that of natural ones, and further elucidated the antireflective mechanisms based on the brochosome geometry and their role in reducing the observability of leafhoppers to their predators (L. Wang et al. "Geometric design of antireflective leafhopper brochosomes." Proceedings of the National Academy of Sciences 121: e2312700121 (2024)).

    1. eLife assessment

      This paper reports a novel mechanism of regulation of the heat shock response in plants that acts as a brake to prevent hyperactivation of the stress response. The findings are valuable to understand and potentially manipulate the plant's response to heat stress and the presented evidence is overall solid. However, in some cases, the data are either poorly presented or insufficient to support the primary claims.

    2. Reviewer #1 (Public review):

      In the present work, Chen et al. investigate the role of short heat shock factors (S-HSF), generated through alternative splicing, in the regulation of the heat shock response (HSR). The authors focus on S-HsfA2, an HSFA2 splice variant containing a truncated DNA-binding domain (tDBD) and a known transcriptional-repressor leucine-rich domain (LRD). The authors found a two-fold effect of S-HsfA2 on gene expression. On the one hand, the specific binding of S-HsfA2 to the heat-regulated element (HRE), a novel type of heat shock element (HSE), represses gene expression. This mechanism was also shown for other S-HSFs, including HsfA4c and HsfB1. On the other hand, S-HsfA2 is shown to interact with the canonical HsfA2, as well as with a handful of other HSFs, and this interaction prevents HsfA2 from activating gene expression. The authors also identified potential S-HsfA2 targets and selected one, HSP17.6B, to investigate the role of the truncated HSF in the HSR. They conclude that S-HsfA2-mediated transcriptional repression of HSP17.6B helps avoid hyperactivation of the HSR by counteracting the action of the canonical HsfA2.

      The manuscript is well written and the reported findings are, overall, solid. The described results are likely to open new avenues in the plant stress research field, as several new molecular players are identified. Chen et al. use a combination of appropriate approaches to address the scientific questions posed. However, in some cases, the data are inadequately presented or insufficient to fully support the claims made. As such, the manuscript would highly benefit from tackling the following issues:

      (1) While the authors report the survival phenotypes of several independent lines, thereby strengthening the conclusions drawn, they do not specify whether the presented percentages are averages of multiple replicates or if they correspond to a single repetition. The number of times the experiment was repeated should be reported. In addition, Figure 7c lacks the quantification of the hsp17.6b-1 mutant phenotype, which is the background of the knock-in lines. This is an essential control for this experiment.

      (2) In Figure 1c, the transcript levels of HsfA2 splice variants are not evident, as the authors only show the quantification of the truncated variant. Moreover, similar to the phenotypes discussed above, it is unclear whether the reported values are averages and, if so, what is the error associated with the measurements. This information could explain the differences observed in the rosette phenotypes of the S-HsfA2-KD lines. Similarly, the gene expression quantification presented in Figures 4 and 5, as well as the GUS protein quantification of Figure 3F, also lacks this crucial information.

      (3) The quality of the main figures is low, which in some cases prevents proper visualization of the data presented. This is particularly critical for the quantification of the phenotypes shown in Figure 1b and for the fluorescence images in Figures 4f and 5b. Also, Figure 9b lacks essential information describing the components of the performed experiments.

      (4) Mutants with low levels of S-HsfA2 yield smaller plants than the corresponding wild type. This appears contradictory, given that the proposed role of this truncated HSF is to counteract the growth repression induced by the canonical HSF. What would be a plausible explanation for this observation? Was this phenomenon observed with any of the other tested S-HSFs?

      (5) In some cases, the authors make statements that are not supported by the results:
      (i) the claim that only the truncated variant expression is changed in the knock-down lines is not supported by Figure 1c;
      (ii) the increase in GUS signal in Figure 3a could also result from local protein production;
      (iii) in Figure 6b, the deletion of the HRE abolishes heat responsiveness, rather than merely altering the level of response; and
      (iv) the phenotypes in Figure 8b are not clear enough to conclude that HSP17.6B overexpressors exhibit a dwarf but heat-tolerant phenotype.

    3. Reviewer #2 (Public review):

      Summary:

      The authors report that Arabidopsis short HSFs S-HsfA2, S-HsfA4c, and S-HsfB1 confer extreme heat. They have truncated DNA binding domains that bind to a new heat-regulated element. Considering Short HSFA2, the authors have highlighted the molecular mechanism by which S-HSFs prevent HSR hyperactivation via negative regulation of HSP17.6B. The S-HsfA2 protein binds to the DNA binding domain of HsfA2, thus preventing its binding to HSEs, eventually attenuating HsfA2-activated HSP17.6B promoter activity. This report adds insights to our understanding of heat tolerance and plant growth.

      Strengths:

      (1) The manuscript presents ample experiments to support the claim.

      (2) The manuscript covers a robust number of experiments and provides specific figures and graphs in support of their claim.

      (3) The authors have chosen a topic to focus on stress tolerance in a changing environment.

      Weaknesses:

      (1) A single S-HSF, S-HsfA2, is used to represent all the others (S-HsfA4c and S-HsfB1). S-HSFs can be functionally different, and their regulation may be positive or negative. Some S-HSFs may positively regulate height and be suppressed by the activity of other S-HSFs.

      (2) Previous reports on gene regulation by HSFs could be used to highlight the mechanism.

      (3) The Materials and Methods section could be rearranged so that it is based on the correct flow of the procedure performed by the authors.

      (4) Graphical representation could explain the days after sowing data, to provide information regarding plant growth.

      (5) Clear images concerning GFP and RFP data could be used.

    1. eLife assessment

      This important study reveals a novel mechanism by which hypoxia-ischemia damages the neonatal brain and how hypothermia protects from brain injury. The paper presents an interesting combination of state-of-the-art optical measurements, mitochondrial assays, and the use of various control experiments providing solid evidence for the derived conclusions. Reviewers caution that possible adverse effects of prolonged anesthesia, as well as pain and stress after a major surgical procedure might influence the outcomes and should be carefully considered. This work will be of interest to the fields of hypoxia and brain metabolism research.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript addresses an important problem of the uncoupling of oxidative phosphorylation due to hypoxia-ischemia injury of the neonatal brain and provides insight into the neuroprotective mechanisms of hypothermia treatment.

      Strengths:

      The authors used a combination of in vivo imaging of awake P10 mice and experiments on isolated mitochondria to assess various key parameters of the brain metabolism during hypoxia-ischemia with and without hypothermia treatment. This unique approach resulted in a comprehensive data set that provides solid evidence for the derived conclusions.

      Weaknesses:

      (1) The experiments were performed acutely on the same day when the surgery was performed. There is a possibility that the physiology of mice at the time of imaging was still affected by the previously applied anesthesia. This is particularly of concern since the duration of anesthesia was relatively long. Is it possible that the observed relatively low baseline OEF (~20%) and trends of increased OEF and CBF over several hours after the imaging start were partially due to slow recovery from prolonged anesthesia? The potential effects of long exposure to anesthesia before imaging experiments were not discussed.

      (2) The Methods Section does not provide information about drugs administered to reduce the pain. If pain was not managed, mice could be experiencing significant pain during experiments in the awake state after the surgery. Since the imaging sessions were long (my impression based on information from the manuscript is that imaging sessions were ~4 hours long or even longer), the level of pain was also likely to change during the experiments. It was not discussed how significant and potentially evolving pain during imaging sessions could have affected the measurements (e.g., blood flow and CMRO2). If mice received pain management during experiments, then it was not discussed if there are known effects of used drugs on CBF, CMRO2, and lesion size after 24 hr.

      (3) Animals were imaged in the awake state, but they were not previously trained for the imaging procedure with head restraint. Did animals receive any drugs to reduce stress? Our experience with well-trained young-adult as well as old mice is that they can typically endure 2 and sometimes up to 3 hours of head-restrained awake imaging with intermittent breaks for receiving the rewards before showing signs of anxiety. We do not have experience with imaging P10 mice in the awake state. Is it possible that P10 mice were significantly stressed during imaging and that their stress level changed during the imaging session? This concern about the potential effects of stress on the various measured parameters was not discussed.

      (4) The temperature of the skull was measured during the hypothermia experiment by lowering the water temperature in the water bath above the animal's head. Considering high metabolism and blood flow in the cortex, it could be challenging to predict cortical temperature based on the skull temperature, particularly in the deeper part of the cortex.

      (5) The map of estimated CMRO2 (Fig. 4B) looks very heterogeneous across the brain surface. Is it a coincidence that the highest CMRO2 is observed within the central part of the field of view? Is there previous evidence that CMRO2 in these parts of the mouse cortex could vary a few folds over a 1-2 mm distance?

      (6) The justification for using P10 mice in the experiments has not been well presented in the manuscript.

      (7) It was not discussed how the observations made in this manuscript could be affected by the potential discrepancy between the developmental stages of P10 mice and human babies regarding cellular metabolism and neurovascular coupling.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors hypothesized that mitochondrial injury in HIE is caused by OXPHOS-uncoupling, which is the cause of secondary energy failure in HI. In addition, therapeutic hypothermia rescues secondary energy failure. The methodologies used are state-of-the-art and include the PAM technique in live animals, bioenergetic studies in isolated mitochondria, and others.

      Strengths:

      The study is comprehensive and impressive. The article is well written and statistical analyses are appropriate.

      Weaknesses:

      (1) The manuscript does not discuss the limitation of this animal model study in view of the clinical scenario of neonatal hypoxia-ischemia.

      (2) I see many studies on PubMed on bioenergetics and HI. Hence, it is unclear what is novel and what is known.

      (3) What are the limitations of ex-vivo mitochondrial studies?

      (4) PAM technique limits the resolution of the image beyond 500-750 micron depth. Assessing basal ganglia may not be possible with this approach.

      (5) Hypothermia in the present study reduces the brain temperature from 37 to 29-32 degrees centigrade. In the clinical setting, head temperature is reduced to 33-34.5 degrees in neonatal hypoxia-ischemia. Hence, a drop in temperature to 29 degrees is much lower than in clinical practice. How can the present study, with its greater drop in head temperature, be interpreted for understanding the pathophysiology of therapeutic hypothermia in neonatal HIE? Moreover, in the HIE model, starting at a temperature of 37 degrees and dropping to 29 seems to differ considerably from the clinical scenario. Please discuss.

      (6) NMR was assessed ex vivo. How does it relate to in vivo assessment? Infants admitted to the neonatal intensive care unit frequently undergo MRI with spectroscopy. How do the MRS findings in human newborns with HIE correlate with the ex vivo evaluation of metabolites?

    4. Reviewer #3 (Public review):

      Sun et al. present a comprehensive study using a novel photoacoustic microscopy setup and mitochondrial analysis to investigate the impact of hypoxia-ischemia (HI) on brain metabolism and the protective role of therapeutic hypothermia. The authors elegantly demonstrate three connected findings: (1) HI initially suppresses brain metabolism, (2) subsequently triggers a metabolic surge linked to oxidative phosphorylation uncoupling and brain damage, and (3) therapeutic hypothermia mitigates HI-induced damage by blocking this surge and reducing mitochondrial stress.

      The study's design and execution are great, with a clear presentation of results and methods. Data is nicely presented, and methodological details are thorough.

      However, a minor concern is the extensive use of abbreviations, which can hinder readability. Although all the abbreviations are introduced in the text, their overuse may make the text hard to read for non-specialist audiences. Additionally, sharing the custom MATLAB and other software scripts online, particularly those used for blood vessel segmentation, would be a valuable resource for the scientific community. In addition, while the study focuses on the short-term effects of HI, exploring the long-term consequences and definitively elucidating HI's impact on mitochondria would further strengthen the manuscript's impact.

      Despite these minor points, this manuscript is very interesting.

    1. eLife assessment

      This important study provides a comprehensive assessment of mitochondrial function across age and sex in mice. The strength of evidence supporting this resource is compelling, given the exhaustive number of tissues profiled and in-depth analyses performed.

    2. Reviewer #1 (Public review):

      In this study, Sarver and colleagues carried out an exhaustive analysis of the functioning of various components (Complex I/II/IV) of the mitochondrial electron transport chain (ETC) using a real-time cell metabolic analysis technique (commonly referred to as the Seahorse oxygen consumption rate (OCR) assay). The authors aimed to generate an atlas of ETC function in about 3 dozen tissue types isolated from all major mammalian organ systems. They used a recently published improvised method by which ETC function can be quantified in freshly frozen tissues. This method enabled them to collect data from almost all organ systems from the same mouse and use many biological replicates (10 mice/experiment) required for an unbiased and statistically robust analysis. Moreover, they studied the influence of sex (male and female) and aging (young adult and old age) on ETC function in these organ systems. The main findings of this study are (1) cells in the heart and kidneys have very active ETC complexes compared to other organ systems, (2) the sex of the mice has little influence on the ETC function, and (3) aging undermined mitochondrial function in most tissues, but surprisingly in some tissues aging promoted the activity of ETC complexes (e.g., quadriceps, plantaris muscle, and diaphragm).

      Comments on revised version:

      The revised manuscript has improved significantly, addressing some of my previous concerns in the discussion. There is no doubt the method used to estimate the maximal uncoupled respiration rate in mitochondria across different organ systems and ages is excellent for getting an overview of the mitochondrial state. However, the correlation between the measured maximal respiration rate and the actual mitochondrial ATP production is still not adequately addressed. The authors could perform a few straightforward experiments on freshly isolated mitochondria from 1-2 tissue samples of their choice to provide data linking maximal respiration rates with mitochondrial ATP production. Providing evidence that directly links maximal respiration rates with mitochondrial ATP production would help readers understand how mitochondrial function is affected in various tissues.

    3. Reviewer #2 (Public review):

      Summary:

      The authors utilize a new technique to measure mitochondrial respiration from frozen tissue extracts, which goes around the historical problem of purifying mitochondria prior to analysis, a process that requires a fair amount of time and cannot be easily scaled up.

      Strengths:

      A comprehensive analysis of mitochondrial respiration across tissues, sexes, and two different ages provides foundational knowledge needed in the field.

      Weaknesses:

      While many of the findings are mostly descriptive, this paper provides a large amount of data for the community and can be used as a reference for further studies. As the authors suggest, this is a new atlas of mitochondrial function in mouse. The inclusion of a middle aged time point and a slightly older young point (3-6 months) would be beneficial to the study.

    4. Reviewer #3 (Public review):

      The aim of the study was to map a) whether different tissues exhibit different metabolic profiles (this is known already), b) what differences are found between female and male mice, and c) how the profiles change with age. In particular, the study recorded the activity of respirasomes, i.e. the concerted activity of mitochondrial respiratory complex chains consisting of CI+CIII2+CIV, CII+CIII2+CIV or CIV alone.

      The strength is certainly the atlas of oxidative metabolism in the whole mouse body, the inclusion of the two different sexes and the comparison between young and old mice. The measurement was performed on frozen tissue, which is possible as already shown (Acin-Perez et al, EMBO J, 2020).

      Weakness:

      The assay reveals the maximum capacity of enzyme activity, which is an artificial situation and may differ from in vivo respiration, as the authors themselves discuss. The material used was a very crude preparation of cells containing mitochondria and other cytosolic compounds and organelles. Thus, the conditions are not well defined and the respiratory chain activity was certainly uncoupled from ATP synthesis. Preparation of more pure mitochondria and testing for coupling would allow evaluation of additional parameters: P/O ratios, feedback mechanism, basal respiration, and ATP-coupled respiration, which reflect in vivo conditions much better. The discussion is rather descriptive and cautious and could lead to some speculations about what could cause the differences in respiration and also what consequences these could have, or what certain changes imply.

      Nevertheless, this study is an important step towards this kind of analysis.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Although this study provides a comprehensive outlook on the ETC function in various tissues, the main caveat is that it's too technical and descriptive. The authors didn't invest much effort in putting their findings in the context of the biological function of the tissue analyzed, i.e., some tissues might be more glycolytic than others and have low ETC activity.

      To better contextualize our results, we have added a substantial amount of new information to the Discussion Section.

      Also, it is unclear what slight changes in the activity of one or the other ETC complex mean in terms of mitochondrial ATP production.

      Unfortunately, the method we used can only determine oxygen consumption rate through complex I (CI), CII, or CIV. It cannot tell us about ATP production. This method only measures maximal uncoupled respiration.

      Likely, these small changes reported do not affect the mitochondrial respiration.

      We are indeed looking at mitochondrial respiration. Some changes are more dramatic while others are much more modest. We are looking at the normal aging process across tissues (focusing on mitochondrial respiration) and not pathological states. As such, we expect many of the changes in mitochondrial respiration across tissues to be mild or relatively modest. After all, aging is slow and progressive. In fact, the variations we observed in mitochondrial respiration across tissues are consistent with the known heterogenous rate of aging across tissues.

      With such a detailed dataset, the study falls short of deriving more functionally relevant conclusions about the heterogeneity of mitochondrial function in various tissues. In the current format, the readers get lost in the large amount of data presented in a technical manner.

      We agree that the paper contains a large amount of information. In the revised manuscript, we did our best to contextualize our results by substantially expanding the Discussion Section.

      Also, it is highly recommended that all the raw data and the values be made available as an Excel sheet (or other user-friendly formats) as a resource to the community.

      We included all the data in two Excel sheets (Figure 1 – data source 1; Figure 1 – data source 2). We presented them in such a way that it will be easy for other investigators to follow and re-use our dataset in their own studies for comparison.

      Major concerns

      (1) In this study, the authors used the method developed by Acin-Perez and colleagues (EMBO J, 2020) to analyze ETC complex activities in mitochondria derived from the snap-frozen tissue samples. However, the preservation of cellular/mitochondrial integrity in different types of tissues after being snap-frozen was not validated.

      All the samples are actually maximally preserved due to being snap frozen. Freezing the samples disrupts the mitochondria to produce membrane fragments. Subsequent thawing, mincing, and homogenization in a non-detergent based buffer (mannose-sucrose) ensures that all tissue samples are maximally disrupted into fragments which contain ETC units in various combinations. This allows the assay to give an accurate representation of maximal respiratory capacity given the ETC units present in a tissue sample.

      Since aging has been identified as the most important effector in this study, it is essential to validate how aging affects respiration in various fresh frozen tissues. Such analysis will ensure that the results presented are not due to the differential preservation of the mitochondrial respiration in the frozen tissue. In addition, such validations will further strengthen the conclusions and promote the broad usability of this "new" method.

      The reason we adopted this method is because it has been rigorously validated in the original publication (PMID: 32432379) and a subsequent methods paper (PMID: 33320426). The authors in the original paper benchmarked their frozen tissue method with freshly isolated mitochondria from the same set of tissues. Their work showed highly comparable mitochondrial respiration from frozen tissues and isolated mitochondria. For this reason, we did not repeat those validation studies.

      (2) In this study, the authors sampled the maximal activity of ETC complex I, II, and IV, but throughout the manuscript, they discussed the data in the context of mitochondrial function.

      We apologize that we did not make this clearer in our manuscript. We have corrected this in our revised manuscript (the Discussion Section). Our method measures respiration starting at Complex I (CI; via NADH), starting at CII (via succinate), or starting at CIV (using TMPD and ascorbate). Regardless of whether electrons (donated by the substrate) enter the respiratory chain through CI, CII or CIV, oxygen (as the final electron acceptor) is only consumed at CIV. Therefore, the method measures mitochondrial respiration and function through CI, CII, or CIV. This high-resolution respirometry analysis method is different from the classic enzymatic method of assessing CI, CII, or CIV activity individually; the enzymatic method does not actually measure oxygen consumption due to electrons flowing through the respiratory complexes.

      However, it is unclear how the changes in CI, CII, and CIV activity affect overall mitochondrial function (if at all) and how small changes seen in the maximal activity of one or more complexes affect the efficiency and efficacy of ATP production (OxPhos).

      Please see our response to the previous question. The method measures mitochondrial respiration through CI, CII or CIV. The limitation of this method is that it measures maximal uncoupled respiration; namely, mitochondrial respiration is not coupled to ATP synthesis since the measurements are not performed on intact mitochondria. As such, we cannot say anything about the efficiency and efficacy of ATP production. It will be interesting for future studies to further investigate tissue-level variations in mitochondrial OXPHOS.

      The authors report huge variability between the activity of different complexes - in some tissues all three complexes (CI, CII, and CIV) and often in others, just one complex was affected. For example, as presented in Figure 4, there is no difference in CI activity in the hippocampus and cerebellum, but there is a slight change in CII and CIV activity. In contrast, in heart atria, there is a change in the activity of CI but not in CII and CIV. However, the authors still suggest that there is a significant difference in mitochondrial activity (e.g., "Old males showed a striking increase in mitochondrial activity via CI in the heart atria....reduced mitochondrial respiration in the brain cortex..." - Lines 5-7, Page 9). Until and unless a clear justification is provided, the authors should not make these broad claims on mitochondrial respiration based on small changes in the activity of one or more complexes (CI/CII/CIV). With such a data-heavy and descriptive study, it is confusing to track what is relevant and what is not for the functioning of mitochondria.

      We have attempted to address these issues in the revised Discussion section.

      (3) What do differences in the ETC complex CI, CII, and CIV activity in the same tissue mean? What role does the differential activity of these complexes (CI, CII, and CIV) play in mitochondrial function? What do changes in Oxphos mean for different tissues? Does that mean the tissue (cells involved) shift more towards glycolysis to derive their energy? In the best world, a few experiments related to the glycolytic state of the cells would have been ideal to solidify their finding further. The authors could have easily used ECAR measurements for some tissues to support their key conclusions.

      We have attempted to address these issues in the revised Discussion section. The frozen tissue method does not involve intact mitochondria. As such, the method cannot measure ECAR, which requires the presence of intact mitochondria.

      (4) The authors further analyzed parameters that significantly changed across their study (Figure 7, 98 data points analyzed). The main caveat of such analysis is that some tissue types would be represented three or even more times (due to changes in the activity of all three complexes - CI, CII, and CIV, and across different ages and sexes), and some just once. Such a method of analysis will skew the interpretation towards a few over-represented organ/tissue systems. Perhaps the authors should separately analyze tissue where all three complexes are affected from those with just one affected complex.

      Figure 7 summarizes the differences between male vs female, and between young vs old. All the tissue-by-tissue comparisons (data separated by CI-linked respiration, CII-linked respiration, and CIV-linked respiration) can be found in earlier figures (Figures 1-6).

      The focus of Figure 7 is to help us better appreciate all the changes seen in the preceding Figures 1-6:

      Panel A and B indicate all changes that are considered significant

      Panel C indicates total tissues with at least one significantly affected respiration

      Panel D indicates total magnitude of change (i.e., which tissue has the highest OCR) offering a non-relative view

      Panel E indicates whole body separations

      Panel F indicates whole body separations and age vs sex clustering

      (5) The current protocol does not provide cell-type-specific resolution and will be unable to identify the cellular source of mitochondrial respiration. This becomes important, especially for those organ systems with tremendous cellular heterogeneity, such as the brain. The authors should discuss whether the observed changes result from an altered mitochondria respiratory capacity or if changes in proportions of cell types in the different conditions studied (young vs. aged) might also contribute to differential mitochondrial respiration.

      We agree with the reviewer that this is a limitation of the method. We have addressed this issue in the revised Discussion section.

      (6) Another critical concern of this study is that the same datasets were repeatedly analyzed and reanalyzed throughout the study with almost the same conclusion - namely, aging affects mitochondrial function, and sex-specific differences are limited to very few organs. Although this study has considerable potential, the authors missed the chance to add new insights into the distinct characteristics of mitochondrial activity in various tissue and organ systems. The author should invest significant efforts in putting their data in the context of mitochondrial function.

      We have attempted to address these issues in the revised Discussion section.

      Reviewer #2 (Public Review):

      Summary:

      The authors utilize a new technique to measure mitochondrial respiration from frozen tissue extracts, which goes around the historical problem of purifying mitochondria prior to analysis, a process that requires a fair amount of time and cannot be easily scaled up.

      Strengths:

      A comprehensive analysis of mitochondrial respiration across tissues, sexes, and two different ages provides foundational knowledge needed in the field.

      Weaknesses:

      While many of the findings are mostly descriptive, this paper provides a large amount of data for the community and can be used as a reference for further studies. As the authors suggest, this is a new atlas of mitochondrial function in mouse. The inclusion of a middle aged time point and a slightly older young point (3-6 months) would be beneficial to the study.

      We agree with the reviewer that inclusion of additional time points (e.g., 3-6 months) would further strengthen the study. However, the cost, labor, and time associated with another set of samples (660 tissue samples from male and female mice and 1980 respirometry assays) are too high for our lab with limited budget and manpower. Regrettably, we will not be able to carry out the extra work as requested by the reviewer.
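
      The workload figures quoted in this response can be reproduced with simple arithmetic. The sketch below is only an illustration: it assumes 33 tissues per mouse (the number of tissues surveyed, per the response to Reviewer #3), 10 mice per sex for one additional age group, and three respirometry readouts per sample (CI-, CII-, and CIV-linked). These group sizes are inferred from the review text rather than stated explicitly by the authors, and the function name is hypothetical.

```python
def workload(n_tissues, mice_per_sex, n_sexes=2, assays_per_sample=3):
    """Return (tissue samples, respirometry assays) for one extra age group.

    All arguments are assumed values used for illustration only.
    """
    samples = n_tissues * mice_per_sex * n_sexes
    assays = samples * assays_per_sample  # one assay each for CI, CII, CIV
    return samples, assays


samples, assays = workload(n_tissues=33, mice_per_sex=10)
print(samples, assays)  # 660 1980
```

      Under these assumptions, each additional time point adds the same number of samples and assays again, which is consistent with the cost argument made above.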

      Reviewer #3 (Public Review):

      The aim of the study was to map, a) whether different tissues exhibit different metabolic profiles (this is known already), what differences are found between female and male mice and how the profiles changes with age. In particular, the study recorded the activity of respirasomes, i.e. the concerted activity of mitochondrial respiratory complex chains consisting of CI+CIII2+CIV, CII+CIII2+CIV or CIV alone.

      The strength is certainly the atlas of oxidative metabolism in the whole mouse body, the inclusion of the two different sexes and the comparison between young and old mice. The measurement was performed on frozen tissue, which is possible as already shown (Acin-Perez et al, EMBO J, 2020).

      Weakness:

      The assay reveals the maximum capacity of enzyme activity, which is an artificial situation and may differ from in vivo respiration, as the authors themselves discuss. The material used was a very crude preparation of cells containing mitochondria and other cytosolic compounds and organelles. Thus, the conditions are not well defined and the respiratory chain activity was certainly uncoupled from ATP synthesis. Preparation of more pure mitochondria and testing for coupling would allow evaluation of additional parameters: P/O ratios, feedback mechanism, basal respiration, and ATP-coupled respiration, which reflect in vivo conditions much better. The discussion is rather descriptive and cautious and could lead to some speculations about what could cause the differences in respiration and also what consequences these could have, or what certain changes imply.

      Nevertheless, this study is an important step towards this kind of analysis.

      We have attempted to address some of these issues in the revised Discussion Section. The frozen tissue method can only measure maximal uncoupled respiration. Because we are not measuring mitochondrial respiration using intact mitochondria, several of the functional parameters the reviewer alluded to (e.g., P/O ratios, feedback mechanism, basal respiration, and ATP-coupled respiration) simply cannot be obtained with the current set of samples. Nevertheless, we agree that all the additional data (if obtained) would be very informative.

      Reviewer #1 (Recommendations For The Authors):

      (1) For most of the comparative analysis, the authors normalized OCR/min to MitoTracker Deep RedFM (MTDR) fluorescence intensity. Why was the data normalized to the total protein content not used for comparative analysis? Is there a correlation between MTDR fluorescence and the protein content across different tissues?

      Given that we used the crude extract method, total protein content does not equal total mitochondrial protein content. This is why the MTDR method was used, as this represents a high throughput method of assessing mitochondrial mass in this volume of samples. In general, the total protein concentration is used to ensure the respiration intensity was approximately the same across all samples loaded into the Seahorse machine.
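
      As a rough illustration of why normalizing to a mitochondrial-mass proxy matters, the sketch below divides a raw oxygen consumption rate by an MTDR fluorescence reading. The function name, units, and numbers are hypothetical and are not taken from the study's dataset; the actual analysis pipeline may differ.

```python
def normalize_ocr(ocr_pmol_per_min, mtdr_fluorescence):
    """Return OCR per unit MTDR signal (a proxy for mitochondrial mass).

    Both inputs are hypothetical example values, not data from the study.
    """
    if mtdr_fluorescence <= 0:
        raise ValueError("MTDR fluorescence must be positive")
    return ocr_pmol_per_min / mtdr_fluorescence


# Two made-up lysates with the same raw OCR but different mitochondrial content:
# the lysate with less mitochondrial signal respires more per unit of MTDR.
high_mito = normalize_ocr(ocr_pmol_per_min=200.0, mtdr_fluorescence=4.0)
low_mito = normalize_ocr(ocr_pmol_per_min=200.0, mtdr_fluorescence=2.0)
print(high_mito, low_mito)  # 50.0 100.0
```

      Normalizing to total protein instead would treat both lysates as equivalent whenever their protein loads match, which is the concern raised about crude extracts.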

      (2) To test the mitochondrial isolation yield, the authors should run immunoblot against canonical mitochondrial proteins in both homogenates and mitochondrial-containing supernatants and show that the protocol followed effectively enriched mitochondria in the supernatant fraction. This would also strengthen the notion that the "µg protein" value used to normalize the total mitochondrial content comes from isolated mitochondria and not other extra-mitochondrial proteins.

      Because we are using crude tissue lysate (from frozen tissue), the total µg protein content does not come from isolated mitochondria; for this reason, we normalized to MTDR rather than to total protein. Total mitochondrial protein content is subject to change depending on tissue for non-mitochondrial reasons. This method does not use isolated mitochondria; we only use tissue lysates enriched for mitochondrial proteins. This method has been rigorously validated in the original study (PMID: 32432379) and a subsequent methods paper (PMID: 33320426). In those studies, the authors performed the requisite quality checks the reviewer has asked for (e.g., immunoblot against canonical mitochondrial proteins in both homogenates and mitochondrial-containing supernatants to show effective enrichment of mitochondrial proteins). For this reason, we did not repeat this.

      (3) MitoTracker loads into mitochondria in a membrane potential-dependent manner. The authors should rule out the possibility that samples from different ages and sexes might have different mitochondrial membrane potentials and exhibit a differential MitoTracker loading capacity. This becomes relevant for data normalization based on MTDR (MTDR/µg protein) since it was assumed that loading capacity is the same for mitochondria across different tissue and age groups.

      MitoTracker Deep Red is not membrane potential dependent and can be effectively used to quantify mitochondrial mass even when mitochondrial membrane potential is lost. This is highlighted in the original study (PMID: 32432379).

      (4) Page 11, line 3 typo - across, not cross.

      Response: We have fixed the typo.

      Reviewer #2 (Recommendations For The Authors):

      If possible, I would include a middle aged time point between 12 and 14 months of age.

      We agree with the reviewer that inclusion of additional time points (e.g., 3-6 months) would further strengthen the study. However, the cost, labor, and time associated with another set of samples (660 tissue samples from male and female mice and 1980 respirometry assays) are too high for our lab with limited budget and manpower. Regrettably, we will not be able to carry out the extra work as requested by the reviewer.

      Reviewer #3 (Recommendations For The Authors):

      Overall, the work is well done and the data are well processed making them easy to understand. Some minor adjustments would improve the manuscript further:

      - Significance OCR in Figure 2, maybe add error bars?

      We have added the error bars and statistical significance to the revised Figure 2.

      - Tissue comparison A-C, right panel: graphs are cropped

      We are not sure what the reviewer meant here. We have double checked all our revised figures to make sure nothing is accidentally cropped.

      - Heart ventricle: Old males and females have higher CI- and CII-dependent respiration than young males and females? Only CIV respiration is lower?

      Comparing old to young male or female heart ventricle respiration via CI or CII shows an increase in maximal capacity with age. CIV-linked respiration also trends upward, although not significantly, when comparing old to young. When comparing the respiration values among themselves within a mouse, i.e., old male CI- or CII-linked respiration compared to old male CIV-linked respiration, we can see that the old male CIV-linked respiration is very similar. When making the same comparison in the old female mouse, there appears to be something special about electrons entering through CI as compared to CII or CIV, as CI-linked respiration appears to be elevated compared to both CII and CIV. Although we do not know if this is significantly different, the trend in the data is clear. We do not know the exact reason why this occurred in the heart ventricles. To differing degrees, the relationship among CI-, CII-, and CIV-linked respiration appears broadly similar in most skeletal muscles and in the old male heart atria. Again, the root of this discrepancy is unknown; it potentially indicates an interesting physiologic trait of certain types of muscle and merits further exploration.

      - What is plotted in Fig.3: The mean of all OCR of all tissues? A,B,C: Plot with break in x-axis to expand the violin, add mean/median values as numbers to the graph (same for Fig4)

      The leftmost side of Figure 3 A, B, and C shows the average OCR/MTDR value across all tissues in a group. Each tissue assayed is represented in the violin plot as an open circle.

      - Fig. 3D: add YM/YF to graph for better understanding, same in following figures

      This is in the scale bar next to all heat maps presented in the figures. We have also added these labels to the revised figure to improve clarity.

      - Additional figures: x-axis title (time) is missing in OCR graphs

      Time has been added to the x axis of all additional figures for clarity.

      - Also, a more general question is: were the concentrations of substrates and inhibitors optimized before starting the series of experiments?

      All assay optimization was carried out in the original study (PMID: 32432379) and the subsequent methods paper (PMID: 33320426). Because we had to survey 33 different tissues, we tested and optimized the protein concentrations to use; the primary goal was to obtain enough respiration signal without exceeding the Seahorse machine’s detection range across all the diverse tissue types analyzed. Through our optimization, mostly of the very high-respiring tissues such as heart and kidney, we were also able to show that all substrates and inhibitors were at saturating concentrations, since respiration increased when more sample was added and all signal could be abolished in these samples with the same amount of inhibitors.

    1. eLife assessment

      This study offers a valuable description of the layer- and sublayer-specific outputs of the somatosensory cortex based on compelling evidence obtained with modern tools for the analysis of brain connectivity, together with functional validation of the connectivity using optogenetic approaches in vivo. Beyond bringing together, in one dataset, the results of disparate studies, this effort brings new insights on layer-specific outputs, and on differences between primary and secondary somatosensory areas. This study will be of interest to neuroanatomists and neurophysiologists.

    2. Reviewer #1 (Public review):

      Summary:

      This is a fine paper that serves to show that light sheet imaging may be used to provide whole brain imaging of axonal projections. The data provided suggest that at this point the technique provides lower resolution than other techniques. Nonetheless, the technique does provide useful, if not novel, information about particular brain systems.

      Strengths:

      The manuscript is well written. In the introduction, a clear description of the functional organization of the barrel cortex provides the context for applying specific Cre-driver lines to map the projections of the main cortical projection types using whole brain neuroanatomical tracing techniques. The results provided are also well written, with sufficient detail describing the specifics of how techniques were used to obtain relevant data. Appropriate controls were done, including the identification of whisker fields for viral injections and determination of the laminar pattern of Cre expression. The mapping of the data provides a good way to visualize low resolution patterns of projections.

      Weaknesses:

      (1) The results provided are, as stated in the discussion, "largely in agreement with previously reported studies of the major projection targets". However, it must be stated that the study does not "extend current knowledge through the high sensitivity for detecting sparse axons, the high specificity of labeling of genetically defined classes of neurons and the brain wide analysis for assigning axons to detailed brain regions", which have all been published in numerous other studies (the Allen connectivity project and related papers, along with others). If anything, the labeling of axons obtained with light sheet imaging in this study does not provide as detailed a mapping as that obtained with other techniques. Some detail is provided of how the raw images are processed to resolve labeled axons, but the images shown in the figures do not demonstrate how well individual axons may be resolved; of particular interest would be to see labeling in terminal areas such as other cortical areas, striatum and thalamus. As presented, the light sheet imaging appears to be rather low resolution compared to the many studies that have used viral tracing to look at cortical projections from genetically identified cortical neurons.

      (2) Amongst the limitations of this study is the inability to resolve axons of passage and terminal fields. This has been done in other studies with viral constructs labeling synaptophysin. This should be mentioned.

      (3) Figure 5 is an example of the type of large sets of data that can be generated with whole brain mapping and registration to the Allen CCF that provides information of questionable value. Ordering the 50 plus structures by the density of labeling does not provide much in terms of relative input to different types of areas. There are multiple subregions for different functional types (i.e., different visual areas and different motor subregions are scattered, not grouped together), which makes it difficult to understand any organizing principles.

      (4) The GENSAT Cre driver lines used must have the specific line name used, not just the gene name, as the GENSAT BAC-Cre lines had multiple lines for each gene, often with very different expression patterns: Rbp4_KL100, Tlx3_PL56, Sim1_KJ18, Ntsr1_GN220.

    3. Reviewer #2 (Public review):

      Summary:

      This study takes advantage of multiple methodological advances to perform layer-specific staining of cortical neurons and tracking of their axons to identify the pattern of their projections. This publication offers a mesoscale view of the projection patterns of neurons in the whisker primary and secondary somatosensory cortex. The authors report that, consistent with the literature, the pattern of projection is highly different across cortical layers and subtypes, with targets being located around the whole brain. This was tested across 6 different mouse types that expressed a marker in layer 2/3, layer 4, layer 5 (3 sub-types) and layer 6.

      Looking more closely at the projections from primary somatosensory cortex into the primary motor cortex, they found that there was a significant spatial clustering of projections from topographically separated neurons across the primary somatosensory cortex. This was true for neurons with cell bodies located across all tested layers/types.

      Strengths:

      This study successfully looks at the relevant scale to study projection patterns, which is the whole brain. This is achieved thanks to an ambitious combination of mouse lines, immuno-histochemistry, imaging and image processing, which results in a standardized histological pipeline that processes the whole-brain projection patterns of layer-selected neurons of the primary and secondary somatosensory cortex.

      This standardization means that comparisons between cell-type projection patterns are possible and that both the large-scale structure of the pattern and the minute details of the intra-area pattern are available.

      This reference dataset and the corresponding analysis code are made available to the research community.

      Weaknesses:

      One major question raised by this dataset is the risk of missing axons during the post-processing step. Following the previous review round, my concerns have been addressed regarding this point.

    4. Reviewer #3 (Public review):

      Summary:

      The paper offers a systematic and rigorous description of the layer- and sublayer-specific outputs of the somatosensory cortex using a modern toolbox for the analysis of brain connectivity which combines: 1) layer-specific genetic drivers for conditional viral tracing; 2) whole brain analyses of axon tracts using tissue clearing and imaging; 3) segmentation and quantification of axons with normalization to the number of transduced neurons; 4) registration of connectivity to a widely used anatomical reference atlas; 5) functional validation of the connectivity using optogenetic approaches in vivo.

      Strengths:

      Although the connectivity of the somatosensory cortex is already known, precise data are dispersed in different accounts (papers, online resources) using different methods. So the present account has the merit of condensing this information in one very precisely documented report. It also brings new insights on the connectivity, such as the precise comparison of layer-specific outputs, and of the primary and secondary somatosensory areas. It also shows a topographic organization of the circuits linking the somatosensory and motor cortices. The paper also offers a clear description of the methodology and of a rigorous approach to quantitative anatomy.

      Weaknesses:

      The weakness relates to the intrinsic limitations of in toto approaches, which currently lack the precision and resolution needed to identify single axons, axon branching or synaptic connectivity. These limitations are identified and discussed by the authors.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      This is a fine paper that shows that light sheet imaging may be used to provide whole brain imaging of axonal projections. The data provided suggest that at this point the technique provides lower resolution than other techniques. Nonetheless, the technique does provide useful, if not novel, information about particular brain systems.

      Strengths: 

      The manuscript is well written. In the introduction, a clear description of the functional organization of the barrel cortex provides the context for using specific Cre-driver lines to map the projections of the main cortical projection types with whole brain neuroanatomical tracing techniques. The results provided are also well written, with sufficient detail describing the specifics of how techniques were used to obtain relevant data. Appropriate controls were done, including the identification of whisker fields for viral injections and determination of the laminar pattern of Cre expression. The mapping of the data provides a good way to visualize low resolution patterns of projections.

      Weaknesses: 

      (1) The results provided are, as stated in the discussion, "largely in agreement with previously reported studies of the major projection targets". However, it must be stated that the study does not "extend current knowledge through the high sensitivity for detecting sparse axons, the high specificity of labeling of genetically defined classes of neurons and the brain wide analysis for assigning axons to detailed brain regions", all of which have been published in numerous other studies (the Allen connectivity project and related papers, along with others). If anything, the labeling of axons obtained with light sheet imaging in this study does not provide as detailed a mapping as that obtained with other techniques. Some detail is provided of how the raw images are processed to resolve labeled axons, but the images shown in the figures do not demonstrate how well individual axons may be resolved. Of particular interest would be to see labeling in terminal areas such as other cortical areas, striatum and thalamus. As presented, the light sheet imaging appears to be rather low resolution compared to the many studies that have used viral tracing to look at cortical projections from genetically identified cortical neurons.

      We agree with the reviewer that the resolution of imaging should be further improved in future studies, as also mentioned in the original manuscript. On P. 17 of the revised manuscript we write “Probably most important for future studies is the need to increase the light-sheet imaging resolution perhaps combined with the use of expansion microscopy to provide brain-wide micron-resolution data (Glaser et al., 2023; Wassie et al., 2019).” However, even at somewhat lower resolution, through bright sparse labelling, individual axonal segments can nonetheless be traced through machine learning to define axonal skeletons, whose length can be quantified as we do in this study. This methodology highlights sparse wS1 and wS2 innervation of a large number of brain areas, some of which are not typically considered, and our anatomical results might therefore help the neuronal circuit analysis underlying various aspects of whisker sensorimotor processing. Despite impressive large-scale projection mapping projects such as the Allen connectivity atlas, there remain relatively sparse cell type-specific projection map data for the representations of the large posterior whiskers in wS1 and wS2, and our data in this study thus add to a growing body of cell type-specific projection mapping with the specific focus on the output connectivity of these whisker-related neocortical regions of sensory cortex.

      In the revised manuscript, we now provide an additional supplementary figure (Figure 1 – figure supplement 2) showing examples of the axonal segmentation from further additional image planes including branching axons in the key innervation regions mentioned by the reviewer, namely “other cortical areas, striatum and thalamus”.

      (2) Amongst the limitations of this study is the inability to resolve axons of passage and terminal fields. This has been done in other studies with viral constructs labeling synaptophysin. This should be mentioned. 

      The reviewer brings up another important point for future methodological improvements to enhance connectivity mapping. Indeed, we already mentioned this in our original submission near the end of the first paragraph under the Limitations and future perspectives section. In the revised manuscript on P. 17, we write “Future studies should also aim to identify neurotransmitter release sites along the axon, which could be achieved by fluorescent labeling of prominent synaptic components, such as synaptophysin-GFP (Li et al., 2010).”

      (3) There is no quantitative analysis of differences between the genetically defined neurons projecting to the striatum, for example the relative area innervated, the density of terminals, or other measures.

      The reviewer raises an interesting question, and in the revised manuscript, we now present a more detailed analysis of cell class-specific axonal projections focusing specifically on the striatum. Following the reviewer’s suggestion, in a new supplementary figure (Figure 7 – figure supplement 1), we now report spatial axonal density maps in the striatum from SSp-bfd and SSs, finding potentially interesting differences comparing the projections of Rasgrf2-L2/3, Scnn1a-L4 and Tlx3-L5IT neurons. On P. 12 of the revised manuscript, we now write “We also investigated the spatial innervation pattern of Rasgrf2-L2/3, Scnn1a-L4 and Tlx3-L5IT neurons in the striatum (Figure 7 – figure supplement 1), where we found that axonal density from Rasgrf2-L2/3 neurons in both SSp-bfd and SSs was concentrated in a posterior dorsolateral part of the ipsilateral striatum, whereas Tlx3-L5IT neurons had extensive axonal density across a much larger region of the striatum, including bilateral innervation by SSp-bfd neurons. Striatal innervation by Scnn1a-L4 neurons was intermediate between Rasgrf2-L2/3 and Tlx3-L5IT neurons.” We think the reviewer’s comment has helped reveal further interesting aspects of our data set, and we thank the reviewer.

      (4) Figure 5 is an example of the type of large sets of data that can be generated with whole brain mapping and registration to the Allen CCF that provides information of questionable value. Ordering the 50-plus structures by the density of labeling does not provide much in terms of relative input to different types of areas. There are multiple subregions for different functional types (i.e., different visual areas and different motor subregions) that are scattered, not grouped together. This makes it difficult to understand any organizing principles.

      We agree with the reviewer, and fully support the importance of considering subregions within the relatively coarse compartmentalization of the current Allen CCF. In order to provide some further information about connectivity that may help give the reader further insights into the data, we have now added further quantification of cortex-specific axonal density ranked according to functional subregions in a new supplementary figure (Figure 5 – figure supplement 2). 

      (5) The GENSAT Cre driver lines used must have the specific line name used, not just the gene name, as the GENSAT BAC-Cre lines had multiple lines for each gene, often with very different expression patterns: Rbp4_KL100, Tlx3_PL56, Sim1_KJ18, Ntsr1_GN220.

      In the revised manuscript, we now write out a fuller description of the mouse lines the first time they are mentioned in the Results section on P. 7. The full mouse line names, accession numbers and references were of course already described in the methods section, which remains the case in the revised manuscript.

      Reviewer #2 (Public Review): 

      Summary: 

      This study takes advantage of multiple methodological advances to perform layer-specific staining of cortical neurons and tracking of their axons to identify the pattern of their projections. This publication offers a mesoscale view of the projection patterns of neurons in the whisker primary and secondary somatosensory cortex. The authors report that, consistent with the literature, the pattern of projection is highly different across cortical layers and subtype, with targets being located around the whole brain. This was tested across 6 different mouse types that expressed a marker in layer 2/3, layer 4, layer 5 (3 sub-types) and layer 6.  Looking more closely at the projections from primary somatosensory cortex into the primary motor cortex, they found that there was a significant spatial clustering of projections from topographically separated neurons across the primary somatosensory cortex. This was true for neurons with cell bodies located across all tested layers/types. 

      Strengths: 

      This study successfully looks at the relevant scale to study projection patterns, which is the whole brain. This is achieved thanks to an ambitious combination of mouse lines, immunohistochemistry, imaging and image processing, which results in a standardized histological pipeline that processes the whole-brain projection patterns of layer-selected neurons of the primary and secondary somatosensory cortex. 

      This standardization means that comparisons between cell-types projection patterns are possible and that both the large-scale structure of the pattern and the minute details of the intra-areas pattern are available. 

      This reference dataset and the corresponding analysis code are made available to the research community. 

      Weaknesses: 

      One major question raised by this dataset is the risk of missing axons during the post-processing step. Indeed, it appears that the control and training efforts have focused on the risk of false positives (see Figure 1 supplementary panels). And indeed, the risk of overlooking existing axons in the raw fluorescence data is discussed in the article.

      Based on the data reported in the article, this is more than a risk. In particular, Figure 2 shows an example Rbp4-L5 mouse where axonal spread seems massive in Hippocampus, while there is no mention of this area in the processed projection data for this mouse line. 

      In Figure 2, we show the expression of tdTomato in double-transgenic mice in which the Cre-driver lines were crossed with a Cre-dependent reporter mouse expressing cytosolic tdTomato. In addition to the specific labelling of L5PT neurons in the somatosensory cortex, Rbp4-Cre mice also express Cre-recombinase in other brain regions including the hippocampus. In the reporter mice crossed with Rbp4-Cre mice, tdTomato is expressed in neurons with cell bodies in the hippocampus which is clearly visualized in Figure 2. Because our axonal labelling is based on localized viral vector expression of tdTomato in SSp-bfd and SSs, the expression of Cre in hippocampus does not affect our analysis. In order to clarify to the reader, in the legend to Figure 2D, we now specifically write “As for panel A, but for Rbp4-L5 neurons. Note strong expression of Cre in neurons with cell bodies located in the hippocampus, which does not affect our analysis of axonal density based on virus injected locally into the neocortex.” Consistent with this observation, the Allen Institute’s ISH data support expression of Rbp4 in neurons of the hippocampus, e.g. https://mouse.brain-map.org/gene/show/19425 and https://mouse.brain-map.org/experiment/show/68632655.

      Similarly, the Ntsr1-L6CT example shows a striking level of fluorescence in Striatum that is not reflected in the amount of axons detected by the algorithms in the next figures. These apparent discrepancies may be due to non-axon-specific fluorescence in the samples. In any case, further analysis of such anatomical areas would be useful to consolidate the valuable dataset provided by the article.

      As pointed out above, Figure 2 shows cytosolic tdTomato fluorescence in transgenic crosses of the Cre-driver mice with Cre-dependent tdTomato reporter mice. For the Ntsr1-Cre x LSL-tdTomato mice, all corticothalamic L6CT neurons from across the entire cortex drive tdTomato expression. The axon of each neuron must traverse the striatum giving rise to fluorescence in the striatum. As discussed above, labelling of synaptic specialisations will be important in future studies to separate travelling axon from innervating axon. However, the overall impact of the axons traversing the striatum is again mitigated in our study by considering the axonal projections from local sparse infections in SSp-bfd and SSs rather than from cortex-wide tdTomato expression.

      Reviewer #3 (Public Review): 

      Summary: 

      The paper offers a systematic and rigorous description of the layer- and sublayer-specific outputs of the somatosensory cortex using a modern toolbox for the analysis of brain connectivity which combines: 1) layer-specific genetic drivers for conditional viral tracing; 2) whole brain analyses of axon tracts using tissue clearing and imaging; 3) segmentation and quantification of axons with normalization to the number of transduced neurons; 4) registration of connectivity to a widely used anatomical reference atlas; 5) functional validation of the connectivity using optogenetic approaches in vivo.

      Strengths: 

      Although the connectivity of the somatosensory cortex is already known, precise data are dispersed in different accounts (papers, online resources) using different methods. So the present account has the merit of condensing this information in one very precisely documented report. It also brings new insights on the connectivity, such as the precise comparison of layer-specific outputs, and of the primary and secondary somatosensory areas. It also shows a topographic organization of the circuits linking the somatosensory and motor cortices. The paper also offers a clear description of the methodology and of a rigorous approach to quantitative anatomy.

      Weaknesses: 

      The weakness relates to the intrinsic limitations of in toto approaches, which currently lack the precision and resolution needed to identify single axons, axon branching or synaptic connectivity. These limitations are identified and discussed by the authors.

      We agree with the reviewer.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      No additional comment 

      OK

      Reviewer #2 (Recommendations For The Authors): 

      In Figure 8, we don't get to see much raw data, while the diversity of functional responses pattern to the primary and supplementary S1 activations is highly intriguing (and this diversity exists as suggested by the results in Figure 8E, LRPT). 

      Can Figure 8C be less blurred? Maybe give more space to individual examples, such as an overlay of the delineations of the activated area across the tested mice? 

      Also, can we have a view on the time dynamics of the functional activation and integration window? 

      Raw data - We have now added a new supplementary figure (Figure 8 – figure supplement 1) to show data from individual mice, as well as plotting the time-course of the evoked jRGECO fluorescence signals in the frontal cortex hotspot. 

      Image blur - Each pixel represents 62.5 x 62.5 µm on the cortical surface. The images in Figure 8B&C were averaged across mice, which causes some additional spatial blurring. However, the most likely explanation for the ‘blurred’ impression is the overall large horizontal extent of the axonal innervation as well as the likely rapid lateral spread of excitation both at the stimulation area and in the target region, as for example also indicated in rapid voltage-sensitive imaging experiments (Ferezou et al., 2007).

      Reviewer #3 (Recommendations For The Authors): 

      At the time being, the abstract is really centred on the methodology, which is no longer very novel as it has already been described previously by other groups. In my view the paper would gain visibility, and be a useful tool for the community, if amended to better point out the significant results of the study, for instance: i) the layer and sub-layer specificity of the outputs, using the listed genetic drivers; ii) the comparison of primary and secondary somatosensory areas; iii) the functional validation. The layer specificity of each Cre line should be indicated in the abstract.

      We have tried to improve the writing of the abstract along the lines suggested by the reviewer. Specifically, we have now added layer and projection class of the various Cre-lines, and we now also highlight the most obvious differences in the innervation patterns.

      There is some degree of redundancy in the description in the result section. One suggestion, for an easier flow of reading, would be to join the paragraphs " Laminar characterization of the Cre-lines.." and: "Axonal projections...". Start for each Cre-line with a description of the laminar localisation of recombination in the somatosensory cortices, followed therefrom by the description of outputs from SSp-bfd and SSs; Then the general description/overview of the outputs can be summarized as a legend to Figure 5-supplementary 2, which could appear as a main figure. 

      Although we agree with the reviewer that there is some level of redundancy in the text, the results of the characterization of the Cre-line (Figure 2) is quite a different experiment compared to the viral injections described in other figures, and we therefore prefer to keep these sections separate.

      Other minor points: 

      In the text; Indicate the genetic background of the transgenic mouse lines. 

      On P. 18, we now indicate that all mice were “back-crossed with C57BL/6 mice”.

      Keep consistency in the designation of the areas, S1 appears sometimes as SSp-bfd or as SSp 

      We thank the reviewer for pointing out the inconsistent nomenclature, which we have now corrected in the revised manuscript. ‘SSp’ remains used on P. 9 and P. 16 of the revised manuscript to indicate a region including SSp-bfd but also extending beyond.

      Figure 1 supplement 2 is not really necessary to show (as the viral tools have previously been validated); this can just be stated in the text. Conversely, one would like to see a higher resolution image of the injection sites that allowed the cell counts used for normalization, as this can be pretty tricky.

      In response to the reviewer’s suggestion, we have now added a new supplemental figure to show an example of how cells in the injection site were counted (Figure 1 – figure supplement 3).

      Figure 2: the most important here is the higher magnification to show the precise laminar localisation of the recombination, rather than the atlas landmarks that is already shown in Figure 1. This would allow more space for clearer higher magnification panels comparing SSs and SSp. The present image hints to some real differences, but difficult to appreciate with the current resolution. The legend should also comment on the labelling seen in layer 1, in the Tlx2 and Rbp4 lines. Could be dendritic labelling, but this needs a word of clarification.

      We think both the overview images as well as the high-resolution images are of value to the reader. Following the reviewer’s comment, in the legends to Figure 2C&D, we have now added text suggesting that the layer 1 fluorescence is likely axonal or dendritic in origin: “Labelling in layer 1 is likely of axonal or dendritic origin, and no cell bodies were labelled in this layer.” In addition, we have added a new supplemental figure which shows the cortical labelling in SSp and SSs in a more magnified view (Figure 2 – figure supplement 1).

      Figure 3: the comparison of the 3 transgenic lines labelling layer 5 and showing sublaminar identities is really interesting in showing the heterogeneity of this layer and possible regional differences. However, the cases shown for illustration for Rbp4 and Tlx3 seem pretty massive in comparison with the other drivers. Maybe cases with smaller injections could be chosen for illustration. 

      Figure 3 shows grand average axonal density maps across different mice normalized to the number of neurons in the injection site. The large amount of axon per neuron observed in Rbp4 and Tlx3 mice therefore shows their long, wide-ranging axons compared to other neuronal classes.
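
The normalization described in this reply can be illustrated with a minimal sketch. It assumes per-mouse axon lengths keyed by atlas region and a count of labelled neurons at the injection site; the function names, region acronyms and numbers are hypothetical and are not taken from the authors' analysis code.

```python
def axon_length_per_neuron(axon_length_by_region, n_labelled_neurons):
    """Divide each region's axon length by the number of labelled neurons."""
    if n_labelled_neurons <= 0:
        raise ValueError("need at least one labelled neuron to normalize")
    return {region: length / n_labelled_neurons
            for region, length in axon_length_by_region.items()}


def grand_average(per_mouse):
    """Average normalized values across mice; a region absent in a mouse counts as 0."""
    regions = sorted({region for mouse in per_mouse for region in mouse})
    return {region: sum(mouse.get(region, 0.0) for mouse in per_mouse) / len(per_mouse)
            for region in regions}


# Toy values only (arbitrary length units), not data from the study.
mouse_a = axon_length_per_neuron({"MOp": 200.0, "CP": 100.0}, 100)
mouse_b = axon_length_per_neuron({"MOp": 40.0}, 10)
average = grand_average([mouse_a, mouse_b])
```

Under this kind of normalization, a large value in the averaged map reflects more axon per labelled neuron rather than a larger injection, which is the point made in the reply.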

      Figure 6A could be a supplementary figure in my view; 6B is clearer. 

      We think both representations are useful, and we think different readers might better appreciate either of the two analyses.

    1. eLife assessment

      The authors presented a valuable bioinformatics pipeline for screening and identifying inhibitory receptors for potential drug targets. They provided solid evidence showing a sequential reduction in the search space through various screening tools and algorithms and demonstrated that this pipeline can be used to "rediscover" known targets. Further experimental validation on putative and unknown inhibitory receptors will strengthen the evidence reported in this work. This study will be of interest to bioinformaticians and computational biologists working on immune regulation, sequence screening, and target identification of immune checkpoint inhibitors.

    2. Reviewer #2 (Public review):

      Summary:

      The authors developed a bioinformatic pipeline to aid the screening and identification of inhibitory receptors suitable as drug targets. The challenge lies in the large search space and lack of tools for assessing the likelihood of their inhibitory function. To make progress, the authors used a consensus protein membrane topology and sequence motif prediction tool (TOPCONS) combined with both a statistical measure assessing their likelihood function and a machine learning protein structural prediction model (AlphaFold) to greatly cut down the search space. After obtaining a manageable set of 398 high confidence known and putative inhibitory receptors through this pipeline, the authors then mapped these receptors to different functional categories across different cell types based on their expression both in the resting and activated state. Additionally, by using publicly available pan-cancer scRNA-seq data for tumor-infiltrating T cells, they showed that these receptors are expressed across various cellular subsets.

      Strengths:

      The authors presented sound arguments motivating the need to efficiently screen inhibitory receptors and to identify those that are functional. Key components of the algorithm were presented along with solid justification for why they addressed challenges faced by existing approaches. To name a few:

      • TOPCONS algorithm was elected to optimize the prediction of membrane topology.

      • A statistical measure was used to remove potential false positives.

      • AlphaFold is used to filter out putative receptors that are low confidence (and likely intrinsically disordered).

      To examine receptors screened through this pipeline through a functional lens, the authors proposed to look at their expression of various immune cell subsets to assign functional categories. This is a reasonable and appropriate first step for interpreting and understanding how potential drug targets are differentially expressed in some disease contexts. They also presented an example showing this pipeline can be used to "rediscover" known targets.

      Weaknesses:

      The paper has strength in the pipeline they presented, but the weakness, in my opinion, lies in the lack of direct experimental validation on putative receptors. That said, the authors presented in the revised manuscript, as a proof-of-concept, an analytic approach for using functional categorization of putative inhibitory receptors to select therapeutic targets based on in vitro RNAseq. Such analysis will benefit from further investigation across different cancer types using in vivo expression.

    3. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This work is potentially useful because it has generated a mineable yield of new candidate immune inhibitory receptors, which can serve both as drug targets and as subjects for further biological investigation. It is noted however that the argument of the work is rather incomplete, in that it does very little to validate the putative new receptors, and merely makes a study of their putative distribution across cell types. Experimental follow-up to demonstrate the claimed properties for the proteins identified, or mining existing experimental data sources on gene expression across tissues to at least show that the pipeline correctly identified genes likely to be specific to immune cells (or something along these lines), would make this work more complete and compelling. 

      We thank the editors for their critical reading and assessment of our manuscript. We acknowledge that the present study is limited by a lack of experimental follow-up. However, we purposely chose to make this pipeline of putative novel inhibitory receptors public at this early stage for our work to be a starting point for further functional investigation of these targets by the scientific community.   

      Public Reviews:

      Reviewer #1 (Public Review):

      This manuscript proposes a new bioinformatics approach identifying several hundreds of previously unknown inhibitory immunoreceptors. When expressed in immune cells (such as neutrophils, monocytes, CD8+, CD4+, and T-cells), such receptors inhibit the functional activity of these cells. Blocking inhibitory receptors represents a promising therapeutic strategy for cancer treatment.

      As such, this is a high-quality and important bioinformatics study. One general concern is the absence of direct experimental validation of the results. In addition to the fact that the authors bioinformatically identified 51 known receptors, providing such experimental evaluation (of at least one, or better few identified receptors) would, in my opinion, significantly strengthen the presented evidence.

      I will now briefly summarize the results and give my comments.

      First, using sequence comparison analysis, the authors identify a large set of putative receptors based on the presence of immunoreceptor tyrosine-based inhibitory motifs (ITIMs), or immunoreceptor tyrosine-based switch motifs (ITSMs). They further filter the identified set of receptors for the presence of the ITIMs or ITSMs in an intracellular domain of the protein. Second, using AlphaFold structure modeling, the authors select only receptors containing ITIMs/ITSMs in structurally disordered regions. Third, the evaluation of gene expression profiles of known and putative receptors in several immune cell types was performed. Fourth, the authors classified putative receptors into functional categories, such as negative feedback receptors, threshold receptors, threshold disinhibition, and threshold-negative feedback. The latter classification was based on the available data from Nat Rev Immunol 2020. Fifth, using publicly available single-cell RNA sequencing data of tumor-infiltrating CD4+ and CD8+ cells from nearly twenty types of cancer, the authors demonstrate that a significant fraction of putative receptors are indeed expressed in these datasets.
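
The first two steps summarized above (motif search, then restriction to an intracellular region) can be sketched roughly as follows. The regular expressions encode commonly cited ITIM and ITSM consensus patterns, which is an assumption here rather than the exact definition used in the manuscript; the `intracellular` range is a hypothetical stand-in for the output of a topology predictor, and the sequence is a toy string.

```python
import re

# Commonly cited consensus patterns (assumed, not taken from the manuscript):
#   ITIM: (I/V/L/S)-x-Y-x-x-(L/V/I)
#   ITSM: T-x-Y-x-x-(V/I)
# Lookaheads let overlapping motifs be reported as well.
ITIM = re.compile(r"(?=([IVLS].Y..[LVI]))")
ITSM = re.compile(r"(?=(T.Y..[VI]))")


def find_motifs(sequence, intracellular):
    """Return (name, start, motif) for motifs lying fully inside `intracellular`.

    `intracellular` is a 0-based, half-open (start, end) range; in a real
    pipeline it would come from a membrane topology prediction.
    """
    lo, hi = intracellular
    hits = []
    for name, pattern in (("ITIM", ITIM), ("ITSM", ITSM)):
        for match in pattern.finditer(sequence):
            start = match.start(1)
            if lo <= start and start + 6 <= hi:
                hits.append((name, start, match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


toy_sequence = "MKKVTYAQLGGGTEYSEVAA"  # made-up sequence for illustration
all_hits = find_motifs(toy_sequence, (0, 20))
tail_hits = find_motifs(toy_sequence, (10, 20))
```

Restricting the range to the assumed intracellular tail drops the first hit, which mirrors the filtering step the reviewer describes.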

      In summary, in my opinion, this is an interesting, important, high-quality bioinformatics work. The manuscript is clearly written and all technical details are carefully explained.

      One comment/suggestion regarding the methodology of evaluating gene expression profiles of putative receptors: perhaps it might be important to look at clusters of genes that are co-expressed with putative inhibitory receptors. 

      We thank the reviewer for their comments and suggestions. We acknowledge that looking at co-expressed genes and subsequently at gene ontology enrichment could be an interesting approach to prioritize the inhibitory receptors. However, since there are many ways to approach the results of the gene co-expression networks, which also depend on the cell type and activation status of interest, we have chosen to discuss the implications of these networks in the discussion with the following paragraph, rather than reporting all these different approaches in the paper:

      “To further prioritize inhibitory receptors in immune cell subsets or diseases of interest, gene co-expression networks of putative inhibitory receptors could be assessed. On the one hand, the co-occurrence of putative inhibitory receptors with known inhibitory receptors within a module could be one approach, while on the other hand the presence of putative inhibitory receptors in a different module could suggest novel regulation of different biological functions than the known receptors. The location of the putative inhibitory receptors in the network could also change depending on the cell type and the activation status of the cell. Additionally, one could look at the co-expression of candidates with other genes within a gene module to look at potential biological function, and at co-expression with signalling molecules known to interact with inhibitory receptors, such as Csk, SHP-1, SHP-2 and SHIP1, although their regulation might be more post-translationally regulated rather than at mRNA level.”
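
One simple way to approximate the co-expression idea in the quoted paragraph is to rank candidates by the Pearson correlation of their expression with a known inhibitory receptor across samples. The sketch below is only an illustration of that idea under stated assumptions: the candidate names and expression values are invented, and a full co-expression network analysis would involve module detection that is not shown here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / (sd_x * sd_y)


def rank_by_coexpression(candidates, reference):
    """Sort candidate genes by correlation with a reference profile, highest first."""
    scores = {gene: pearson(values, reference) for gene, values in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy profiles across four samples; names and numbers are hypothetical.
known_receptor = [1.0, 2.0, 3.0, 4.0]
candidates = {"cand_A": [2.0, 4.0, 6.0, 8.0], "cand_B": [4.0, 3.0, 2.0, 1.0]}
ranked = rank_by_coexpression(candidates, known_receptor)
```

As the authors note, the outcome of such a ranking would depend on the cell type and activation state of the samples used.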

      Reviewer #2 (Public Review):

      Summary:

      The authors developed a bioinformatic pipeline to aid the screening and identification of inhibitory receptors suitable as drug targets. The challenge lies in the large search space and lack of tools for assessing the likelihood of their inhibitory function. To make progress, the authors used a consensus protein membrane topology and sequence motif prediction tool (TOPCONS) combined with both a statistical measure assessing their likelihood function and a machine learning protein structural prediction model (AlphaFold) to greatly cut down the search space. After obtaining a manageable set of 398 high-confidence known and putative inhibitory receptors through this pipeline, the authors then mapped these receptors to different functional categories across different cell types based on their expression both in the resting and activated state. Additionally, by using publicly available pan-cancer scRNA-seq data for tumor-infiltrating T cells, they showed that these receptors are expressed across various cellular subsets.

      Strengths:

      The authors presented sound arguments motivating the need to efficiently screen inhibitory receptors and to identify those that are functional. Key components of the algorithm were presented along with solid justification for why they addressed challenges faced by existing approaches. To name a few:

      • TOPCONS algorithm was elected to optimize the prediction of membrane topology.

      • A statistical measure was used to remove potential false positives.

      • AlphaFold is used to filter out putative receptors that are low confidence (and likely intrinsically disordered).

      To examine receptors screened through this pipeline through a functional lens, the authors proposed to look at their expression of various immune cell subsets to assign functional categories. This is a reasonable and appropriate first step for interpreting and understanding how potential drug targets are differentially expressed in some disease contexts.

      Weaknesses:

      The paper's strength is the pipeline presented, but the weakness, in my opinion, lies in the lack of concrete demonstration of how this pipeline can be used to at least "rediscover" known targets in a disease-specific manner. For example, the result that both known and putative immune inhibitory receptors are expressed across a wide variety of tumor-infiltrating T-cell subsets is reassuring, but this would have been more informative and illustrative if the authors could demonstrate using a disease with known targets, as opposed to a pan-cancer context. Additionally, a discussion that contrasts the known and putative receptors in the context above would help readers better identify use cases suitable for their research using this pipeline. Particularly,

      • For known receptors, does the pipeline and the expression analysis above rediscover the known target in the disease of interest?

      • For putative receptors, what do the functional category mapping and the differential expression across various tumor-infiltrating T-cell subsets imply on a potential therapeutic target?

      We thank the reviewer for their assessment and comments. The primary purpose of the bioinformatics pipeline was to identify putative inhibitory receptors in a disease-agnostic manner and allow the scientific community to further explore targets in their specific diseases of interest. We performed our pan-cancer expression analysis as a preliminary proof of concept and agree that exploring targets in specific diseases, cancer or otherwise, could be more informative. To validate that we rediscovered known immunotherapeutic targets, we analyzed the expression of known inhibitory receptors on tumor-infiltrating T cells of melanoma patients using the same dataset as figure 3. We find high expression of known therapeutic targets, such as PD-1, in addition to other known inhibitory receptors that are being targeted in clinical trials, one of which is TIGIT. We have added this information to the results section and added the corresponding graph as supplementary figure 5.

      For the putative inhibitory receptors, we believe the functional categorization can assist in selecting targets that are more likely to be successful in a therapeutic context. As we previously proposed in our perspective on functional categorization of inhibitory receptors (Rumpret et al., Nat Imm, 2020), it might be beneficial to target inhibitory receptors of different functional categories in cancer immunotherapy. Targeting a threshold receptor to lower the threshold for activation and a negative feedback receptor to lengthen and strengthen the cellular response might therefore be more effective than targeting two receptors of a single functional category. Even though we realize RNA sequencing data of in vitro stimulated immune cells is not identical to data from TILs, we have tried to characterize the functional categories expressed by TILs by extrapolating the defined functional categorization per gene from figure 2, and added the corresponding graphs as supplementary figure 4. This shows that mainly threshold receptors and some (threshold-)negative feedback receptors are expressed by the different T cell subsets, which would open the possibility of using the proposed therapeutic strategy of targeting different functional categories. However, we acknowledge that this will require further validation of expression patterns in vivo in different cancers and immune cell subsets. 

      Reviewer #1 (Recommendations For The Authors):

      One comment/suggestion regarding the methodology of evaluating gene expression profiles of putative receptors: perhaps it might be important to look at clusters of genes that are co-expressed with putative inhibitory receptors.

      See our reply to the suggestion above.

      Reviewer #2 (Recommendations For The Authors):

      Results section

      (a) "Putative ITIM/ITSM-bearing immune inhibitory receptors can be found in the human genome"

      i. Figure 1 could benefit from additional labeling. For example, in B, the grey line indicates 5%, etc. Additionally, in panel B&C, I assume by "predicted" the author meant using TOPCONS?

      ii. Figure 1B doesn't seem to be consistent with this sentence "However, for 10 out of 51, we observed ITIM/ITSM sequences in the permutated sequence up to ~25% of the time" [page 2, line 1-3], as all 51 data points in Figure 1B (under "Known" panel) are below the 0.25 horizontal line?

      i. We have adjusted the figure legend to better indicate the information provided in the figures. The predicted genes are all unknown transmembrane candidates that contain an ITIM or ITSM in their intracellular domain, as determined using TOPCONS.

      ii. Due to the nature of permutation testing, there is some variation in the individual likelihood values for each protein sequence. However, as they were generally below 0.25 in any given iteration, we decided to define this value as a threshold for inclusion. 
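
The permutation idea discussed in this exchange can be sketched as follows. This is a minimal illustration and not the authors' code: the regular expression encodes the commonly cited ITIM and ITSM consensus patterns, and the function name, example sequence, number of permutations, and seed are all assumptions made for the example.

```python
import random
import re

# Commonly cited consensus patterns (assumed here, not taken from the manuscript):
# ITIM: (I/V/L/S)-x-Y-x-x-(L/V/I); ITSM: T-x-Y-x-x-(V/I)
MOTIF = re.compile(r"[IVLS].Y..[LVI]|T.Y..[VI]")


def motif_likelihood(sequence, n_permutations=1000, seed=0):
    """Fraction of shuffled sequences that contain an ITIM/ITSM by chance."""
    rng = random.Random(seed)
    residues = list(sequence)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(residues)  # keeps amino acid composition, scrambles order
        if MOTIF.search("".join(residues)):
            hits += 1
    return hits / n_permutations


# Hypothetical intracellular tail, for illustration only
tail = "MKRSGSAPVIYAQLDHSGGEKTEYATIVFPGSR"
score = motif_likelihood(tail)
print(f"chance likelihood: {score:.3f}; below 0.25 cutoff: {score < 0.25}")
```

Running the function with different seeds gives slightly different values for the same sequence, which is the run-to-run variation the authors describe.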

      (b) "AlphaFold structure predictions can assist in identifying likely functional ITIM/ITSMs"

      i. Readability would increase if the authors indicated how the pLDDT score is computed and what its range is (between 0 and 100).

      ii. Third paragraph. Can the author comment on why 80 pLDDT is chosen as the cutoff? The first sentence of this paragraph states "We found that 99 out of 101 ITIM/ITSMs of the 51 known receptors had low confidence score, i.e., less than 80 pLDDT, with an average confidence score of 49.3 pLDDT..." However, it was later stated in the Discussion, page 10, starting Line 11 "We determined a threshold of 80 pLDDT based on the average prediction scores of the ITIM/ITSMs in known inhibitory receptors....". If 99 out of 101 ITIM/ITSMs had pLDDT<80, then it seems strange that the average of the 101 is at 80pLDDT, even in the extreme where the remaining 101-99=2 ITIM/ITSMs attain the maximum pLDDT score at 100, unless the distribution of those 99 is narrowly centered around 80? A distribution of the pLDDT would help clarify.

      i. The pLDDT scores are computed by AlphaFold as a way to determine how well a specific residue and/or region is expected to be modelled in three-dimensional space. We now refer to the corresponding AlphaFold publications and references therein to clarify this (10.1093/nar/gkab1061, 10.1038/s41586-021-03819-2, 10.1093/bioinformatics/btt473). We also have now included the range (i.e., 0-100) in the text.

      ii. The threshold of 80 pLDDT was chosen as this still encompasses all known inhibitory receptors and was not calculated based on an average of the prediction scores. In this way, we still included ITIM/ITSMs with a relatively high pLDDT, such as those observed in PD-1 and LAIR-1. The previous text ‘average prediction scores of the ITIM/ITSMs in known inhibitory receptors’ referred to the averaging of the confidence score for each of the six amino acids encompassing the ITIM/ITSM into one overall score per ITIM/ITSM. We have adjusted the text to better reflect this.
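
The two quantities in this exchange can be made concrete with a short sketch. It assumes per-residue pLDDT values are available as a plain list. The function name and the example scores are hypothetical, and the code is an illustration rather than the authors' implementation. The first part averages the six residues of one motif into a single score, and the second part computes the bound behind the reviewer's arithmetic.

```python
from statistics import mean

PLDDT_CUTOFF = 80.0  # threshold discussed above; pLDDT ranges from 0 to 100


def motif_plddt(per_residue_plddt, motif_start, motif_length=6):
    """Average per-residue pLDDT over the six residues of one ITIM/ITSM."""
    window = per_residue_plddt[motif_start:motif_start + motif_length]
    if len(window) != motif_length:
        raise ValueError("motif window runs past the end of the sequence")
    return mean(window)


# Hypothetical per-residue scores for a short intracellular stretch
scores = [35.2, 41.0, 47.5, 52.3, 49.8, 44.1, 38.7, 90.4]
score = motif_plddt(scores, motif_start=1)
print(round(score, 2), score < PLDDT_CUTOFF)

# The reviewer's bound: if 99 of 101 motif scores are below 80 and the other
# two are at most 100, the mean over all 101 must be below this value.
upper_bound = (99 * 80 + 2 * 100) / 101
print(round(upper_bound, 1))
```

The bound shows why a cohort mean near 80 would require nearly all 99 scores to sit just under the cutoff, which is hard to reconcile with the reported mean of 49.3 unless 80 is a cutoff rather than an average.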

      (c) "Putative inhibitory receptors are expressed across immune cell subsets"

      Figure S2, the last sentence in the caption (relevant for panel C) states "Cell subsets without uniquely expressed putative inhibitory receptors i.e., B cells and T cell, are excluded from the panel for clarity", but B cells and T cells are present in panel C?

      Indeed, but they are only included for the cases where the cell subsets share receptor expression with other immune cell subsets. The B and T cells do not express any unique putative multi-spanning receptors; all receptors are shared with at least one other immune cell subset.

      (d) "Known and putative inhibitory receptors are expressed on tumour infiltrating T cells"

      i. Missing panel C label in Figure 3 and S3.

      ii. By comparing Figure 3 and S3, it looks to me that there's not a big difference between single-spanning and multi-spanning inhibitory receptors. I wonder if the authors can comment or speculate on this similarity in addition to differences of expression among T-cell subsets. Would the similarities and differences above be explained by cancer type?

      i. Figure 3 and S3 do not contain a panel C, but panel B consists of a lower (CD8+) and an upper (CD4+) subpanel. We have indicated this more clearly in the figure legend in the revised manuscript.

      ii. While some T cell subsets, such as exhausted CD8+ T cells and CD4+ regulatory T cells, appear to not differ much in their expression of either single- or multi-spanning receptors, we do observe that, for example, effector memory CD4+ T cells or EMRA CD8+ T cells express single-spanning inhibitory receptors to a higher extent than multi-spanning inhibitory receptors. It is possible that these differences and similarities reflect some of the roles multi-spanning inhibitory receptors could play in regulating immune cells, for example in response to chemokines, as many chemokine receptors are multi-spanning proteins. 

      Data and Code availability

      Although the Methods section provides some context for the computational analysis and citations for relevant data, software availability and a data availability statement are lacking.

      We have included a data availability statement covering the data files and code in the revised manuscript.

    1. eLife assessment

      This important study investigates the intracellular localization patterns of G proteins involved in GPCR signaling, presenting compelling evidence for their preference for plasma and lysosomal membranes over endosomal, endoplasmic reticulum, and Golgi membranes. This discovery has significant implications for understanding GPCR action and signaling from intracellular locations. This research will interest cell biologists studying protein trafficking and pharmacologists exploring localized signaling phenomena.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Jang et al. describes the application of new methods to measure the localization of GTP-binding signaling proteins (G proteins) on different membrane structures in a model mammalian cell line (HEK293). G proteins mediate signaling by receptors found at the cell surface (GPCRs), with evidence from the last 15 years suggesting that GPCRs can induce G-protein mediated signaling from different membrane structures within the cell, with variation in signal localization leading to different cellular outcomes. While it has been clearly shown that different GPCRs efficiently traffic to various intracellular compartments, it is less clear whether G proteins traffic in the same manner, and whether GPCR trafficking facilitates "passenger" G protein trafficking. This question was a blind spot in the burgeoning field of GPCR localized signaling in need of careful study, and the results obtained will serve as an important guidepost for further work in this field. The extent to which G proteins localize to different membranes within the cell is the main experimental question tested in this manuscript. This question is pursued through two distinct methods, both relying on genetic modification of the G-beta subunit with a tag. In one method, G-beta is modified with a small fragment of the fluorescent protein mNG, which combines with the larger mNG fragment to form a fully functional fluorescent protein to facilitate protein trafficking by fluorescent microscopy. This approach was combined with expression of fluorescent proteins directed to various intracellular compartments (different types of endosomes, lysosome, endoplasmic reticulum, Golgi, mitochondria) to look for colocalization of G-beta with these markers. These experiments showed compelling evidence that G-beta co-localizes with markers at the plasma membrane and the lysosome, with weak or absent co-localization for other markers. 
A second method for measuring localization relied on fusing G-beta with a small fragment from a miniature luciferase (HiBit) that combines with a larger luciferase fragment (LgBit) to form an active luciferase enzyme. Localization of G-beta (and luciferase signal) was measured using a method known as bystander BRET, which relies on expression of a fluorescent protein BRET acceptor in different cellular compartments. Results using bystander BRET supported findings from fluorescence microscopy experiments. These methods for tracking G protein localization were also used to probe other questions. The activation of GPCRs from different classes had virtually no impact on the localization of G-beta, suggesting that GPCR activation does not result in shuttling of G proteins through the endosomal pathway with activated receptors.

      In the revised version of this manuscript the authors have performed informative and important new experiments in addition to adding new text to address conceptual questions. These new data and discussions are commendable and address most or all of the weaknesses listed in the initial review.

      Strengths:

      The question probed in this study is quite important and, in my opinion, understudied by the pharmacology community. The results presented here are an important call to be cognizant of the localization of GPCR coupling partners in different cellular compartments. Abundant reports of endosomal GPCR signaling need to consider how the impact of lower G protein abundance on endosomal membranes will affect the signaling responses under study.

      The work presented is carefully executed, with seemingly high levels of technical rigor. These studies benefit from probing the experimental questions at hand using two different methods of measurement (fluorescent microscopy and bystander BRET). The observation that both methods arrive at the same (or a very similar) answer inspires confidence about the validity of these findings.

      Weaknesses:

      As noted by the authors, they do not demonstrate that the tagged G-beta is predominantly found within heterotrimeric G protein complexes. In the revised manuscript the authors have added new discussion text on why it is likely that G-beta is mostly found in complexes. This line of reasoning is convincing, although more robust experimental methods for assessing the assembly status of G-beta could be a valuable target for future experimental developments.

    3. Reviewer #2 (Public review):

      This study assesses the subcellular distribution of a major G protein subunit (Gβ1) when expressed at an endogenous level in a well-studied model cell system (293 cells). The approach elegantly extends a gene editing strategy described by Leonetti's group and combines it with a FRET-based proximity assay to detect the presence of endogenously tagged Gβ1 on membrane compartments of 293 cells. The authors achieve their goal, and the data are convincing and interesting. The authors do a nice job of integrating their results with previous work in the field. The methods are now sufficiently well-described to enable other investigators to apply or adapt them in future studies.

    4. Reviewer #3 (Public review):

      Summary:

      This article addresses an important and interesting question concerning the intracellular localization and dynamics of endogenous G proteins. The fate and trafficking of G protein-coupled receptors (GPCRs) have been extensively studied, but so far little is known about the trafficking routes of their partner G proteins, which are known to dissociate from their respective receptors upon activation of the signaling pathway. The authors utilize modern cell biology tools including genome editing and bystander bioluminescence resonance energy transfer (BRET) to probe the intracellular localization of G proteins in various membrane compartments in steady state and also upon receptor activation. Data presented in this manuscript show that while G proteins are mostly present on the plasma membrane, they can also be detected in endosomal compartments, especially in late endosomes and lysosomes. This distribution, according to data presented in this study, seems not to be affected by receptor activation. These findings will have implications for further studies addressing GPCR signaling mechanisms from intracellular compartments.

      Strengths:

      The methods used in this study are adequate for the question asked. In particular, the use of genome-edited cells (for addition of the tag on one of the G proteins) is a great choice to prevent effects of overexpression. Moreover, the use of bystander BRET allowed the authors to probe intracellular localization of G proteins in a very high-throughput fashion. By combining imaging and BRET, the authors convincingly show that G proteins are of very low abundance on early endosomes (also ER, mitochondria, and medial Golgi), but seem to accumulate on membranes of late endosomal compartments. Moreover, the authors also looked at the dynamics of G protein trafficking by tracking them over multiple time points in different compartments.

      Weaknesses:

      While the authors provide a novel dataset, many questions regarding G protein trafficking remain open. For example, it is not entirely clear which pathway is utilized to traffic G proteins from the plasma membrane to intracellular compartments. Additionally, future studies should include more quantitative details concerning G protein distribution in different compartments, as well as more detailed dynamic data on G protein internalization and intracellular trafficking kinetics.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Jang et al. describes the application of new methods to measure the localization of GTP-binding signaling proteins (G proteins) on different membrane structures in a model mammalian cell line (HEK293). G proteins mediate signaling by receptors found at the cell surface (GPCRs), with evidence from the last 15 years suggesting that GPCRs can induce G-protein mediated signaling from different membrane structures within the cell, with variation in signal localization leading to different cellular outcomes. While it has been clearly shown that different GPCRs efficiently traffic to various intracellular compartments, it is less clear whether G proteins traffic in the same manner, and whether GPCR trafficking facilitates "passenger" G protein trafficking. This question was a blind spot in the burgeoning field of GPCR localized signaling in need of careful study, and the results obtained will serve as an important guidepost for further work in this field. The extent to which G proteins localize to different membranes within the cell is the main experimental question tested in this manuscript. This question is pursued through two distinct methods, both relying on genetic modification of the G-beta subunit with a tag. In one method, G-beta is modified with a small fragment of the fluorescent protein mNG, which combines with the larger mNG fragment to form a fully functional fluorescent protein to facilitate protein trafficking by fluorescent microscopy. This approach was combined with the expression of fluorescent proteins directed to various intracellular compartments (different types of endosomes, lysosome, endoplasmic reticulum, Golgi, mitochondria) to look for colocalization of G-beta with these markers. These experiments showed compelling evidence that G-beta co-localizes with markers at the plasma membrane and the lysosome, with weak or absent co-localization for other markers. 
A second method for measuring localization relied on fusing G-beta with a small fragment from a miniature luciferase (HiBit) that combines with a larger luciferase fragment (LgBit) to form an active luciferase enzyme. Localization of G-beta (and luciferase signal) was measured using a method known as bystander BRET, which relies on the expression of a fluorescent protein BRET acceptor in different cellular compartments. Results using bystander BRET supported findings from fluorescence microscopy experiments. These methods for tracking G protein localization were also used to probe other questions. The activation of GPCRs from different classes had virtually no impact on the localization of G-beta, suggesting that GPCR activation does not result in the shuttling of G proteins through the endosomal pathway with activated receptors.

      Strengths:

      The question probed in this study is quite important and, in my opinion, understudied by the pharmacology community. The results presented here are an important call to be cognizant of the localization of GPCR coupling partners in different cellular compartments. Abundant reports of endosomal GPCR signaling need to consider how the impact of lower G protein abundance on endosomal membranes will affect the signaling responses under study.

      The work presented is carefully executed, with seemingly high levels of technical rigor. These studies benefit from probing the experimental questions at hand using two different methods of measurement (fluorescent microscopy and bystander BRET). The observation that both methods arrive at the same (or a very similar) answer inspires confidence about the validity of these findings.

      Weaknesses:

      The rationale for fusing G-beta with either mNG2(11) or SmBit could benefit from some expansion. I understand the speculation that using the smallest tag possible may have the smallest impact on protein performance and localization, but plenty of researchers have fused proteins with whole fluorescent proteins to provide conclusions that have been confirmed by other methods. Many studies even use G proteins fused with fluorescent proteins or luciferases. Is there an important advantage to tagging G-beta with small tags? Is there evidence that G proteins with full-size protein tags behave aberrantly? If the studies presented here would not have been possible without these CRISPR-based tagging approaches, it would be helpful to provide more context to make this clearer. Perhaps one factor would be interference from newly synthesized G proteins-fluorescent protein fusions en route to the plasma membrane (in the ER and Golgi).

      There are several advantages to using small peptide tags that we did not fully explain. From a practical standpoint the most important advantage of using the HiBit tag instead of full-length Nanoluc is that it allows us to restrict luminescence output to cells transiently transfected with LgBit. In this way untransfected cells contribute no background signal. Although we did not take advantage of it here, this also applies to fluorescent protein complementation, and will be useful for visualizing proteins in individual cells within tissues. The HiBit tag also allows PAGE analysis by probing membranes with LgBit (as in Fig. 1). We are not aware of evidence that tagging Gβ or Gγ subunits on the N terminus results in aberrant behavior, while there is some evidence that Gα subunits tagged with full-size protein tags (in some positions) have altered functional properties (PMID: 16371464). We do think that editing endogenous genes is critical, as studies using transient overexpression (usually driven by strong promoters) have sometimes reported accumulation of tagged G proteins in the biosynthetic pathway (e.g., PMID: 17576765), as the reviewer suggests. Gα and Gβγ appear to be mutually dependent on each other for appropriate trafficking to the plasma membrane (reviewed in PMID: 23161140); therefore the native (presumably matched) stoichiometry is likely to be critical.

      To clarify this context the revised manuscript includes the following:

      “For bioluminescence experiments we added the HiBit tag (Schwinn et al., 2018) and isolated clonal “HiBit-β1” cell lines. An advantage of this approach over adding a full-length Nanoluc luciferase is that it requires coexpression of LgBit to produce a complemented luciferase. This limits luminescence to cotransfected cells and thus eliminates background from untransfected cells.”

      “Some studies using overexpressed G protein subunits have suggested that a large pool of G proteins is located on intracellular membranes, including the Golgi apparatus (Chisari et al., 2007; Saini et al., 2007; Tsutsumi et al., 2009), whereas others have indicated a distribution that is dominated by the plasma membrane (Crouthamel et al., 2008; Evanko, Thiyagarajan, & Wedegaertner, 2000; Marrari et al., 2007; Takida & Wedegaertner, 2003). A likely factor contributing to these discrepant results is the stoichiometry of overexpressed subunits, as neither Gα nor Gβγ traffic appropriately to the plasma membrane as free subunits (Wedegaertner, 2012). Our gene-editing approach presumably maintains the native subunit stoichiometry, providing a more accurate representation of native G protein distribution.”

      As noted by the authors, they do not demonstrate that the tagged G-beta is predominantly found within heterotrimeric G protein complexes. If there is substantial free G-beta, then many of the conclusions need to be reconsidered. Perhaps a comparison of immunoprecipitated tagged G beta vs immunoprecipitated supernatant, with blotting for other G protein subunits would be informative.

      We do think that HiBit-β1 exists predominantly within heterotrimeric complexes, for several reasons. First, overexpression studies have shown that Gβγ requires association with Gα to traffic to the plasma membrane, and that by itself Gβγ is retained on the endoplasmic reticulum (PMID: 12609996; PMID: 12221133). We find almost no endogenous Gβ1 on the endoplasmic reticulum, and a high density on the plasma membrane. Second, we are able to detect large increases in free HiBit-Gβγ after G protein activation using free Gβγ sensors (e.g. Fig. 1). Third, many proteins that bind to free Gβγ are found entirely in the cytosol of HEK 293 cells (e.g. PMID: 10066824), suggesting there is not a large population of free Gβγ. We have added discussion of these points to the revised manuscript as follows:

      “Endogenous Gα and Gβ subunits are expressed at approximately a 1:1 ratio, and Gβ subunits are tightly associated with Gγ and inactive Gα subunits (Cho et al., 2022; Gilman, 1987; Krumins & Gilman, 2006). Moreover, proteins that bind to free Gβγ dimers are found in the cytosol of unstimulated HEK 293 cells, suggesting at most only a small population of free Gβγ in these cells. Therefore, we assume that the large majority of mNG-β1 and HiBit-β1 subunits in unstimulated cells are part of heterotrimers.”

      “Notably, when Gβγ dimers are expressed alone they accumulate on the endoplasmic reticulum (Michaelson et al., 2002; Takida & Wedegaertner, 2003). That we detect almost no endogenous Gβγ on the endoplasmic reticulum supports our conclusion that the large majority of Gβγ in unstimulated HEK 293 cells is associated with Gα, although we cannot rule out a small population of free Gβγ.”

      We do not entirely understand the suggested experiment, as free Gβγ will still be largely associated with the membrane fraction. Notably, we find almost no HiBit-β1 in the supernatant after lysis in hypotonic buffer and preparation of membrane fractions, and the small amount that we do find does not change if Gα is overexpressed.

      Additional context and questions:

      (1) There exists some evidence that certain GPCRs can form enduring complexes with G-betagamma (PubMed: 23297229, 27499021). That would seem to offer a mechanism that would enable receptor-mediated transport of G protein subunits. It would be helpful for the authors to place the findings of this manuscript in the context of these previous findings since they seem somewhat contradictory.

      We agree. In our original submission we noted “It is possible that other receptors will influence G protein distribution using mechanisms not shared by the receptors we studied.” In the revised manuscript we have added:

      “For example, a few receptors are thought to form relatively stable complexes with Gβγ, which could provide a mechanism of trafficking to endosomes (Thomsen et al., 2016; Wehbi et al., 2013).”

      (2) There is some evidence that GaS undergoes measurable dissociation from the plasma membrane upon activation (see the mechanism of the assay in PubMed: 35302493). It seems possible that G-alpha (and in particular GaS) might behave differently than the G-beta subunit studied here. This is not entirely clear from the discussion as it now stands.

      Indeed, there is abundant evidence that some Gαs translocates away from the plasma membrane upon activation. We referred to translocation of “some Gα subunits” in the introduction, although we did not specify that Gαs is by far the most studied example. In a previous study (PMID: 27528603) we found that overexpressed Gαs samples many intracellular membranes upon activation and returns to the plasma membrane when activation ceases. This is similar to activation-dependent translocation of free Gβγ dimers. Because these translocation mechanisms depend on activation and are reversible they are unlikely to be a major source of inactive heterotrimers for intracellular membranes.

      We did a poor job of making it clear that we intentionally avoided translocation mechanisms that operate only during receptor and G protein stimulation. In the revised manuscript we have added new data showing reversible activation-dependent translocation of endogenous HiBit-Gβ1.

      (3) The authors say "The presence of mNG-β1 on late endosomes suggested that some G proteins may be degraded by lysosomes". The mechanism of lysosomal degradation by proteins on the outside of the lysosome is not clear. It would be helpful for the authors to clarify.

      We agree we didn’t connect the dots here. Our initial idea was that G proteins on the surface of late endosomes might reach the interior of late endosomes and then lysosomes by involution into multivesicular bodies. However, the reviewer correctly points out that much of the G protein associated with lysosomes still appears to be on the cytosolic surface, where it would not be subject to degradation. In fact, since lysosomes can fuse with the plasma membrane under certain circumstances, this could even represent a pathway for recycling G proteins to the plasma membrane.

      We have revised the text to avoid giving the impression that lysosomes degrade G proteins, since we have scant evidence that this occurs. In the revised discussion we point out that we do not know the fate of G proteins located on the surface of lysosomes and speculate that these could be returned to the plasma membrane:

      “We do not know the fate of G proteins located on the surface of lysosomes. Since lysosomes may fuse with the plasma membrane under certain circumstances (Xu & Ren, 2015), it is possible that this represents a route of G protein recycling to the plasma membrane.”

      (4) Although the authors do a good job of assessing G protein dilution in endosomal membranes, it is unclear how this behavior compares to the measurement of other lipid-anchored proteins using the same approach. Is the dilution of G proteins what we would expect for any lipid-anchored protein at the inner leaflet of the plasma membrane?

      This is a great question. To begin to address it we have studied a model lipid-anchored protein consisting of mNeonGreen2 anchored to the plasma membrane by the C terminus of HRas, which is palmitoylated and prenylated. We find that this protein is also diluted on endocytic vesicles, although to a lesser degree than heterotrimeric G proteins. We have added a section to the results and a new figure supplement describing these results:

      “To test if other peripheral membrane proteins are similarly depleted from endocytic vesicles, we performed analogous experiments by overexpressing mNG bearing the C-terminal membrane anchor of HRas (mNG-HRas ct). We found that mNG-HRas ct was also less abundant on FM4-64-positive endocytic vesicles than expected based on plasma membrane abundance, although not to the same extent as mNG-β1 (Figure 4 - figure supplement 2); mNG-HRas ct density on FM4-64-positive vesicles was 64 ± 17% (mean ± 95% CI; n=78) of the nearby plasma membrane.”
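
A summary of the form "mean ± 95% CI" for vesicle density relative to the nearby plasma membrane can be computed along the following lines. This is a minimal sketch under stated assumptions: the response does not say how the interval was calculated, so the normal-approximation critical value of 1.96 is an assumption, and the function name and the ratios are hypothetical values chosen only to show the calculation.

```python
from math import sqrt
from statistics import mean, stdev


def mean_with_ci95(values):
    """Return (mean, half-width of an approximate 95% confidence interval)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two measurements")
    # 1.96 is the normal-approximation critical value (an assumption here);
    # a t-based value would be slightly larger for small n.
    half_width = 1.96 * stdev(values) / sqrt(n)
    return mean(values), half_width


# Hypothetical vesicle/plasma-membrane intensity ratios, in percent
ratios = [52.0, 71.5, 60.3, 80.1, 45.8, 66.4]
m, hw = mean_with_ci95(ratios)
print(f"{m:.0f} ± {hw:.0f}% (mean ± 95% CI; n={len(ratios)})")
```

Because the half-width shrinks with the square root of n, a wide interval at n=78 would point to substantial vesicle-to-vesicle variability rather than a small sample.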

      Reviewer #2 (Public Review):

      This is an interesting method that addresses the important problem of assessing G protein localization at endogenous levels. The data are generally convincing.

      Specific comments

      Methods:

      The description of the gene editing method is unclear. There are two different CRISPR cell lines made in two different cell backgrounds. The methods should clearly state which CRISPR guides were used on which cell line. It is also not clear why HiBit is included in the mNG-β1 construct. Presumably, this is not critical but it would be helpful to explicitly note. In general, the Methods could be more complete.

      We have added the following to the methods to clarify that the same gRNA was used to produce both cell lines:

      “The human GNB1 gene was targeted at a site corresponding to the N-terminus of the Gb1 protein; the sequence 5’-TGAGTGAGCTTGACCAGTTA-3’ was incorporated into the crRNA, and the same gRNA was used to produce both HiBit-b1 and mNG-b1 cell lines.”

      We have added the following to the methods to clarify why HiBit is included in the mNG-b1 construct:

      “HiBit was included in the repair template for producing mNG-b1 cells to enable screening for edited clones using luminescence.”

      Results:

      The explanation of validation experiments in Figures 1 C and D is incomplete and difficult to follow. The rationale and explanation of the experiments could be expanded. In addition, because this is an interesting method, it would be helpful to know if the endogenous editing affects normal GPCR signaling. For example, the authors could include data showing an Iso-induced cAMP response. This is not critical to the present interpretation but is relevant as a general point regarding the method. Also, it may be relevant to the interpretation of receptor effects on G protein localization.

      We have expanded the rationale and explanation of experiments in Figures 1C and D by adding:

      “For example, we observed agonist-induced BRET between the D2 dopamine receptor and mNG-b1, an interaction that requires association with endogenous Ga subunits (Figure 1C). Similarly, we observed BRET between HiBit-b1 and the free Gbg sensor memGRKct-Venus after activation of receptors that couple Gi/o, Gs, and Gq heterotrimers, indicating that HiBit-b1 associated with endogenous Ga subunits from these three families (Figure 1D).”

      We have done the suggested cAMP experiment and provide the data in a new figure supplement:

      “We also found that cyclic AMP accumulation in response to stimulation of endogenous b adrenergic receptors was similar in edited cell lines and their unedited parent lines (Figure 1 - figure supplement 1).”

      Discussion:

      The conclusion that beta-gamma subunits do not redistribute after GPCR activation seems new and different from previous reports. Is this correct? Can the authors elaborate on how the results compare to previous literature?

      Many previous studies have indeed shown that free Gbg dimers can redistribute after GPCR activation and sample intracellular membranes. Our initial focus was on possible changes in heterotrimer distribution after GPCR activation, but in retrospect we should have directly addressed free Gbg translocation and made the distinction clear. 

      In the revised manuscript we show that during stimulation we observe changes consistent with modest translocation of endogenous Gbg from the plasma membrane and sampling of intracellular compartments. To our knowledge this is the first demonstration of endogenous Gbg translocation.

      We have added:

      “With overexpressed G proteins free Gbg dimers translocate from the plasma membrane and sample intracellular membrane compartments after activation-induced dissociation from Ga subunits. Consistent with this, we observed small decreases in bystander BRET at the plasma membrane and small increases in bystander BRET at intracellular compartments during activation of GPCRs, suggesting that endogenous Gbg subunits undergo similar translocation (Figure 5- figure supplement 1). Notably, these changes occurred at room temperature, suggesting that endocytosis was not involved, and developed over the course of minutes. The latter observation and the small magnitude of agonist-induced changes are both consistent with expression of primarily slowly-translocating endogenous Gg subtypes in HEK 293 cells. Moreover, as shown previously for overexpressed Gbg, the changes we observed with endogenous Gbg were readily reversible (Figure 5- figure supplement 1), suggesting that most heterotrimers reassemble at the plasma membrane after activation ceases.”

      Can the authors note that OpenCell has endogenously tagged Gβ1 and reports more obvious internal localization? Can the authors comment on this point?

      OpenCell has tagged GNB1 and the Leonetti group kindly provided a parent cell line we used to add a slightly different tag. Although their study did not identify any specific intracellular compartments, our impression is that most of the internal structures visible in their images are likely to be lysosomes, as they are large, round and often have a clear lumen. Overall their images and ours are comfortingly similar. We have added:

      “Unsurprisingly, our images are quite similar to those made as part of a previous study that labeled Gb1 subunits with mNG2 (Cho et al., 2022).”

      Notably, the Leonetti group has recently reported the subcellular distribution of many untagged proteins using a proteomic approach. They find that Gb1 is enriched on the plasma membrane and lysosomes but is not enriched on endosomes, the Golgi apparatus, endoplasmic reticulum or mitochondria (https://www.biorxiv.org/content/10.1101/2023.12.18.572249v1). We have cited this work in the revised manuscript.

      Is this the first use of CRISPR / HiBit for BRET assay? It would be helpful to know this or cite previous work if not. Also, as this is submitted as a tools piece, the authors might say a little more about the potential application to other questions.

      The only previous study we are aware of utilizing a similar combination of methods is a 2020 report from the group of Dr. Stephen Hill, in which the authors studied binding of fluorescent ligands to HiBit-tagged GPCRs. This work is now cited.

      We have also added the following to our previous brief statement about potential applications:

      “In addition, it may also be possible to use these cells in combination with targeted sensors to study endogenous G protein activation in different subcellular compartments. More broadly, our results show that subcellular localization of endogenous membrane proteins can be studied in living cells by adding a HiBit tag and performing bystander BRET mapping. Applied at large scale this approach would have some advantages over fluorescent protein complementation, most notably the ability to localize endogenous membrane proteins that are expressed at levels that are too low to permit fluorescence microscopy.”

      Reviewer #3 (Public Review):

      Summary:

      This article addresses an important and interesting question concerning intracellular localization and dynamics of endogenous G proteins. The fate and trafficking of G protein-coupled receptors (GPCRs) have been extensively studied but so far little is known about the trafficking routes of their partner G proteins that are known to dissociate from their respective receptors upon activation of the signaling pathway. The authors utilize modern cell biology tools including genome editing and bystander bioluminescence resonance energy transfer (BRET) to probe intracellular localization of G proteins in various membrane compartments in steady state and also upon receptor activation. Data presented in this manuscript shows that while G proteins are mostly present on the plasma membrane, they can be also detected in endosomal compartments, especially in late endosomes and lysosomes. This distribution, according to data presented in this study, seems not to be affected by receptor activation. These findings will have implications in further studies addressing GPCR signaling mechanisms from intracellular compartments.

      Strengths:

      The methods used in this study are adequate for the question asked. Especially, the use of genome-edited cells (for the addition of the tag on one of the G proteins) is a great choice to prevent the effects of overexpression. Moreover, the use of bystander BRET allowed authors to probe the intracellular localization of G proteins in a very high-throughput fashion. By combining imaging and BRET authors convincingly show that G proteins are very low abundant on early endosomes (also ER, mitochondria, and medial Golgi), however seem to accumulate on membranes of late endosomal compartments.

      Weaknesses:

      While the authors provide a novel dataset, many questions regarding G protein trafficking remain open. For example, it is not entirely clear which pathway is utilized to traffic G proteins from the plasma membrane to intracellular compartments. Additionally, future studies should also address the dynamics of G protein trafficking, for example by tracking them over multiple time points.

      We agree, there is much more to do.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      On page 7 the text says "the difference did reach significance (Figure 5D)". It looks like the difference did not reach significance. Please check on this.

      Thank you, this was an unfortunately significant typo.

      Reviewer #3 (Recommendations For The Authors):

      This article addresses an important and interesting question concerning intracellular localization and dynamics of endogenous G proteins. While the posed question is indeed a grand one and the methods used by the authors are novel, I believe that the data presented in this manuscript are still insufficient to support all claims posed by the authors. Below I list my major concerns:

      (1) The authors claim that they provide a "detailed subcellular map of endogenous G protein distribution", however, the map is in my opinion not sufficiently detailed (e.g. trans-Golgi network is not included) and not quantitative enough (e.g. % of proteins present on one compartment vs. the other as authors claim that BRET signals "cannot be directly compared between different compartments"). To strengthen this statement, except for providing more extensive and quantitative data, it would be beneficial to provide such a "map" as an illustration based on the findings presented in this article.

      “Detailed” is certainly a subjective term. While we maintain that our description of endogenous G protein distribution is far more detailed than any previous study, we now simply claim to provide a “subcellular map”. We have added images of TGNP (TGN46; TGOLN2), showing that endogenous G proteins are readily detectable on the structures labeled by this marker. These data are now provided in Figure 3 – figure supplement 7.

      We did not claim that our study was quantitative; we did not try to count G proteins. However, if we use published estimates of total G proteins and surface area for HEK 293 cells we estimate that there are roughly 2,500 G proteins µm⁻² on the plasma membrane and 500 G proteins µm⁻² on endocytic vesicles. For other intracellular compartments relative density can be approximated by inspecting images, but a truly quantitative estimate would require a surface area standard analogous to FM4-64 for each compartment. The percentage of the total G protein pool on a given compartment is, in our opinion, less important than the density of G proteins on that compartment, as the latter is more likely to affect the efficiency of local signal transduction. Since we do not claim to have accurate G protein density estimates for many intracellular compartments, we prefer to provide several raw images for each compartment rather than a schematized map.
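      The arithmetic behind a density estimate of this kind can be sketched as follows. The two density figures are the ones quoted in the response above; the protein count and membrane area passed to `surface_density` are hypothetical placeholders chosen only so that the division reproduces the quoted plasma membrane figure. They are not the published estimates the authors refer to.

```python
def surface_density(n_proteins, area_um2):
    """Proteins per square micrometre of membrane."""
    if area_um2 <= 0:
        raise ValueError("membrane area must be positive")
    return n_proteins / area_um2


# Densities quoted in the response (G proteins per square micrometre).
PM_DENSITY = 2500.0
VESICLE_DENSITY = 500.0

# Density on endocytic vesicles relative to the plasma membrane.
relative_density = VESICLE_DENSITY / PM_DENSITY
print(f"vesicle density is {relative_density:.0%} of plasma membrane density")

# Hypothetical inputs (placeholders, not published values): a cell with
# 5 million plasma membrane G proteins and 2,000 square micrometres of
# plasma membrane would have the quoted density.
example_density = surface_density(5_000_000, 2_000.0)
print(f"example density: {example_density:.0f} per square micrometre")
```

      Density, rather than the share of the total pool, is the quantity the authors argue matters for local signal transduction, which is why the sketch works in proteins per unit area.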

      Bystander BRET values cannot be compared directly across compartments due to differences in expression and energy transfer efficiency of different markers and compartment surface area. This method is well suited for following changes in distribution as a function of time or after perturbations and for sensitive detection of weak colocalization but can only provide approximate “maps” of absolute distribution.

      (2) Probing of the intracellular distribution of these proteins, especially after GPCR activation, includes a single chosen timepoint. I believe that the manuscript would greatly benefit from including some dynamic data on internalization and intracellular trafficking kinetics. What is the turnover of tested G proteins? What is the fraction that is going to recycling compartments and/or lysosomes? Authors could perhaps turn to other methods to be able to dynamically track proteins over time e.g. via photoconversion techniques.

      Because G protein trafficking appears to be largely constitutive there is no easy way for us to assess how long it takes G proteins to transit various intracellular compartments, although we agree this would be interesting. As the reviewer suggests, dynamic data on constitutive trafficking would require methods (such as photoconversion) not currently available to us for endogenous G proteins. Accordingly, we have made no claims regarding the kinetics of G protein trafficking. As for possible redistribution after GPCR activation, in the revised manuscript we have added 5- and 15-minute timepoints after agonist stimulation for our bystander BRET mapping (Figure 5- figure supplement 2). These timepoints were chosen to correspond to persistent signaling mediated by internalized receptors. 

      (3) Exemplary images with cells showing significant colocalization with lysosomal compartments seem to contain more intracellular vesicles visible in the mNG channel than in the case of the other compartment. Is it an effect of the treatment to stain lysosomes? It would be beneficial to compare it with some endogenous marker e.g. LAMP1 without additional treatments.

      The visibility of intracellular vesicles in our lysosome images likely reflects our selection of cells and regions with visible and abundant lysosomes, specifically peripheral regions directly adhered to the coverslip, rather than treatment with lysosomal stains (LV 633 and dextran). As suggested, we now include images of cells expressing LAMP1 as an alternative lysosome marker (Figure 3 - figure supplement 6).

      (4) The authors probe an abundance of G proteins along the constitutive endocytic pathway. However, to prove that G proteins are not de-palmitoylated rather than endocytosed authors should perform control experiments where endocytosis is blocked e.g. pharmacologically or via a knockdown approach. Additionally, various endocytic pathways can be probed.

      We did not claim that depalmitoylation plays no role in delivery of G proteins to internal compartments. In fact, we pointed out that we cannot at present rule out other pathways and delivery mechanisms. Importantly, if some of the G proteins that we detect along the endocytic pathway do arrive there by trafficking through the cytosol this would only strengthen our major conclusion that endocytosis is inefficient.

      Having said this, we have now conducted extensive experiments investigating the role of palmitate cycling in the trafficking of heterotrimeric G proteins and the small G protein H-Ras. Our results suggest that a depalmitoylation-repalmitoylation cycle is not important for the distribution of heterotrimers, but these findings will be the subject of a separate publication focused on this specific question for both large and small G proteins.

      We agree that it will be interesting to probe different endocytic pathways, as suggested using a genetic approach. Our main interest here was in endocytic membranes that were defined functionally (with FM4-64 or internalized receptors) rather than biochemically.

      Minor comments:

      (5) "Imaging" paragraph in the Methods section refers to a non-existent figure called "SI Appendix S9".

      Thank you.

      (6) It is not clear what was used as a "control" in Figure 5E.

      “Control” refers to DPBS vehicle alone. This information is now added to the legend for Figure 5E.

    1. eLife assessment

      This paper presents a valuable automated method to track individual mammalian cells as they progress through the cell cycle using the FUCCI system. The authors have developed a technique for analyzing cells that grow in suspension and used their method to look at different tumor cell lines that grow in suspension and determine the effect of drugs that directly affect the cell cycle. They show solid evidence that the method can be applied to both adherent and non-adherent cell lines. This paper will be of interest to cell biologists investigating cell cycle effects.

    2. Reviewer #3 (Public review):

      Summary:

      This paper presents an automated method to track individual mammalian cells as they progress through the cell cycle using the FUCCI system, and applies the method to different tumor cell lines that grow in suspension to determine their cell cycle profile and the effect of drugs that directly affect the cell cycle on progression through the cell cycle over a 72-hour period.

      Strengths:

      This is a METHODS paper. The one potentially novel finding is that they can identify cells at the G1-S transition by the change in color as one protein starts to go up and the other goes down, similar to the change seen as cells enter G2/M. They have provided detailed data in the resubmission, demonstrating how this can be done in different cell lines and that the brief period during which cells are determined to be in the transition from G1 to S can be resolved to about 1 hr. They further showed how one can explore this period, using EdU labeling in conjunction with FUCCI to determine whether cells have entered S-phase. This nicely addressed a weakness identified in the previous review.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Comment 1 and 2: “The pipeline relies on a large number of hard-coded conditions: size of Gaussian blur (Gaussian should be written in uppercase), values of contrast, size of filters, levels of intensity, etc. Presumably, the authors followed a heuristic approach and tried values of these and concluded that the ones proposed were optimal. A proper sensitivity analysis should be performed. That is, select a range of values of the variables and measure the effect on the output.”

      “Linked to the previous comments. Other researchers that want to follow the pipeline would have either to have exactly the same acquisition conditions as the manuscript or start playing with values and try to compensate for any difference in their data (cell diameter, fluorescent intensity, etc.) to see if they can match the results of the manuscript.”

      We thank the Reviewer for his insightful comments. We have modified the “Usage” section of the GitHub page (https://github.com/ieoresearch/cellcycle-image-analysis) to include, for each step of the image processing, a paragraph explaining the significance of the operation and a paragraph named “Suggested Values Range” where tips for optimal parameter settings are given and examples with different parameter settings are shown. We believe that these new paragraphs help researchers easily customize the pipeline to their own data.
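      The kind of sensitivity analysis Reviewer 1 asks for can be sketched on a toy problem: vary the blur width and the intensity threshold over a range and record how the number of detected objects changes. The code below is a minimal one-dimensional illustration in pure Python. The function names (`gaussian_kernel`, `blur`, `count_objects`, `sweep`), the synthetic intensity profile and the parameter ranges are all assumptions made for this example; none of it is taken from the authors' pipeline, which operates on microscopy images.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian weights, truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(signal, sigma):
    """Gaussian-smooth a 1D signal, clamping indices at the edges."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out


def count_objects(signal, threshold):
    """Count contiguous runs of samples above the threshold."""
    count = 0
    inside = False
    for value in signal:
        above = value > threshold
        if above and not inside:
            count += 1
        inside = above
    return count


def sweep(signal, sigmas, thresholds):
    """Object count for every (sigma, threshold) combination."""
    return {(s, t): count_objects(blur(signal, s), t)
            for s in sigmas for t in thresholds}


# Synthetic profile: two bright objects close together and one dim object.
profile = ([0.0] * 5 + [1.0] * 4 + [0.0] * 2 + [1.0] * 4
           + [0.0] * 10 + [0.4] * 4 + [0.0] * 5)

results = sweep(profile, sigmas=[0.5, 1.0, 2.0, 4.0], thresholds=[0.2, 0.5])
for (sigma, threshold), n_objects in sorted(results.items()):
    print(f"sigma={sigma:<4} threshold={threshold:<4} objects={n_objects}")
```

      Reading the table of counts across the grid shows which parameter ranges leave the output stable and where neighbouring objects start to merge or dim objects are lost, which is the information a user needs when adapting parameters to their own data.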

      Reviewer 2:

      Comment 1: “It would be useful to include frames from the movie showing a G1/S cell in Figures 1 and S1 with some indication of how long that cell is present. From Figure S4 it looks like it is substantially less than an hour.

      It would definitely be nice to validate this observation. A brief pulse of EdU together with the FUCCI colors could allow you to do that in a culture of cycling cells. It appears that the green color as cells enter S-phase develops slowly (and maybe gets brighter continuously) as does the red color as cells progress through G1. It would be nice to validate what the color the cells are when they actually initiate DNA replication.”  

      We thank the Reviewer for the opportunity to further investigate our results and clarify points that were unclear in the first version of the manuscript. As suggested, we have included all acquired frames depicting the G1 to S transition/early S phase of three cells: the Kasumi-1 untreated cell and the PF-06873600 treated NB4 cell shown in Fig. 1A, and the MDA-MB-231 cell shown in Fig. S1; they are shown in panels D of Fig. 4 and S5, respectively.

      For the Kasumi-1 and NB4 cells, the G1 to S transition/early S phase, defined in the pipeline refinement step as a yellow phase appearing before the S phase, is visible at the 12-hour frame. Conversely, the MDA-MB-231 cell shown in Fig. S5D does not exhibit a yellow G1 to S/early S phase; it transitions abruptly from red to green within our acquisition timeframe (30 min in this case), producing a green early S phase. This observation supports the Reviewer's suggestion that the yellow G1 to S transition is often shorter than one hour and is not identifiable in all cells.
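      The distinction drawn above, between a yellow phase that precedes S and the yellow seen as cells enter G2/M, can be illustrated with a simple rule applied to a per-frame color sequence. This is a hedged sketch only: the function `assign_phases`, the color labels and the rule itself are assumptions made for illustration, and the authors' actual assignment and refinement steps (described in their BOX1) may differ.

```python
def assign_phases(colors):
    """Label each frame of one tracked cell from its FUCCI color.

    Assumed rule (illustrative, not the authors' implementation):
    red -> G1, green -> S, and yellow -> G1/S if no green frame has
    been seen since the last red frame, otherwise G2/M.
    """
    phases = []
    seen_green = False
    for color in colors:
        if color == "red":
            phases.append("G1")
            seen_green = False  # a new cycle starts in G1
        elif color == "green":
            phases.append("S")
            seen_green = True
        elif color == "yellow":
            phases.append("G2/M" if seen_green else "G1/S")
        else:
            raise ValueError(f"unexpected color: {color!r}")
    return phases


# A cell with a visible yellow transition before S phase.
with_transition = assign_phases(
    ["red", "red", "yellow", "green", "green", "yellow", "red"])
print(with_transition)

# A cell that switches from red to green between two frames.
abrupt = assign_phases(["red", "green", "green", "yellow"])
print(abrupt)
```

      One limitation of this rule is that a track which begins in yellow is labeled G1/S even if the cell is actually in G2/M, so a real pipeline would need additional context for the first frames of a track.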

      To further investigate this point, we also conducted the EdU experiments kindly suggested by the Reviewer. Kasumi-1 and MDA-MB-231 cells expressing the FUCCI(CA)2 probes were exposed to a pulse of EdU, and subsequently analyzed using flow cytometry and confocal microscopy. A new paragraph titled “The workflow allows the identification of the G1 to S phase transition” has been added to the Results section, with the corresponding data presented in Fig. 4 and Fig. S5 for Kasumi-1 and MDA-MB-231 cells, respectively. The Methods section has also been updated describing the new experiments.

      Additionally, in BOX1 under the 'Cell phase assignment' paragraph, point (III), we have removed point 'a. Re-assign the G2/M frames to G1'. Although theoretically possible according to the pipeline, this reassignment is incorrect in practice because mVenus fluorescence indicates that the cells are starting or have already initiated DNA replication.

      All the modifications we made in the text and Figure captions are highlighted in red. We would be thankful if the co-first authorship of Kourosh Hayatigolkhatmi, Chiara Soriani and Emanuel Soda is acknowledged in the final published version of the article.

      We believe that the revisions have strengthened our manuscript, and we hope that it now meets the reviewers' suggestions for greater clarity.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In the present study, Rincon-Torroella et al. developed ME3BP-7, a microencapsulated formulation of 3BP, as an agent to target MCT1 overexpressing PDACs. They provided evidence showing the specific killing of PDAC cells with MCT1 overexpressing in vitro, along with demonstrating the safety and anti-tumor efficacy of ME3BP-7 in PDAC orthotopic mouse models.

      Strengths:

      * Developed a novel agent.

      * Well-designed experiments and an organized presentation of data that support the conclusions drawn.

      Weaknesses:

      There are some minor issues that could enhance the clarity and completeness of the study:

      (1) Statistical results should be visually presented in Figure 4 and Figure S1.

      (2) Given the tumor heterogeneity and the identification of focal high expression of MCT1 in Figure 7 and Figure S5B, it is suggested that the authors include the results of immunohistochemical (IHC) analysis of MCT1 expression in both control and ME3BP-7 treated tumor tissues. This addition may offer insight into whether the remaining tumors are composed of PDAC cells with negative MCT1 expression, while the cells with relatively high levels of MCT1 expression were eliminated by ME3BP-7 treatment.

      (3) The authors are encouraged to discuss the future directions for improving the efficacy of this study. For example, exploring the combination of ME3BP-7 with a glutaminase-1 inhibitor (PMID 37891897) could be a valuable avenue for further research.

      We thank the reviewer for pointing these out. We have addressed these individually in detail in the next section.

      Reviewer #2 (Public Review):

      Summary:

      In the manuscript by Rincon-Torroella et al, the authors evaluated the therapeutic potential of ME3BP-7, a microencapsulated formulation of 3BP which specifically targets MCT-1 high tumor cells, in pancreatic cancer models. The authors showed that, compared to 3BP, ME3BP-7 exhibited much-enhanced stability in serum. In addition, the authors confirmed the specificity of ME3BP-7 toward MCT-1 high tumor cells and demonstrated the in vivo anti-tumor effect of ME3BP-7 in orthotopic xenograft of human PDAC cell line and PDAC PDX model.

      Strengths:

      (1) The study convincingly demonstrated the superior stability of ME3BP-7 in serum.

      (2) The specificity of ME3BP-7 and 3BP toward MCT-1 high PDAC cells was clearly demonstrated with CRISPR-mediated knockout experiments.

      Weaknesses:

      The advantage of ME3BP-7 over 3BP under an in vivo situation was not fully established.

      This is indeed a helpful observation. We have attempted to address it in the revised manuscript and have clarified the details in the following section.

      Reviewer #1 (Recommendations For The Authors):

      There are some minor issues that could enhance the clarity and completeness of the study:

      We appreciate these comments and have addressed them to the best of our abilities in the revised manuscript.

      (1) Statistical results should be visually presented in Figure 4 and Figure S1.

      Figure 4 and S1 have been updated to include visual representation of statistical results.

      (2) Given the tumor heterogeneity and the identification of focal high expression of MCT1 in Figure 7 and Figure S5B, it is suggested that the authors include the results of immunohistochemical (IHC) analysis of MCT1 expression in both control and ME3BP-7 treated tumor tissues. This addition may offer insight into whether the remaining tumors are composed of PDAC cells with negative MCT1 expression, while the cells with relatively high levels of MCT1 expression were eliminated by ME3BP-7 treatment.

      This is an excellent suggestion, but unfortunately, we were unable to implement it. We identified a single antibody that showed specificity in our MCT1 knockout isogenic panel after testing 6 different commercial anti-MCT1 antibodies. While the chosen antibody (sc-365501) worked well on fixed human pancreatic cancer samples, it exhibited significant cross-reactivity against background mouse tissue, making it difficult to visualize the orthotopically implanted PDX samples.

      (3) The authors are encouraged to discuss the future directions for improving the efficacy of this study. For example, exploring the combination of ME3BP-7 with a glutaminase-1 inhibitor (PMID 37891897) could be a valuable avenue for further research.

      We have included potentially useful combinations of ME3BP-7 in the discussion section.

      Reviewer #2 (Recommendations For The Authors):

      The overall study is straightforward with translational significance. However, additional clarification is needed to determine the novelty of the study. As cited by the authors, the same group previously published a paper in Clinical Cancer Research, demonstrating the anti-tumor effect of beta-CD-3BP which is also a microencapsulated form of 3BP prepared with succinyl-beta-cyclodextrin. Please clarify what is the major difference between the ME3BP-7 and beta-CD-3BP.

      We designed the first generation of beta-CD-3BP and presented the preliminary results in the Clinical Cancer Research paper. Over the last several years, we sought to optimize the formulation so that it would be a robust clinical candidate. The current manuscript describes our in-depth exploration.

      We used a combination of SEC HPLC analyses (representative chromatogram in Fig. 3A) along with a newly developed assay to assess serum stability (representative data in Fig 3B) of a panel of ME-3BP complexes. The panel was created by varying the molar ratios of three different beta-CDs (succinyl beta-CD, native beta-CD and hydroxypropyl beta-CD) to 3BP. We discovered that an excess of succinyl-beta-CD (1.2:1) resulted in the most stable agent with no noticeable batch effects, and this formulation was dubbed ME3BP-7.

      The study clearly demonstrated the superior stability of ME3BP-7 in serum compared to 3BP. To further support the advantage of ME3BP-7, it will be important to include the same dose of 3BP as a control in the in vivo treatment experiment to evaluate the difference in both toxicity and anti-tumor effect.

      We wanted to include a control arm in our study wherein the same dose of 3BP was used. However, in toxicity studies on three different species of mice, we found that infusion of 3BP at the identical dose was highly toxic, killing the animals within a few days.  We have highlighted this toxicity of the non-microencapsulated 3BP in the revised manuscript.

    2. eLife assessment

      This study presents a valuable finding and develops ME3BP-7 as a novel microencapsulated formulation of 3BP, which specifically targets MCT1-overexpressing PDAC cells. It demonstrates its specificity and efficacy in vitro and in PDAC mouse models, with significant anti-tumor effects and improved serum stability. Overall, the evidence supporting the authors' claims is solid.

    3. Reviewer #1 (Public review):

      Summary:

      In this revised manuscript, Rincon-Torroella et al. developed ME3BP-7, a microencapsulated formulation of 3BP, as a potential agent to target MCT1 overexpressing PDACs. The authors provided compelling experimental evidence demonstrating the specific and rapid killing of MCT1 overexpressing PDAC cells in vitro, along with the safety and significant anti-tumor efficacy of ME3BP-7 in multiple PDAC orthotopic mouse models. Overall, this study is very novel, with well-designed experiments and a clear, organized presentation of data that supports the conclusions. The authors have effectively addressed the questions raised in the primary review and provided a thorough discussion of the study's significance, limitations, and future directions, which enhances the readers' understanding of the potential clinical impact of this research.

      Strengths:

      * Developed a novel agent.

      * Well-designed experiments and an organized presentation of data that support the conclusions.

      Weaknesses:

      No significant weaknesses are noticed.

    4. Reviewer #2 (Public review):

      Summary:

      In the manuscript by Rincon-Torroella et al, the authors evaluated the therapeutic potential of ME3BP-7, a microencapsulated formulation of 3BP which specifically targets MCT-1 high tumor cells, in pancreatic cancer models. The authors showed that, compared to 3BP, ME3BP-7 exhibited much-enhanced stability in serum. In addition, the authors confirmed the specificity of ME3BP-7 toward MCT-1 high tumor cells and demonstrated the in vivo anti-tumor effect of ME3BP-7 in orthotopic xenograft of human PDAC cell line and PDAC PDX model.

      Strengths:

      (1) The study convincingly demonstrated the superior stability of ME3BP-7 in serum.

      (2) The specificity of ME3BP-7 and 3BP toward MCT-1 high PDAC cells was clearly demonstrated with CRISPR-mediated knockout experiments.

      (3) The advantage of ME3BP-7 over 3BP under in vivo situation is highlighted in the revised manuscript.

    1. eLife assessment

      This important study identifies a new class of small molecules that activate the integrated stress response via the kinase HRI. Solid evidence indicates that two of these compounds promote mitochondrial elongation. The findings would be strengthened if the mutant cells with reduced fusion activity of Mfn2 were analyzed for the rescue of mitochondrial functions.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript (Baron, Oviedo et al., 2024) builds on a previous study from the Wiseman lab (Perea, Baron et al., 2023) and describes the identification of novel nucleoside mimetics that activate the HRI branch of the ISR and drive mitochondrial elongation. The authors develop an image processing and analysis pipeline to quantify the effects of these compounds on mitochondrial networks and show that these HRI activators mitigate ionomycin-driven mitochondrial fragmentation. They then show that these compounds rescue mitochondrial morphology defects in patient-derived MFN2 mutant cell lines.

      Strengths:

      The identification of new ISR modulators opens new avenues for biological discovery surrounding the interplay between mitochondrial form/function and the ISR, a topic that is of broad interest. It also reinforces the possibility that such compounds might represent new potential therapeutics for certain mitochondrial disorders. The development of a quantitative image analysis pipeline is valuable and has the potential to extract the subtle effects of various treatments on mitochondrial morphology.

      Weaknesses:

      I have three main concerns.

      First, support for the selectivity of compounds 0357 and 3610 acting downstream of HRI comes from using knockdown ISR kinase cell lines and measuring the fluorescence of ATF4-mApple (Figure 1G and 1H). However, the selectivity of these compounds acting through HRI is not shown for mitochondrial morphology. Is mitochondrial elongation blocked in HRI knockdown cells treated with the compounds? While the ISRIB treatment does block mitochondrial elongation, ISRIB acts downstream of all ISR kinases and doesn't necessarily define selectivity for the HRI branch of the ISR. Additionally, are the effects of these compounds on ATF4 production and mitochondrial elongation blocked in a non-phosphorylatable eIF2alpha mutant? This point of selectivity/specificity of the compounds gets at a semantic stumbling block I encountered in the text where it was often stated "stress-independent activation" of ISR kinases. Nucleoside mimetics are likely a very biologically active class of molecules and are likely driving some level of cell stress independent of a classical ISR, UPR, heat-shock response, or oxidative stress response.

      Second, it is difficult for me to interpret the data for the quantification of mitochondrial morphology. In the legend for Figure 2, it is stated that "The number of individual measurements for each condition are shown above." Are the individual measurements the number of total cells quantified? If not, how many total cells were analyzed? If the individual measurements are distinct mitochondrial structures that could be quantified why are the n's for each parameter (bounding box, ellipsoid principal axis, and sphericity) so different? Does this mean that for some mitochondria certain parameters were not included in the analysis? For me, it seems more intuitive that each mitochondrial unit should have all three parameters associated with it, but if this isn't the case it needs to be more carefully described why.

      Third, the impact of these compounds on the physiological function of mitochondria in the MFN2.D414V mutants needs to be measured. Sharma et al., 2021 showed a clear deficit in mitochondrial OCR in MFN2.D414V cells which, if rescued by these compounds, would strengthen the argument that pharmacological ISR kinase activation is a strategy for targeting the functional consequences of the dysregulation of mitochondrial form.
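
      As a point of reference for the second concern above, sphericity is normally derived from the volume and surface area of a single segmented object, so each mitochondrial surface would be expected to yield one value per parameter. The following is a minimal sketch, assuming the standard Wadell definition of sphericity (the surface area of a volume-equivalent sphere divided by the object's actual surface area). The function name and the example dimensions are hypothetical and are not taken from the authors' pipeline.

```python
import math


def sphericity(volume: float, surface_area: float) -> float:
    """Wadell sphericity: pi^(1/3) * (6V)^(2/3) / A.

    Equals 1 for a perfect sphere and falls below 1 for elongated shapes.
    """
    if volume <= 0 or surface_area <= 0:
        raise ValueError("volume and surface_area must be positive")
    return math.pi ** (1 / 3) * (6 * volume) ** (2 / 3) / surface_area


# Hypothetical object 1: a sphere of radius 1 (arbitrary units).
r = 1.0
sphere_value = sphericity(4 / 3 * math.pi * r**3, 4 * math.pi * r**2)

# Hypothetical object 2: an elongated tubule modeled as a cylinder of
# radius 0.25 and length 4 with hemispherical end caps.
tr, tl = 0.25, 4.0
tubule_volume = math.pi * tr**2 * tl + 4 / 3 * math.pi * tr**3
tubule_area = 2 * math.pi * tr * tl + 4 * math.pi * tr**2
tubule_value = sphericity(tubule_volume, tubule_area)

print(f"sphere: {sphere_value:.3f}, tubule: {tubule_value:.3f}")
```

      Under this definition, lower sphericity corresponds to a more elongated object, which is the direction of change the reviews describe for compound-treated cells.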

    3. Reviewer #2 (Public review):

      Summary:

      Mitochondrial dysfunction is associated with a wide spectrum of genetic and age-related diseases. Healthy mitochondria form a dynamic reticular network and constantly fuse, divide, and move. In contrast, dysfunctional mitochondria have altered dynamic properties resulting in fragmentation of the network and more static mitochondria. It has recently been reported that different types of mitochondrial stress or dysfunction activate kinases that control the integrated stress response, including HRI, PERK, and GCN2. Kinase activity results in decreased global translation and increased transcription of stress response genes via ATF4, including genes that encode mitochondrial protein chaperones and proteases (HSP70 and LON). In addition, the ISR kinases regulate other mitochondrial functions including mitochondrial morphology, phospholipid composition, inner membrane organization, and respiratory chain activity. Increased mitochondrial connectivity may be a protective mechanism that could be initiated by pharmacological activation of ISR kinases, as was recently demonstrated for GCN2.

      A small molecule screening platform was used to identify nucleoside mimetic compounds that activate HRI. These compounds promote mitochondrial elongation and protect against acute mitochondrial fragmentation induced by a calcium ionophore. Mitochondrial connectivity is also increased in patient cells with a dominant mutation in MFN2 by treatment with the compounds.

      Strengths:

      (1) The screen leverages a well-characterized reporter of the ISR: translation of ATF4-FLuc is activated in response to ER stress or mitochondrial stress. Nucleoside mimetic compounds were screened for activation of the reporter, which resulted in the identification of nine hits. The two compounds that were most efficacious in dose-response tests (0357 and 3610) were chosen for further analysis. The authors clearly state that the compounds have low potency. These compounds were specific to the ISR and did not activate the unfolded protein response or the heat shock response. Kinases activated in the ISR were systematically depleted by CRISPRi, revealing that the compounds activate HRI.

      (2) The status of the mitochondrial network was assessed with an Imaris analysis pipeline and attributes such as length, sphericity, and ellipsoid principal axis length were quantified. The characteristics of the mitochondrial network in cells treated with the compounds were consistent with increased connectivity. Rigorous controls were included. These changes were attenuated with pharmacological inhibition of the ISR.

      (3) Treatment of cells with the calcium ionophore results in rapid mitochondrial fragmentation. This was diminished by pre-treatment with 0357 or 3610, as well as by the control treatments thapsigargin and halofuginone.

      (4) Pathogenic mutations in MFN2 result in the neurodegenerative disease Charcot-Marie-Tooth Syndrome Type 2A (CMT2A). Patient cells that express Mfn2-D414V possess fragmented mitochondrial networks and treatment with 0357 or 3610 increased mitochondrial connectivity in these cells.

      Weaknesses:

      The weakness is the limited analysis of cellular changes following treatment with the compounds.

      (1) Unclear how 0357 or 3610 alter other aspects of cellular physiology. While this would be satisfying to know, it may be that the authors determined that broad, unbiased experiments such as RNAseq or proteomic analysis are not justified due to the limited translational potential of these specific compounds.

      (2) There are many changes in Mfn2-D414V patient cells including reduced respiratory capacity, reduced mtDNA copy number, and fewer mitochondrial-ER contact sites. These experiments are relatively narrow in scope and quantifying more than mitochondrial structure would reveal if the compounds improve mitochondrial function, as is predicted by their model.

    4. Reviewer #3 (Public review):

      Summary:

      Mitochondrial injury activates eIF2α kinases - PERK, GCN2, HRI, and PKR - which collectively regulate the Integrated Stress Response (ISR) to preserve mitochondrial function and integrity. Previous work has demonstrated that stress-induced and pharmacologic stress-independent ISR activation promotes adaptive mitochondrial elongation via the PERK and GCN2 kinases, respectively. Here, the authors demonstrate that pharmacologic ISR inducers of HRI and GCN2 enhance mitochondrial elongation and suppress mitochondrial fragmentation in two disease models, illustrating the therapeutic potential of pharmacologic ISR activators. Specifically, the authors first used an innovative ISR translational reporter to screen for nucleoside mimetic compounds that induce ISR signaling and identified two compounds, 0357 and 3610, that preferentially activate HRI. Using a mitochondrial-targeted GFP MEF cell line, the authors next determined that these compounds (as well as the GCN2 activator, halofuginone) enhance mitochondrial elongation in an ISR-dependent manner. Moreover, pretreatment of MEFs with these ISR kinase activators suppressed pathological mitochondrial fragmentation caused by a calcium ionophore. Finally, pharmacologic HRI and GCN2 activation were found to preserve mitochondrial morphology in human fibroblasts expressing a pathologic variant in MFN2, a defect that leads to mitochondrial fragmentation and is a cause of Charcot-Marie-Tooth Type 2A disease.

      Strengths:

      This well-written manuscript has several notable strengths, including the demonstration of the potential therapeutic benefit of ISR modulation. New chemical entities with which to further interrogate this stress response pathway are also reported. In addition, the authors used an elegant screen to isolate compounds that selectively activate the ISR and identify which of the four kinases was responsible for activation. Special attention was also paid to a thorough evaluation of the effect of their compounds on other stress response pathways (i.e. the UPR, and heat and oxidative stress responses), thereby minimizing the potential for off-target effects. The implementation of automated image analysis rather than manual scoring to quantify mitochondrial elongation is not only practical but also adds to the scientific rigor, as does the complementary use of both the calcium ionophore and MFN2 models, which enhances confidence in the broad therapeutic potential of pharmacologic ISR manipulation.

      Weaknesses:

      The only minor concerns are with regard to effects on cell health and the timing of pharmacological administration.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Weaknesses:

      While the novel compound showed promising potency in HER2-positive gastric cancer cells and a xenograft model, it would be valuable to also evaluate it in HER2-positive breast cancer cell models. The authors did not compare the current compounds with other therapeutic strategies targeting HER2 expression at the genetic level. It is unclear why the EGFR inhibitors gefitinib and canertinib, rather than HER2-specific inhibitors (e.g., tucatinib), were used as controls in the manuscript.

      We appreciate the reviewer’s insightful comments. Evaluating compound 10 on HER2-positive breast cancer cells is indeed crucial, especially given the established HER2-targeting therapies for breast cancer. In response to this concern, we conducted additional experiments to investigate the impact of compound 10 on HER2-positive breast cancer cell lines AU565 and BT474, specifically assessing its HER2 downregulating activity (Author response image 1).

      Author response image 1.

      HER2 downregulatory effect of compound 10 in HER2-positive breast cancer cell lines, AU565 and BT474.

      The selection of gefitinib (an EGFR tyrosine kinase inhibitor) and canertinib (a pan-HER inhibitor) as positive controls in our manuscript is based on their demonstrated ability to inhibit the protein-protein interaction (PPI) between ELF3 and MED23, as previously reported (J Adv Res. 47, (2023) 173-87. 10.1016/j.jare.2022.08.003; Cancer letters. 325, (2012) 72-9. 10.1016/j.canlet.2012.06.004). In the referenced studies, a SEAP reporter gene assay was used to screen compounds for their capacity to disrupt the ELF3-MED23 PPI. This assay involves GAL4-ELF3 binding to a GAL4 binding site in the SEAP reporter gene, followed by interaction with MED23, leading to RNA polymerase II recruitment and SEAP expression in cells (J Am Chem Soc. 2004, 126(49), 15940. doi: 10.1021/ja0445140). Canertinib exhibited stronger inhibitory activity against the ELF3-MED23 PPI than gefitinib, but also showed non-specific cytotoxicity. YK1 was subsequently developed based on structural analysis of the interfaces between gefitinib and MED23, and between ELF3 and MED23. Considering the previously validated inhibitory activities of gefitinib and canertinib, these drugs were selected as positive controls in the current study to compare the ELF3-MED23 inhibitory efficacy of the novel compounds.

      Reviewer #1 (Recommendations For the Authors):

      (1) It is unclear why compound 5, unlike compounds 3 and 10, inhibited HER2 overexpression at the protein level but not at the mRNA level. Could the authors further explain the potential mechanism for compound 5?

      While the exact mechanism remains unclear, the results indicated that compound 5 likely affects the protein level of HER2 through somewhat non-specific mechanisms rather than by inhibiting the ELF3-MED23 PPI. Based on this assessment, compound 5 was excluded from further investigation.

      (2) The approach used for the HER2 expression and downstream signaling pathway assays is unclear. It needs to be described in the methods or supplementary material.

      We investigated the ELF3-MED23 PPI inhibitory activity and its subsequent effect on HER2 downregulation using a comprehensive approach involving multiple techniques to ensure precise and unbiased experimental results.

      To assess PPI inhibition, we employed the following assays:

      · SEAP reporter gene assay

      · Fluorescence polarization (FP)

      · Split-luciferase complementation assay

      · GST-pulldown

      · Immunoprecipitation (IP)

      HER2 expression levels were evaluated through:

      · SEAP reporter gene assay

      · Luciferase promoter assay

      · Quantification of HER2 mRNA using qPCR

      · Measurement of HER2 protein levels via western blot analysis

      To evaluate downstream signaling of HER2, we analyzed:

      · Phosphorylation levels of MAPK (pMAPK) and AKT (pAKT)

      These methods were systematically applied to elucidate the mechanism of action of compound 10 in inhibiting ELF3-MED23 interaction and subsequently downregulating HER2.

      For clarity, we have revised the manuscript to provide a detailed description of the experimental methods to assess PPI, as described below.

      “SEAP assay was performed as previously described to measure ELF3-MED23 PPI-dependent HER2 transcription [29]. In this assay, the GAL4-ELF3 fusion protein binds to one of the five GAL4 binding sites on the reporter gene (pG4IL2SX). The interaction between the GAL4-ELF3 fusion protein and endogenous MED23 induces the expression of the SEAP. Once expressed, SEAP acts as a phosphatase on the substrate 4-MUP (4-methyl umbelliferyl phosphate), resulting in increased fluorescence. The mammalian expression vector, …”

      “FP assay was conducted following a previously described method to evaluate the molecular interaction between ELF3 and MED23 [29]. The FP assay operates on the principle of molecular rotation dynamics. When a fluorescently labeled small molecule is excited by polarized light, the emitted fluorescence can be polarized or depolarized depending on the molecular status. Free small molecules rotate rapidly, altering the orientation of their fluorescence dipole and emitting depolarized light. However, when these small molecules bind to large molecules, such as proteins, the resulting complex rotates more slowly, and the emitted light retains much of its original polarization. In this study, different concentrations of (His)6-MED23(391–582), as the large molecule, and 10 nM of FITC-labeled ELF3(129–145) peptide, as the fluorescence-labeled small molecule, were combined in …”
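
      The polarization principle described in the quoted Methods text can be illustrated numerically. The following is a minimal sketch, assuming the standard textbook definitions of polarization, P = (I_par - I_perp) / (I_par + I_perp), and anisotropy, r = (I_par - I_perp) / (I_par + 2 * I_perp), with background-subtracted intensities and an instrument G-factor of 1. The function names and intensity values are hypothetical and are not data from the study.

```python
import math


def polarization(i_par: float, i_perp: float) -> float:
    """Fluorescence polarization P from parallel and perpendicular intensities."""
    total = i_par + i_perp
    if total <= 0:
        raise ValueError("summed intensity must be positive")
    return (i_par - i_perp) / total


def anisotropy(i_par: float, i_perp: float) -> float:
    """Fluorescence anisotropy r from the same two intensities."""
    total = i_par + 2 * i_perp
    if total <= 0:
        raise ValueError("summed intensity must be positive")
    return (i_par - i_perp) / total


# Hypothetical readings (arbitrary units): a free labeled peptide tumbles
# quickly and emits nearly depolarized light; a protein-bound peptide tumbles
# slowly and retains more of the excitation polarization.
p_free = polarization(110.0, 100.0)
p_bound = polarization(160.0, 100.0)
r_bound = anisotropy(160.0, 100.0)

print(f"free: {1000 * p_free:.0f} mP, bound: {1000 * p_bound:.0f} mP")
```

      In a competition format, an inhibitor that displaces the labeled peptide from the protein would be expected to shift the reading from the bound value back toward the free value.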

      (3) I am confused by the order of the experiments, in which the SAR work came after the synthesis and a series of biochemical studies for the characterization of the synthetic compounds. What is the specific reason for this order?

      We concluded that the current approach is appropriate because the analysis was not intended for structural modification and optimization through SAR (Structure-Activity Relationship) analysis. Instead, the primary objective was to elucidate the structural basis underlying the efficacy of PPI inhibition among compounds sharing the same scaffold. We believe this will provide valuable insights for future design and synthesis of new compounds.

      (4) The yield for each step of the general synthesis needs to be included in the scheme 1.

      Scheme 1 has been updated to include the yield of each step of the synthesis process.

      (5) In line 532, the authors stated 28 compounds, should it be 26?

      ‘Twenty-eight compounds’ includes 26 newly synthesized compounds and 2 positive controls, gefitinib and canertinib.

      (6) Introduction part, lines 74 to 75, "While HER2 gene amplification is the primary mechanism responsible for HER2 overexpression" may not be confirmed in lung cancers.

      HER2 overexpression is usually a direct consequence of gene amplification, although overexpression can occur by other mechanisms [Nat Rev Cancer. 2009;9:463–475. doi: 10.1038/nrc2656.; Cell. 2007;129:1275–1286. doi: 10.1016/j.cell.2007.04.034.]. The levels of HER2 protein expression and gene amplification are linearly associated and highly concordant in breast cancer, colorectal cancer, ovarian cancer, and esophageal adenocarcinoma [World J Gastrointest Oncol. 2019, 11(4): 335–347. doi: 10.4251/wjgo.v11.i4.335; J Clin Oncol. 2002;20:719–26. doi.org/10.1200/JCO.2002.20.3.71; Oncology. 2001;61(Suppl 2):14–21. doi.org/10.1159/000055397; Science. 1989, 244(4905):707-12. doi: 10.1126/science.2470152; Cancer. 2014 Feb 1; 120(3): 415–424. doi: 10.1002/cncr.28435]. As the reviewer mentioned, the linear association between HER2 protein expression and gene amplification has not been fully established for NSCLC [ESMO Open. 2022, 100395. doi: 10.1016/j.esmoop.2022.100395].

      Therefore, we changed the sentence as described below.

      “While HER2 gene amplification is the primary mechanism responsible for HER2 overexpression in most HER2-positive cancers, except in lung cancer [16], high transcription rates of HER2 per gene copy have also been observed to contribute.”

      (7) In the abstract, lines 31 and 32, the detailed experimental data for SEAP need to be expressed in another way.

      SEAP is a type of reporter gene assay. We revised the manuscript as follows and additionally described the assay in the Methods section.

      “Upon systematic analysis, candidate compound 10 was selected due to its potency in downregulating reporter gene activity of HER2 promoter confirmed by SEAP activity and its effect on HER2 protein and mRNA levels.”

      (8) The author should combine the box for Chalcone, pyrazoline, Licochalcone E, and YK-1, Figures 1 and 2 into a new single Figure.

      We revised the manuscript following the reviewer's comments.

      (9) Provide the list of antibodies and sources for the cell-based and western blot assays.

      Table S1 presents detailed information about the antibodies and dilution ratios used in the cell-based and western blot assays.

      Reviewer #2 (Public Review):

      Weaknesses:

      The rationale behind the proposed structural modifications for the three groups of compounds is not clear.

      Reviewer #2 (Recommendations For the Authors):

      (1) Based on previous work experience, it would be interesting to evaluate the in silico mode of interaction of compound 10.

      As suggested by the reviewers, we additionally performed an in silico docking study to identify the mode of interaction of compound 10 (Author response image 2). As shown below, the results indicate that compound 10 shares a similar binding orientation with YK1, forming an H-bond with the H449 residue. Although it does not interact with the D400 residue, it was predicted to create an additional H-bond with S450, which is right next to H449, thereby reinforcing the overall binding of compound 10 to MED23. Moreover, compound 10 was additionally predicted to form a pi-pi interaction with F399, which has been previously identified as an important interaction for compounds to demonstrate an outstanding PPI inhibitory effect against ELF3 and MED23.

      Author response image 2.

      Docking analysis of compound 10.

      (2) The chalcones presented in this study are structurally similar to those previously presented by the group (ref 29). In said work, most of the compounds exhibited activities with IC50 values between 1.3 and 3 μM, with inhibition values at 10 μM ranging between 80 and 90% in the SEAP assay. These results are similar to those observed in this paper for the same assay. Can an explanation be found?

      Chalcones are inherently flexible molecules, giving them a high chance of occupying critical hotspot residues within the binding interface of ELF3-MED23, irrespective of the side chains introduced to this moiety. However, depending on the type of side chains introduced, the overall drug-like properties of compounds can be significantly altered, while still maintaining their PPI inhibitory effect. The significance of this study lies in our effort to enhance metabolic stability through extensive introduction of methoxy groups and other hydrophobic side chains to the chalcone skeleton, while preserving high PPI inhibitory activity.

      (3) Is the replacement of H and OH by OMe necessary? Does it improve any property (activity, selectivity, bioavailability, solubility, etc.)? Regarding the derivatives of group 2, why did they decide to replace the O-H, which in silico demonstrated favorable hydrogen bond interactions with Asp400? How do these molecules look in the binding site? Perhaps this is a point to discuss since the substitution of OH led to the obtaining of inactive molecules, or is the effect due to substitution with the terminal aromatic ring with 3 OMe?

      We modified the hydroxyl group moiety of YK-1 into a methoxy group to reduce the polarity of the compound, thereby enhancing its cell membrane permeability (Author response image 3) and reducing the likelihood of rapid elimination through phase II metabolic pathways in vivo. Additionally, we considered the potential conversion of the methoxy group back to a hydroxyl group via phase I metabolism in vivo.

      Author response image 3.

      Impact of methoxy group introduction on the TPSA (topological polar surface area) of each molecule. The TPSA of each molecule containing a chalcone structure was calculated using the Molinspiration webserver.

      (4) Lines 134 and 134: "Only compounds are in red."

      We revised the manuscript following the reviewer's comments.

      (5) Line 171: "Chalcone skeleton, shown in red."

      We revised the manuscript following the reviewer's comments.

      (6) Line 350: "N-1-acetyl-4,5-dihydropyrazoline."

      We revised the manuscript following the reviewer's comments.

      (7) Scheme 1. Replace "h" with "hr".

      We revised the manuscript following the reviewer's comments. Scheme 1 has been replaced by a new version.

      (8) Where is "Table S1" in SI?

      Tables S1 and S2 are supposed to be included in SI. We will ensure that Tables S1 and S2 are properly uploaded to the SI section.

      (9) In Figure 6, Graph D, to enhance comprehension, please incorporate red arrows indicating drug administration.

      We revised Figure 6 (D) following the reviewer's comments. Red arrows indicating drug administration have been incorporated, along with a descriptive comment "Drug administration" next to each arrow. Additionally, the figure legend now includes a clear description of these additions.

      Reviewer #3 (Public review):

      Weaknesses:

      The potency of compound 10 as a PPI inhibitor has been shown in only one cell line, NCI-N87.

      Reviewer #3 (Recommendations For the Authors):

      (1) The authors should show that compound 10 is effective in other gastric cancer cells such as KATOIII and SNU1.

      We evaluated the HER2 downregulating activity of compound 10 in the gastric cancer cell line SNU216, which is confirmed to express a high level of HER2 protein (Author response image 4).

      Author response image 4.

      HER2 downregulatory effect of compound 10 in HER2-positive gastric cancer cell line, SNU216. (A) Expression levels of HER2 and ELF3 in various gastric cancer cell lines. (B) HER2 downregulation in the SNU216 cell line following treatment with compound 10.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Kim et al. describes a role for axonal transport of Wnd (a dual leucine zipper kinase) for its normal degradation by the Hiw ubiquitin ligase pathway. In Hiw mutants, the Wnd protein accumulates dramatically in nerve terminals compared to the cell body of neurons. In the absence of axonal transport, Wnd levels rise and lead to excessive JNK signaling that makes neurons unhappy.

      Strengths:

      Using GFP-tagged Wnd transgenes and structure-function approaches, the authors show that palmitoylation of the protein at C130 plays a role in this process by promoting golgi trafficking and axonal localization of the protein. In the absence of this transport, Wnd is not degraded by Hiw. The authors also identify a role for Rab11 in the transport of Wnd, and provide some evidence that Rab11 loss-of-function neuronal degenerative phenotypes are due to excessive Wnd signaling. Overall, the paper provides convincing evidence for a preferential site of action for Wnd degradation by the Hiw pathway within axonal and/or synaptic compartments of the neuron. In the absence of Wnd transport and degradation, the JNK pathway becomes hyperactivated. As such, the manuscript provides important new insights into compartmental roles for Hiw-mediated Wnd degradation and JNK signaling control.

      Weaknesses:

      It is unclear if the requirement for Wnd degradation at axonal terminals is due to restricted localization of HIW there, but it seems other data in the field argues against that model. The mechanistic link between Hiw degradation and compartmentalization is unknown. 

      We thank the Reviewer for the valuable comments. In our revised manuscript, we have addressed the reviewer's comments and clarified points of confusion. We did not intend to imply that Rab11 directly mediates anterograde Wnd protein transport towards axon terminals. We re-worded the related text throughout our manuscript to avoid confusion. Additionally, to strengthen the link between Rab11 and Wnd, we have added data showing that a heterozygous mutation of wnd rescued the eye degeneration phenotypes caused by Rab11 loss-of-function (new Figure 7C).

      It is unclear if the requirement for Wnd degradation at axonal terminals is due to restricted localization of HIW there, but it seems other data in the field argues against that model. The mechanistic link between Hiw degradation and compartmentalization is unknown.

      We believe that a mechanistic understanding of how Wnd protein turnover is restricted to axons and axon terminals is beyond the scope of the current manuscript. We are actively investigating this interesting research question (please see our point-by-point response for details).

      Reviewer #2 (Public Review):

      Summary:

      Utilizing transgene expression of Wnd in sensory neurons in Drosophila, the authors found that Wnd is enriched in axonal terminals. This enrichment could be blocked by preventing palmitoylation or inhibiting Rab1 or Rab11 activity. Indeed, subsequent experiments showed that inhibiting Wnd can prevent toxicity by Rab11 loss of function.

      Strengths:

      This paper evaluates in detail Wnd location in sensory neurons, and identifies a novel genetic interaction between Rab11 and Wnd that affects Wnd cellular distribution.

      Weaknesses:

      The authors report low endogenous expression of wnd, and expressing mutant hiw or overexpressing wnd is necessary to see axonal terminal enrichment. It is unclear if this overexpression model (which is known to promote synaptic overgrowth) would be relevant to normal physiology.

      We agree that most of our subcellular localization studies were conducted using transgenes, which may not accurately reflect endogenous protein localization. Despite this technical limitation, our work addresses an important mechanistic link between DLK’s axonal localization and protein turnover in neuronal stress signaling and neurodegeneration.

      Additionally, most of our experiments were done using a kinase-dead form of Wnd or with DLKi treatment (DLK kinase inhibitor). Neurons do not display synaptic overgrowth phenotypes under these experimental conditions. Thus, the changes in Wnd axonal localization are likely independent of synaptic overgrowth phenotypes.

      Palmitoylation of the Wnd orthologue DLK in sensory neurons has previously been identified as important for DLK trafficking in a cell culture model.

      Palmitoylation of DLK has been studied in previous works including Holland et al. 2015. These are important works. However, there are significant differences from our findings. First, inhibiting DLK palmitoylation caused cytoplasmic localization of DLK. It has been reported that expression levels of wild-type and the palmitoylation-defective DLK (DLK-CS) in axons are not different in cultured sensory neurons (Holland 2015, Figure 2A and 2B). This could be simply because DLK-CS is entirely cytoplasmic and can readily diffuse into axons – which led to the conclusion that DLK palmitoylation is essential for DLK localization on motile axonal puncta. Second, because of this cytoplasmic localization, DLK-CS failed to induce downstream signaling (Holland 2015).

      However, the behavior of Wnd-CS in our study is entirely different. Wnd-CS does not show diffuse cytoplasmic localization but instead shows discrete localization in neuronal cell bodies (Figure 2E, Figure 2-supplement 1). Furthermore, Wnd-CS is able to induce downstream signaling (Figure 4 – supplement 1 and 2). Thus, our manuscript is not an extension of previously published work. Rather, our manuscript took advantage of this unique behavior of Wnd-CS and elucidated the biological function of the axonal localization of Wnd.

      The authors find genetic interaction between Wnd and Rab11, but these studies are incomplete and they do not support the authors' mechanistic interpretation.

      Our model describes that Wnd is constantly transported to axon terminals for protein degradation (protein turnover), and that this process is essential to keep Wnd activity at low levels to prevent unwanted neuronal stress signal. Based on this model, a failure in Wnd transport to axon terminals – as seen in Wnd-C130S or by Rab11 loss-of-function – would compromises protein degradation of Wnd, hence, results in excessive abundance of Wnd proteins. This was clearly demonstrated for Wnd-C130S (Figure 3) and for Rab11 mutants (Figure 6E), which support our model.

      To strengthen the link between Rab11 and Wnd, we have added additional data in our revised manuscript, which showed that heterozygous mutation of wnd significantly rescued the eye degeneration phenotypes caused by Rab11 loss-of-function (new Figure 7C).

      We did not intend to imply that Rab11 directly mediates anterograde Wnd protein transport towards axon terminals. We have reworded the related text throughout our manuscript to avoid confusion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) It would be interesting to overexpress Hiw in C4da neurons to see if this can degrade the C130S Wnd protein and reduce ERK signaling, or overexpress Hiw in the Rab11 mutant background to see if this can reduce the accumulation of Wnd or total Wnd levels. This could address the question of whether the reduction in Wnd turnover is due to Hiw's inaccessibility to Wnd.

      Thank you for your comment. We believe this question warrants an independent line of study. Although this is beyond the scope of the current work, we would like to share our findings here. We have found that overexpressing Hiw did not suppress the transgenic expression of Wnd-KD in C4da neurons, regardless of cellular location. Interestingly, however, the same Hiw overexpression suppressed the increase in Wnd-KD expression caused by hiw mutations in C4da neuron axon terminals. Thus, it seems that endogenous levels of Hiw in the wild type are sufficient to suppress transgenic expression of Wnd-KD, and that excess Hiw expression does not further enhance this effect. Currently, we do not know the mechanisms underlying these observations. One possibility is that Hiw functions exclusively in the context of an E3 ubiquitin ligase complex. Wu et al. (2007) found that DFsn is synaptically enriched and acts as an F-box protein of the Hiw E3 ligase complex. It is possible that DFsn or some other component of the Hiw E3 ligase complex determines the subcellular specificity of Hiw function. We are actively pursuing this research question.

      (2) The authors claim that Rab11 transports Wnd to the axon terminals. However, they do not see reliable colocalization of Rab11 and Wnd at axon terminals. Can the authors see Rab11-enriched vesicles with Wnd in nerve bundles, or is the role only to sort Wnd onto a post-recycling endosome compartment that moves to axonal terminals without Rab11?

      We apologize for the confusion. We did not intend to claim that Rab11 directly transports Wnd along axons. We suggested that Rab11 is necessary for axonal localization of Wnd by acting at the somatic recycling endosomes since Rab11 and Wnd extensively colocalize in the cell body but not in the axon terminals (Figure 6 and Figure 6 supplement 1). In our new “Figure 6 supplement 1”, we have now added Rab11 and Wnd colocalization in axons (segmental nerves). We also revised the text (line 294-298) “On the other hand, we did not detect any meaningful colocalization between YFP::Rab11 and Wnd-KD::mRFP in C4da axon terminals or in axons (Manders’ coefficient 0.34 ± 0.14 and 0.41 ± 0.10 respectively) (Figure 6 – supplement 1). These suggest that Rab11 is involved in Wnd protein sorting at the somatic REs rather than transporting Wnd directly.” And in Discussion (line 396-398) “These further suggest that Rab11 is not directly involved in the anterograde long-distance transport of Wnd proteins, rather is responsible for sorting Wnd into the axonal anterograde transporting vesicles.”.

      (3) The authors mis-cite the Tortosa et al 2022 study which shows the exact opposite of what the authors state. Tortosa et al show DLK recruitment to vesicles through phosphorylation and palmitoylation is essential for its signaling, not the opposite, so the authors should reword that or remove the citation.

      We believe the citation is correct. Tortosa et al (2022), “Stress‐induced vesicular assemblies of dual leucine zipper kinase are signaling hubs involved in kinase activation and neurodegeneration”, shows that membrane association of DLK, rather than palmitoylation itself, is sufficient for DLK signaling activation. For mammalian DLK, this membrane association is achieved by palmitoylation. However, when artificially targeted to cellular membranes, palmitoylation-defective DLK (mammalian DLK-CS in their study) was able to induce DLK signaling. Specifically, in their Figure 2 (K-N), when targeted to the intracellular membranes of the ER and mitochondria, DLK-CS elicited DLK signaling, as shown by c-Jun phosphorylation.

      Reviewer #2 (Recommendations For The Authors):

      Major Concerns:

      (1) A concern is the overinterpretation of results. The authors find the accumulation of Wnd in axon terminals when they express hiw null or when they overexpress Wnd, but extrapolate that this occurs in "normal conditions" without evidence. Could the increase of Wnd in the axonal terminal be in the setting of known synaptic overgrowth associated with transgene expression?

      Most of our work was conducted using a kinase-dead version of Wnd (Wnd-KD) in a wild-type background (Figure 1C and Figure 1 supplement 1). Moreover, Wnd kinase activity does not affect Wnd axonal localization in our experimental settings (Figure 1 supplement 1).

      When using the hiw mutant background, the larvae were treated with a Wnd kinase inhibitor, which prevented excessive axonal growth (Figure 1E, bottom right image; note that there is no axonal overgrowth in this condition). Additionally, Wnd-C130S is expressed at lower levels in axon terminals than Wnd (Figure 3B) while exhibiting similar axon overgrowth (Figure 4 supplement 1B). Taken together, axonal overgrowth is unlikely to affect axonal protein localization of Wnd.

      (2) The interpretation of these results is based on a supposition that Rab11 anterogradely transports Wnd along axons without evidence for this. Indeed, it has been shown that Rab11 is excluded from axons in mature neurons, but can be mislocalized when overexpressed. This should be addressed in their discussion.

      We apologize for the confusion. We did not intend to suggest that Rab11 directly transports Wnd along axons. We suggested that Rab11 is necessary for axonal localization of Wnd by acting at the somatic recycling endosomes since Rab11 and Wnd extensively colocalize in the cell body but not in the axon terminals (Figure 6 and Figure 6 supplement 1). In our new “Figure 6 supplement 1”, we have now added Rab11 and Wnd colocalization in axons (segmental nerves). We also revised the text (line 296-298) “On the other hand, we did not detect any meaningful colocalization between YFP::Rab11 and Wnd-KD::mRFP in C4da axon terminals or in axons (Manders’ coefficient 0.34 ± 0.14 and 0.41 ± 0.10 respectively) (Figure 6 – supplement 1). These suggest that Rab11 is involved in Wnd protein sorting at the somatic REs rather than transporting Wnd directly.” And in Discussion (line 396-398) “These further suggest that Rab11 is not directly involved in the anterograde long-distance transport of Wnd proteins, rather is responsible for sorting Wnd into the axonal anterograde transporting vesicles.”.

      (3) In Figure 1, the authors should also show images of Wnd-GFSTF in wild-type (non-hiw mutations) to show endogenous Wnd levels in the axon terminal.

      We have now added images of Wnd-GFSTF in wild-type (new Figure 1A). To show comparable fluorescence intensities, we also repeated the hiw mutant experiment and replaced the old images.

      (4) For Figure 1- Supplement, the authors state that the kinase-dead version of Wnd exhibited similar axonal enrichment in comparison to Wnd::GFP in the presence and absence of DLKi. This statement would be better supported with images specifically showing this (for example Wnd-KD::GFP compared to Wnd:GFP with DLKi and Wnd:GFP without DLKi).

      We did not show the images from Wnd::GFP (DLKi) in this supplement figure because it would be redundant with Figure 1C. Rather, we presented the axonal enrichment index for Wnd::GFP (DLKi), Wnd-KD::GFP, Wnd-KD::GFP (DLKi), and Wnd-KD::GFP (DMSO) in Figure 1 supplement 1B.

      Overexpressing catalytically active Wnd dramatically lowers ppk-GAL4 activity in C4da neurons, which prevents us from performing the experiment with Wnd::GFP without DLKi. In this condition, Wnd::GFP expression is barely detectable in C4da neurons.

      (5) In Figure 2 - Supplement 3 the authors state that their data suggests that Wnd protein palmitoylation is catalyzed by HIP14 due to colocalization in the somatic Golgi and mutating HIP14 leads to less Wnd in the axon terminal. This statement would be better supported by evaluating Wnd's palmitoylation via immunoprecipitation in response to dHIP14 enzyme activity.

      We appreciate the reviewer’s comment. Although the exact identity of the Wnd palmitoyltransferase might be of high interest, our study concerns the biological role of Wnd axonal localization. Moreover, the DLK palmitoyltransferase has been identified in mammalian cell culture and worm studies (Niu et al. 2020 “Coupled Control of Distal Axon Integrity and Somal Responses to Axonal Damage by the Palmitoyl Acyltransferase ZDHHC17”). ZDHHC17 is another name for HIP14. Our data, together with these published works, strongly suggest that Wnd, the Drosophila DLK, might also be targeted by Drosophila HIP14 (dHIP14).

      (6) The authors argue that palmitoylation of Wnd is essential for axonal localization of Wnd. If dHIP14 indeed palmitoylates Wnd as the authors claim, shouldn't there be a decrease in Wnd's palmitoylation within dHIP14 mutants, consequently resulting in its accumulation in the cell body rather than localization in the axonal terminal? However, Wnd is reduced at the axon terminal in dHip14 mutants, but it does not appear to increase in the cell body (Figure 2S3.C). This observation contradicts the results showing increased Wnd in the cell body presented in Figure 2. B and E. This discrepancy should be addressed.

      Thank you for your comment. Our study concerns the biological role of Wnd axonal localization. In an ideal model, dHIP14 mutations should prevent Wnd palmitoylation and cause subsequent cell body accumulation. However, it is highly likely that dHIP14 mutations affect the palmitoylation of a large number of proteins, not just Wnd, which likely changes many aspects of cell function. We envision that Wnd protein expression might be indirectly affected by these changes. In this context, mutating C130 in Wnd can be considered a more targeted approach, and our data clearly show that this mutation causes Wnd to accumulate in cell bodies.

      (7) Figure 3 - the authors show increased Wnd protein by Western blot in WndC130S:GFP compared to Wnd::GFP. qPCR experiments to show similar mRNA expression of these two transgenes would be an important control, if it's thought that the increase of protein is due to reduction of protein degradation.

      Thank you for your comment. WndC130S::GFP and Wnd::GFP were expressed using the GAL4-UAS system, not through the endogenous wnd promoter. Thus, we do not expect different mRNA abundance of WndC130S::GFP and Wnd::GFP. However, your concern is valid for Rab11 mutants. We measured wnd mRNA abundance by RT-qPCR and found that Rab11 mutations did not increase wnd mRNA levels (Figure 6 - Supplement 2). Rather, we observed a consistent reduction in wnd mRNA levels in Rab11 mutants. Please note that total Wnd protein levels were significantly increased by Rab11 mutations. We currently do not have a clear explanation for this reduction. We envision that the dramatic increase in Wnd signaling (i.e., JNK signaling, Figure 7A) induces negative feedback that reduces wnd mRNA levels (line 313-317).

      (8) Figure 4 Supplement - the authors report that Wnd::GFP causes robust induction of Puc-LacZ. A control without Wnd::GFP expression would be necessary to support that there was an induction.

      We have added control data of UAS-Wnd-KD::GFP (new Figure 4 supplement 1A). Since this required a new side-by-side comparison of fluorescence intensities, we repeated the full set of experiments and replaced our old data sets. The results confirmed that both Wnd::GFP and Wnd-C130S::GFP induce puc-lacZ expression.

      (9) Previously it was shown that inhibiting palmitoylation of DLK prevented activation of JNK signaling (Holland et al 2015), but the authors show in Figure 4A instead an increase of JNK signaling. This discrepancy should be addressed.

      The use of the palmitoylation-defective Wnd mutant in our study was only possible because Wnd-C130S behaves differently from palmitoylation-defective DLK. Unlike the diffuse cytoplasmic localization of palmitoylation-defective DLK in mammalian cells or in C. elegans neurons, Wnd-C130S exhibited clear punctate localization in neuronal cell bodies, where it extensively colocalizes with the somatic Golgi complex (Figure 2E and Figure 2 supplement 1). Tortosa et al (2022) showed that palmitoylation-defective DLK (DLK-CS) can trigger DLK signaling when artificially targeted to intracellular membranous organelles (Tortosa 2022, Figure 2 (K-N)). Thus, we reasoned that, unlike palmitoylation-defective DLK from mammals and worms, the Drosophila DLK, Wnd, might be catalytically active when mutated at Cysteine 130 because of its punctate localization.

      (10) Figure 6 Supplement - the Rab11 staining is not in a pattern that would be expected with endosomes. A control of just YFP would be useful to determine if this fluorescence signal is specific to Rab11. Can endogenous Rab11 be detected in axons or in the axonal terminal?

      In our model system, endogenously tagged Rab11 (TI-Rab11) does not show clear punctate patterns in segmental nerves (axons) or neuropils (axon terminals), nor does it colocalize with Wnd-KD. This is related to the reviewer’s comment #2, which suggests that Rab11 does not form endosomes in distal axons or axon terminals in mature neurons. Expressed Rab11 transgenes exhibited some punctate structures in axons (segmental nerves) (new Figure 6 supplement 1). However, they did not show meaningful colocalization with Wnd-KD. These observations are consistent with our model that Rab11 acts in neuronal cell bodies for Wnd axonal transport, likely via a sorting process.

      (11) There is growing evidence that palmitoylation is important for cargo sorting in the Golgi, and Rab11 is also located at the Golgi and important for trafficking from the Golgi. A mechanism that could be considered from your data is that blocking palmitoylation impairs sorting at the Golgi and trafficking from the Golgi, as opposed to impairing fast axonal transport. Indeed, Rab11 has been shown to be blocked from axons in mature neurons, making Rab11 unlikely to be responsible for the fast axonal transport of Wnd. Direct evidence of Rab11 transporting Wnd in axons would be necessary for the claim that Rab11 constantly transports DLK to terminals.

      We apologize for the confusion. We did not intend to suggest that Rab11 directly transports Wnd along the axons. We suggested that Rab11 is necessary for axonal localization of Wnd by acting at the somatic recycling endosomes since Rab11 and Wnd extensively colocalize in the cell body but not in the axon terminals (Figure 6 and Figure 6 supplement 1). In our new “Figure 6 supplement 1”, we have now added Rab11 and Wnd colocalization in axons (segmental nerves). We also revised the text (line 296-298) “On the other hand, we did not detect any meaningful colocalization between YFP::Rab11 and Wnd-KD::mRFP in C4da axon terminals or in axons (Manders’ coefficient 0.34 ± 0.14 and 0.41 ± 0.10 respectively) (Figure 6 – supplement 1). These suggest that Rab11 is involved in Wnd protein sorting at the somatic REs rather than transporting Wnd directly.” And in Discussion (line 394-398) “These further suggest that Rab11 is not directly involved in the anterograde long-distance transport of Wnd proteins, rather is responsible for sorting Wnd into the axonal anterograde transporting vesicles.”.

    2. eLife assessment

      This important manuscript shows that axonal transport of Wnd is required for its normal degradation by the Hiw ubiquitin ligase pathway. In Hiw mutants, the Wnd protein accumulates in nerve terminals. In the absence of axonal transport, Wnd levels also rise and lead to excessive JNK signaling, disrupting neuronal function. These are interesting findings supported by convincing data. However, how Rab11 is involved in Golgi processing or axonal transport of Wnd is not resolved as it is clear that Rab11 is not travelling with Wnd to the axon.

    3. Reviewer #1 (Public Review):

      Summary:

      The manuscript by Kim et al. describes a role for axonal transport of Wnd (a dual leucine zipper kinase) for its normal degradation by the Hiw ubiquitin ligase pathway. In Hiw mutants, the Wnd protein accumulates dramatically in nerve terminals compared to the cell body of neurons. In the absence of axonal transport, Wnd levels rise and lead to excessive JNK signaling that makes neurons unhappy.

      Strengths:

      Using GFP-tagged Wnd transgenes and structure-function approaches, the authors show that palmitoylation of the protein at C130 plays a role in this process by promoting golgi trafficking and axonal localization of the protein. In the absence of this transport, Wnd is not degraded by Hiw. The authors also identify a role for Rab11 in the transport of Wnd, and provide some evidence that Rab11 loss-of-function neuronal degenerative phenotypes are due to excessive Wnd signaling. Overall, the paper provides convincing evidence for a preferential site of action for Wnd degradation by the Hiw pathway within axonal and/or synaptic compartments of the neuron. In the absence of Wnd transport and degradation, the JNK pathway becomes hyperactivated. As such, the manuscript provides important new insights into compartmental roles for Hiw-mediated Wnd degradation and JNK signaling control.

      Weaknesses:

      It is unclear if the requirement for Wnd degradation at axonal terminals is due to restricted localization of HIW there, but it seems other data in the field argues against that model. The mechanistic link between Hiw degradation and compartmentalization is unknown.

    4. Reviewer #2 (Public Review):

      Summary:

      Utilizing transgene expression of Wnd in sensory neurons in Drosophila, the authors found that Wnd is enriched in axonal terminals. This enrichment could be blocked by preventing palmitoylation or inhibiting Rab1 or Rab11 activity. Indeed, subsequent experiments showed that inhibiting Wnd can prevent toxicity by Rab11 loss of function.

      Strengths:

      This paper evaluates in detail Wnd location in sensory neurons, and identifies a novel genetic interaction between Rab11 and Wnd that affects Wnd cellular distribution.

      Weaknesses:

      The authors report low endogenous expression of wnd, and expressing mutant hiw or overexpressing wnd is necessary to see axonal terminal enrichment. It is unclear if this overexpression model (which is known to promote synaptic overgrowth) would be relevant to normal physiology.

      Palmitoylation of the Wnd orthologue DLK in sensory neurons has previously been identified as important for DLK trafficking in a cell culture model.

    1. eLife assessment

      This study presents valuable findings from an observational dataset in a riverine ecosystem about the effects of genetic and species diversity, across multiple trophic levels, on ecosystem functions. However, the support for these findings is currently incomplete because raw data are not provided and there is insufficient information in the manuscript for readers to understand and assess the statistical analyses and conclusions. The work will be of broad interest to ecologists.

    2. Reviewer #1 (Public review):

      Summary:

      This work used a comprehensive dataset to compare the effects of species diversity and genetic diversity within each trophic level and across three trophic levels. The results showed that species diversity had negative effects on ecosystem functions, while genetic diversity had positive effects. These effects were observed only within each trophic level and not across the three trophic levels studied. Although the effects of biodiversity, especially genetic diversity across multi-trophic levels, have been shown to be important, there are still very few empirical studies on this topic due to the complex relationships and difficulty in obtaining data. This study collected an excellent dataset to address this question, enhancing our understanding of genetic diversity effects in aquatic ecosystems.

      Strengths:

      The study collected an extensive dataset that includes species diversity of primary producers (riparian trees), primary consumers (macroinvertebrate shredders), and secondary consumers (fish). It also includes the genetic diversity of the dominant species at each trophic level, biomass production, decomposition rates, and environmental data.

      The conclusions of this paper are mostly well supported by the data and the writing is logical and easy to follow.

      Weaknesses:

      While the dataset is impressive, the authors conducted analyses more akin to a "meta-analysis," leaving out important basic information about the raw data in the manuscript. Given the complexity of the relationships between different trophic levels and ecosystem functions, it would be beneficial for the authors to show the results of each SEM (structural equation model).

      The main results presented in the manuscript are derived from a "metadata" analysis of effect sizes. However, the methods used to obtain these effect sizes are not sufficiently clarified. By analyzing the effect sizes of species diversity and genetic diversity on these ecosystem functions, the results showed that species diversity had negative effects, while genetic diversity had positive effects on ecosystem functions. The negative effects of species diversity contradict many studies conducted in biodiversity experiments. The authors argue that their study is more relevant because it is based on a natural system, which is closer to reality, but they also acknowledge that natural systems make it harder to detect underlying mechanisms. Providing more results based on the raw data and offering more explanations of the possible mechanisms in the introduction and discussion might help readers understand why and in what context species diversity could have negative effects.

      Environmental variation was included in the analyses to test if the environment would modulate the effects of biodiversity on ecosystem functions. However, the main results and conclusions did not sufficiently address this aspect.

    3. Reviewer #2 (Public review):

      Summary:

      Fargeot et al. investigated the relative importance of genetic and species diversity on ecosystem function and examined whether this relationship varies within or between trophic-level responses. To do so, they conducted a well-designed field survey measuring species diversity at 3 trophic levels (primary producers [trees], primary consumers [macroinvertebrate shredders], and secondary consumers [fishes]), genetic diversity in a dominant species within each of these 3 trophic levels, and 7 ecosystem functions across 52 riverine sites in southern France. They show that the effects of genetic and species diversity on ecosystem functions are similar in magnitude but, when examining within-trophic level responses, operate in different directions: genetic diversity has a positive effect and species diversity a negative one. These data add to growing evidence from manipulated experiments that both species and genetic diversity can impact ecosystem function, and build upon this by showing these effects can be observed in nature.

      Strengths:

      The study design has resulted in a robust dataset to ask questions about the relative importance of genetic and species diversity of ecosystem function across and within trophic levels.

      Overall, their data supports their conclusions - at least within the system that they are studying - but as mentioned below, it is unclear from this study how general these conclusions would be.

      Weaknesses:

      (1) While this is a robust dataset, the authors only show the data output from the SEM (i.e., effect size for each individual diversity type per trophic level (6) on each ecosystem function (7)), instead of showing much of the individual data. Although the summary SEM results are interesting and informative, I find that a weakness of this approach is that it is unclear how environmental factors (which were included but not discussed in the results) or levels of diversity were correlated across sites. As species and genetic diversity are often correlated but can also have reciprocal feedbacks on each other (e.g., Vellend 2005), there may be constraints that underpin why the authors observed positive effects of one type of diversity (genetic) but negative effects of the other (species). It may also have been informative to run SEMs with links between levels of diversity. By focusing only on the summary of SEM data, the authors may be reducing the strength of their field dataset and their ability to draw inferences from multiple questions and understand specific study-system responses.

      (2) My understanding of SEM is that it gives outputs of the strength/significance of each pathway/relationship. If so, it isn't clear why these outputs weren't used, and why confidence intervals of Z scores were used instead to determine which individual BEFs were significant. In addition, it would have been useful to include the 7 SEM pathway outputs in an appendix.

      (3) I don't fully agree with the authors calling this a meta-analysis, as this is a single study of multiple sites within a single region at a specific time point, and not a collection of multiple studies or ecosystems conducted by multiple authors. Rather, the authors are using meta-analysis summary metrics to evaluate their data. The authors tend to treat these patterns as general trends, but as the data are all from this riverine system, this study could have benefited from focusing on what was going on in this system to underpin these patterns. I'd argue more data are needed to know whether, across sites and ecosystems, species diversity and genetic diversity have opposite effects on ecosystem function within trophic levels.

    4. Reviewer #3 (Public review):

      The manuscript by Fargeot and colleagues assesses the relative effects of species and genetic diversity on ecosystem functioning. This study is very well written and examines the interesting question of whether within-species or among-species diversity correlates with ecosystem functioning, and whether these effects are consistent across trophic levels. The main findings are that genetic diversity appears to have a stronger positive effect on function than species diversity (which appears negative). These results are interesting and have value.

      However, I do have some concerns that could influence the interpretation.

      (1) Scale: the different measures of diversity and function for the different trophic levels are measured over very different spatial scales, for example, trees along 200 m transects and 15 cm traps. It is not clear whether trees 200 m away are having an effect on small-scale function.

      (2) Size of diversity gradients: More information is needed on the actual diversity gradients. One of the issues with surveys of natural systems is that they are of species that have already gone through selection filters from a regional pool, and theoretically, if the environments are similar, you should get similar sets of species, without monocultures. So, if the species diversity gradients range from say, 6 to 8 species, but genetic diversity gradients span an order of magnitude more, you can explain much more variance with genetic diversity. Related to this, species diversity effects on function are often asymptotic at high diversity and so if you are only sampling at the high diversity range, we should expect a strong effect.

      (3) Ecosystem functions: The functions are largely biomass estimates (except decomposition), and I fail to see how the biomass of a single species can be construed as an ecosystem function. Aren't you just estimating a selection effect in this case?

      Note that the article claims to be one of the only studies to look at function across trophic levels, but there are several others out there, for example:

      Li, F., Altermatt, F., Yang, J., An, S., Li, A., & Zhang, X. (2020). Human activities' fingerprint on multitrophic biodiversity and ecosystem functions across a major river catchment in China. Global change biology, 26(12), 6867-6879.

      Luo, Y. H., Cadotte, M. W., Liu, J., Burgess, K. S., Tan, S. L., Ye, L. J., ... & Gao, L. M. (2022). Multitrophic diversity and biotic associations influence subalpine forest ecosystem multifunctionality. Ecology, 103(9), e3745.

      Moi, D. A., Romero, G. Q., Antiqueira, P. A., Mormul, R. P., Teixeira de Mello, F., & Bonecker, C. C. (2021). Multitrophic richness enhances ecosystem multifunctionality of tropical shallow lakes. Functional Ecology, 35(4), 942-954.

      Wan, B., Liu, T., Gong, X., Zhang, Y., Li, C., Chen, X., ... & Liu, M. (2022). Energy flux across multitrophic levels drives ecosystem multifunctionality: Evidence from nematode food webs. Soil Biology and Biochemistry, 169, 108656.

      And the case was made strongly by:

      Seibold, S., Cadotte, M. W., MacIvor, J. S., Thorn, S., & Müller, J. (2018). The necessity of multitrophic approaches in community ecology. Trends in ecology & evolution, 33(10), 754-764.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work used a comprehensive dataset to compare the effects of species diversity and genetic diversity within each trophic level and across three trophic levels. The results showed that species diversity had negative effects on ecosystem functions, while genetic diversity had positive effects. These effects were observed only within each trophic level and not across the three trophic levels studied. Although the effects of biodiversity, especially genetic diversity across multi-trophic levels, have been shown to be important, there are still very few empirical studies on this topic due to the complex relationships and difficulty in obtaining data. This study collected an excellent dataset to address this question, enhancing our understanding of genetic diversity effects in aquatic ecosystems.

      Strengths:

      The study collected an extensive dataset that includes species diversity of primary producers (riparian trees), primary consumers (macroinvertebrate shredders), and secondary consumers (fish). It also includes the genetic diversity of the dominant species at each trophic level, biomass production, decomposition rates, and environmental data.

      The conclusions of this paper are mostly well supported by the data and the writing is logical and easy to follow.

      Weaknesses:

      While the dataset is impressive, the authors conducted analyses more akin to a "meta-analysis," leaving out important basic information about the raw data in the manuscript. Given the complexity of the relationships between different trophic levels and ecosystem functions, it would be beneficial for the authors to show the results of each SEM (structural equation model).

      We understand the point raised by the reviewer. Our objective was to focus the Results section on the main hypotheses, and we therefore left out the raw statistics. We can certainly show the seven individual SEMs, highlighting the major links, which may help clarify some processes. This will be done in the next version of the manuscript.

      The main results presented in the manuscript are derived from a "metadata" analysis of effect sizes. However, the methods used to obtain these effect sizes are not sufficiently clarified. By analyzing the effect sizes of species diversity and genetic diversity on these ecosystem functions, the results showed that species diversity had negative effects, while genetic diversity had positive effects on ecosystem functions. The negative effects of species diversity contradict many studies conducted in biodiversity experiments. The authors argue that their study is more relevant because it is based on a natural system, which is closer to reality, but they also acknowledge that natural systems make it harder to detect underlying mechanisms. Providing more results based on the raw data and offering more explanations of the possible mechanisms in the introduction and discussion might help readers understand why and in what context species diversity could have negative effects.

      We hope you will be right. As said above, we will explore this possibility.

      Environmental variation was included in the analyses to test if the environment would modulate the effects of biodiversity on ecosystem functions. However, the main results and conclusions did not sufficiently address this aspect.

      This will be addressed by the more in-depth analysis of the individual SEMs, and we will discuss this further.

      Reviewer #2 (Public review):

      Summary:

      Fargeot et al. investigated the relative importance of genetic and species diversity on ecosystem function and examined whether this relationship varies within or between trophic-level responses. To do so, they conducted a well-designed field survey measuring species diversity at 3 trophic levels (primary producers [trees], primary consumers [macroinvertebrate shredders], and secondary consumers [fishes]), genetic diversity in a dominant species within each of these 3 trophic levels and 7 ecosystem functions across 52 riverine sites in southern France. They show that the effect of genetic and species diversity on ecosystem functions are similar in magnitude, but when examining within-trophic level responses, operate in different directions: genetic diversity having a positive effect and species diversity a negative one. This data adds to growing evidence from manipulated experiments that both species and genetic diversity can impact ecosystem function and builds upon this by showing these effects can be observed in nature.

      Strengths:

      The study design has resulted in a robust dataset to ask questions about the relative importance of genetic and species diversity of ecosystem function across and within trophic levels.

      Overall, their data supports their conclusions - at least within the system that they are studying - but as mentioned below, it is unclear from this study how general these conclusions would be.

      Weaknesses:

      (1) While a robust dataset, the authors only show the data output from the SEM (i.e., effect size for each individual diversity type per trophic level (6) on each ecosystem function (7)), instead of showing much of the individual data. Although the summary SEM results are interesting and informative, I find that a weakness of this approach is that it is unclear how environmental factors (which were included but not discussed in the results) and levels of diversity were correlated across sites. As species and genetic diversity are often correlated but also can have reciprocal feedbacks on each other (e.g., Vellend 2005), there may be constraints that underpin why the authors observed positive effects of one type of diversity (genetic) but negative effects of the other (species). It may have also been informative to run SEM with links between levels of diversity. By focusing only on the summary of SEM data, the authors may be reducing the strength of their field dataset and ability to draw inferences from multiple questions and understand specific study-system responses.

      We will address this issue by performing a more in-depth analysis of each individual SEM and by providing these raw data directly. Regarding the comment on species-genomic diversity correlations (SGDCs), we would like to point out that this has already been addressed in a previous paper (Fargeot et al. Oikos, 2023). There are actually no correlations between genomic and species diversity in this dataset, which is simply explained by the selection of the sampling sites. The relationships between species diversity, genomic diversity and environmental factors are also detailed in Fargeot et al. (2023). We published that paper first precisely so that we could focus here “only” on BEFs. But we realize we need to provide further information and discuss these issues further. This will be done in the next version of the manuscript.

      (2) My understanding of SEM is it gives outputs of the strength/significance of each pathway/relationship and if so, it isn't clear why this wasn't used and instead, confidence intervals of Z scores to determine which individual BEFs were significant. In addition, an inclusion of the 7 SEM pathway outputs would have been useful to include in an appendix.

      Yes, we can provide p-values. P-values will provide the same information as the 95% CIs; both yield very similar (if not exactly the same) results and conclusions. We will provide the 7 SEMs in the Appendices.
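
      The correspondence between confidence intervals and p-values mentioned in this response can be illustrated with a minimal sketch. It assumes a normal-approximation (z) test of an effect size against zero; how the 95% CIs were actually computed is not stated here, and the function names and the effect sizes and standard errors below are hypothetical. Under that assumption, a 95% CI excludes zero exactly when the two-sided p-value is below 0.05.

```python
from statistics import NormalDist


def z_test_summary(effect, se, alpha=0.05):
    """Return (ci_low, ci_high, p_value) for a normal-approximation test of effect = 0."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    ci_low = effect - z_crit * se
    ci_high = effect + z_crit * se
    p_value = 2 * (1 - norm.cdf(abs(effect / se)))  # two-sided p-value
    return ci_low, ci_high, p_value


def ci_excludes_zero(effect, se, alpha=0.05):
    """True when the (1 - alpha) confidence interval does not contain zero."""
    ci_low, ci_high, _ = z_test_summary(effect, se, alpha)
    return ci_low > 0 or ci_high < 0


# Hypothetical (effect size, standard error) pairs, for illustration only.
examples = [(0.30, 0.10), (0.10, 0.10), (-0.25, 0.12)]

for effect, se in examples:
    ci_low, ci_high, p_value = z_test_summary(effect, se)
    print(f"effect={effect:+.2f}  CI=[{ci_low:+.3f}, {ci_high:+.3f}]  "
          f"p={p_value:.4f}  CI excludes 0: {ci_excludes_zero(effect, se)}")
```

      For any test built on a different reference distribution, the two criteria agree only if the CI and the p-value are derived from that same distribution.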

      (3) I don't fully agree with the authors calling this a meta-analysis as this is a single study of multiple sites within a single region and a specific time point, and not a collection of multiple studies or ecosystems conducted by multiple authors. Moreso, the authors are using meta-analysis summary metrics to evaluate their data. The authors tend to focus on these patterns as general trends, but as the data is all from this riverine system this study could have benefited from focusing on what was going on in this system to underpin these patterns. I'd argue more data is needed to know whether across sites and ecosystems, species diversity and genetic diversity have opposite effects on ecosystem function within trophic levels.

      We agree. “Meta-regression” would perhaps be more adequate than “meta-analysis”. As said above, more details will be provided in the next version of the manuscript.

      Reviewer #3 (Public review):

      The manuscript by Fargeot and colleagues assesses the relative effects of species and genetic diversity on ecosystem functioning. This study is very well written and examines the interesting question of whether within-species or among-species diversity correlates with ecosystem functioning, and whether these effects are consistent across trophic levels. The main findings are that genetic diversity appears to have a stronger positive effect on function than species diversity (which appears negative). These results are interesting and have value.

      However, I do have some concerns that could influence the interpretation.

      (1) Scale: the different measures of diversity and function for the different trophic levels are measured over very different spatial scales, for example, trees along 200 m transects and 15 cm traps. It is not clear whether trees 200 m away are having an effect on small-scale function.

      Tree identification and invertebrate (and fish) sampling were done at the same scale. Trees are spread along the river so that their leaves fall directly into the river. Traps were installed all along the same transect in various micro-habitats. Diversity has been measured at the exact same scale for all organisms. We will try to be more precise.

      (2) Size of diversity gradients: More information is needed on the actual diversity gradients. One of the issues with surveys of natural systems is that they are of species that have already gone through selection filters from a regional pool, and theoretically, if the environments are similar, you should get similar sets of species, without monocultures. So, if the species diversity gradients range from say, 6 to 8 species, but genetic diversity gradients span an order of magnitude more, you can explain much more variance with genetic diversity. Related to this, species diversity effects on function are often asymptotic at high diversity and so if you are only sampling at the high diversity range, we should expect a strong effect.

      We will provide more information. The range of diversity also varies according to the trophic level; there are more invertebrate species than fish species. But overall the range of species numbers is large.

      (3) Ecosystem functions: The functions are largely biomass estimates (except decomposition), and I fail to see how the biomass of a single species can be construed as an ecosystem function. Aren't you just estimating a selection effect in this case?

      The biomass estimated for a certain area represents an estimate of productivity, whatever the number of species being considered. Obviously, the productivity of a species can be due to environmental constraints; the biomass is expected to be lower at the niche margin (selection effect). But if these environmental effects are taken into account (which is the case in the SEMs), then the residual variation can be explained by biodiversity effects. We will try to make this clearer.

      Note that the article claims to be one of the only studies to look at function across trophic levels, but there are several others out there, for example:

      Thanks, we will cite some of these studies (and make our claim less strong).

      Li, F., Altermatt, F., Yang, J., An, S., Li, A., & Zhang, X. (2020). Human activities' fingerprint on multitrophic biodiversity and ecosystem functions across a major river catchment in China. Global change biology, 26(12), 6867-6879.

      Luo, Y. H., Cadotte, M. W., Liu, J., Burgess, K. S., Tan, S. L., Ye, L. J., ... & Gao, L. M. (2022). Multitrophic diversity and biotic associations influence subalpine forest ecosystem multifunctionality. Ecology, 103(9), e3745.

      Moi, D. A., Romero, G. Q., Antiqueira, P. A., Mormul, R. P., Teixeira de Mello, F., & Bonecker, C. C. (2021). Multitrophic richness enhances ecosystem multifunctionality of tropical shallow lakes. Functional Ecology, 35(4), 942-954.

      Wan, B., Liu, T., Gong, X., Zhang, Y., Li, C., Chen, X., ... & Liu, M. (2022). Energy flux across multitrophic levels drives ecosystem multifunctionality: Evidence from nematode food webs. Soil Biology and Biochemistry, 169, 108656.

      And the case was made strongly by:

      Seibold, S., Cadotte, M. W., MacIvor, J. S., Thorn, S., & Müller, J. (2018). The necessity of multitrophic approaches in community ecology. Trends in ecology & evolution, 33(10), 754-764.

    1. eLife assessment

      This study provides direct evidence showing that Kv1.8 channels provide the basis for several potassium currents in the two types of sensory hair cells found in the mouse vestibular system. This is an important finding because the nature of the channels underpinning the unusual potassium conductance gK,L in type I hair cells has been under scrutiny for many years. The experimental evidence is compelling and the analysis is rigorous. The study will be of interest to cell and molecular biologists as well as vestibular and auditory neuroscientists.

    2. Reviewer #1 (Public Review):

      Summary:

      In this paper the authors provide a thorough demonstration of the role that one particular type of voltage-gated potassium channel, Kv1.8, plays in a low voltage activated conductance found in type I vestibular hair cells. Along the way, they find that this same channel protein appears to function in type II vestibular hair cells as well, contributing to other macroscopic conductances. Overall, Kv1.8 may provide especially low input resistance and short time constants to facilitate encoding of more rapid head movements in animals that have necks. Combination with other channel proteins, in different ratios, may contribute to the diversified excitability of vestibular hair cells.

      Strengths:

      The experiments are comprehensive and clearly described, both in text and in the figures. Statistical analyses are provided throughout.

      Weaknesses:

      None.

    3. Reviewer #2 (Public Review):

      The focus of this manuscript was to investigate whether Kv1.8 channels, which have previously been suggested to be expressed in type I hair cells of the mammalian vestibular system, are responsible for the potassium conductance gK,L. This is an important study because gK,L is known to be crucial for the function of type I hair cells, but the channel identity has been a matter of debate for the past 20 years. The authors have addressed this research topic by primarily investigating the electrophysiological properties of the vestibular hair cells from Kv1.8 knockout mice. Interestingly, gK,L was completely abolished in Kv1.8-deficient mice, in agreement with the hypothesis put forward by the authors based on the literature. The surprising observation was that in the absence of Kv1.8 potassium channels, the outward potassium current in type II hair cells was also largely reduced. Type II hair cells express the largely inactivating potassium conductance gK,A, but not gK,L. The authors concluded that heteromultimerization of non-inactivating Kv1.8 and the inactivating Kv1.4 subunits could be responsible for the inactivating gK,A. Overall, the manuscript is very well written and most of the conclusions are supported by the experimental work. The figures are well described, and the statistical analysis is robust.

    4. Reviewer #3 (Public Review):

      Summary:

      This paper by Martin et al. describes the contribution of a Kv channel subunit (Kv1.8, KCNA10) to voltage-dependent K+ conductances and membrane properties of type I and type II hair cells of the mouse utricle. Previous work has documented striking differences in K+ conductances between vestibular hair cell types. In particular amniote type I hair cells are known to express a non-typical low-voltage-activated K+ conductance (GK,L) whose molecular identity has been elusive. K+ conductances in hair cells from 3 different mouse genotypes (wildtype, Kv1.8 homozygous knockouts and heterozygotes) are examined here and whole cell patch-clamp recordings indicate a prominent role for Kv1.8 subunits in generating GK,L. Results also interestingly support a role for Kv1.8 subunits in type II hair cell K+ conductances; inactivating conductances in null mice are reduced in type II hair cells from striola and extrastriola regions of the utricle. Kv1.8 is therefore proposed to contribute as a pore-forming subunit for 3 different K+ conductances in vestibular hair cells. The impact of these conductances on membrane responses to current steps is studied in current clamp. Pharmacological experiments use XE991 to block some residual Kv7-mediated current in both hair cell types, but no other pharmacological blockers are used. In addition immunostaining data are presented and raise some questions about Kv7 and Kv1.8 channel localization. Overall, the data present compelling evidence that removal of Kv1.8 produces profound changes in hair cell membrane conductances and sensory capabilities. These changes at hair cell level suggest vestibular function would be compromised and further assessment in terms of balance behavior in the different mice would be interesting.

      Strengths:

      This study provides strong evidence that Kv1.8 subunits are major contributors to the unusual K+ conductance in type I hair cells of the utricle. It also indicates that Kv1.8 subunits are important for type II hair cell K+ conductances because Kv1.8-/- mice lacked an inactivating A conductance and had reduced delayed rectifier conductance compared to controls. A comprehensive and careful analysis of biophysical profiles is presented of expressed K+ conductances in 3 different mouse genotypes. Voltage-dependent K+ currents are rigorously characterized at a range of different ages and their impact on membrane voltage responses to current input is studied. Some pharmacological experiments are performed in addition to immunostaining to bolster the conclusions from the biophysical studies. The paper has a significant impact in showing the role of Kv1.8 in determining utricular hair cell electrophysiological phenotypes.

      Weaknesses:

      (1) From previous work it is known that GK,L in type I hair cells has unusual ion permeation and pharmacological properties that differ greatly from type II hair cell conductances. Notably GK,L is highly permeable to Cs+ as well as K+ ions and is slightly permeable to Na+. It is blocked by 4-aminopyridine and divalent cations (Ba2+, Ca2+, Ni2+), enhanced by external K+ and modulated by cyclic GMP. The question arises: if Kv1.8 is a major player and pore-forming subunit in type I and type II cells (and cochlear inner hair cells as shown by Dierich et al. 2020), how are subunits modified to produce channels with very different properties? A role for Kv1.4 channels (gA) is proposed in type II hair cells based on previous findings in bird hair cells. However, hair cell specific partner interactions with Kv1.8 that result in GK,L in type I hair cells and Cs+ impermeable, inactivating currents in type II hair cells remain for the most part unexplored.

      (2) Data from patch-clamp and immunocytochemistry experiments are not in close alignment. XE991 (Kv7 channel blocker) decreases remaining K+ conductance in type I and type II hair cells from null mice supporting the presence of Kv7 channels in hair cells (Fig. 7). Also, Holt et al. (2007) previously showed inhibition of GK,L in type I hair cells (but not delayed rectifier conductance in type II hair cells) using a dominant negative construct of Kv7.4 channels. However, immunolabelling indicates Kv7.4 channels on the inner face of calyx terminals adjacent to hair cells (Fig. 5). Some reconciliation of these findings is needed.

      (3) A previous paper reported that a vestibular evoked potential was abnormal in Kv1.8-/- mice (Lee et al. 2013) as briefly mentioned (lines 94-95). It would be really interesting to know if any vestibular-associated behaviors and/or hearing loss were observed in the mice populations. If responses are compromised at the sensory hair cell level across different zones, degradation of balance function would be anticipated and should be elucidated.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Line 127. Provide a few more words describing the voltage protocol. To the uninitiated, panels A and B will be difficult to understand. "The large negative step is used to first close all channels, then probe the activation function with a series of depolarizing steps to re-open them and obtain the max conductance from the peak tail current at -36 mV. "

      We have revised the text as suggested (revision lines 127-131): “From a holding potential within the gK,L activation range (here –74 mV), the cell is hyperpolarized to –124 mV, negative to EK and the activation range, producing a large inward current through open gK,L channels that rapidly decays as the channels deactivate. We use the large transient inward current as a hallmark of gK,L. The hyperpolarization closes all channels, and then the activation function is probed with a series of depolarizing steps, obtaining the max conductance from the peak tail current at –44 mV (Fig. 1A).”

      Incidentally, why does the peak tail current decay? 

      We added this text to the figure legend to explain this: “For steps positive to the midpoint voltage, tail currents are very large. As a result, K+ accumulation in the calyceal cleft reduces driving force on K+, causing currents to decay rapidly, as seen in A (Lim et al., 2011).”

      The decay of the peak tail current is a feature of gK,L (large K+ conductance) and the large enclosed synaptic cleft (which concentrates K+ that effluxes from the HC). See Govindaraju et al. (2023) and Lim et al. (2011) for modeling and experiments around this phenomenon.

      Line 217-218. For some reason, I stumbled over this wording. Perhaps rearrange as "In type II HCs absence of Kv1.8 significantly increased Rin and tauRC. There was no effect on Vrest because the conductances to which Kv1.8 contributes, gA and gDR, activate positive to the resting potential." (so which K conductances establish Vrest???). 

      We kept our original wording because we wanted to discuss the baseline (Vrest) before describing responses to current injection.

      Vrest is presumably maintained by ATP-dependent Na/K exchangers (ATP1a1), HCN, Kir, and mechanotransduction currents. Repolarization is achieved by delayed rectifier and A-type K+ conductances in type II HCs.

      Figure 4, panel C - provides absolute membrane potential for voltage responses. Presumably, these were the most 'ringy' responses. Were they obtained at similar Vm in all cells (i.e., comparisons of Q values in lines 229-230). 

      We added the absolute membrane potential scale. Type II HC protocols all started with 0 pA current injection at baseline, so they were at their natural Vrest, which did not differ by genotype or zone. Consistent with Q depending on expression of conductances that activate positive to Vrest, Q did not co-vary with Vrest (Pearson’s correlation coefficient = 0.08, p = 0.47, n= 85).
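
      As a rough consistency check on the correlation reported above (r = 0.08, n = 85), the following sketch computes a two-sided p-value using the Fisher z-transformation. This is an assumption made for illustration: the response does not state which test produced p = 0.47, and the function name is hypothetical. The normal approximation used here gives a value close to the reported one.

```python
import math
from statistics import NormalDist


def pearson_p_fisher(r, n):
    """Approximate two-sided p-value for H0: rho = 0 via the Fisher z-transformation."""
    # atanh(r) is approximately normal with standard error 1 / sqrt(n - 3).
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Values taken from the response: Pearson's r = 0.08, n = 85.
p = pearson_p_fisher(0.08, 85)
print(f"approximate two-sided p = {p:.2f}")
```

      An exact test based on the t distribution with n - 2 degrees of freedom would give a very similar answer at this sample size.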

      Lines 254. Staining is non-specific? Rather than non-selective? 

      Yes, thanks - Corrected (Line 264).

      Figure 6. Do you have a negative control image for Kv1.4 immuno? Is it surprising that this label is all over the cell, but Kv1.8 is restricted to the synaptic pole? 

      We don’t have a null-animal control because this immunoreactivity was done in rat. While the cuticular plate staining was most likely nonspecific because we see that with many different antibodies, it’s harder to judge the background staining in the hair cell body layer. After feedback from the reviewers, we decided to pull the KV1.4 immunostaining from the paper because of the lack of null control, high background, and inability to reproduce these results in mouse tissue. In our hands, in mouse tissue, both mouse and rabbit anti-KV1.4 antibodies failed to localize to the hair cell membrane. Further optimization or another method could improve that, but for now the single-cell expression data (McInturff et al., 2018) remain the strongest evidence for KV1.4 expression in murine type II hair cells.

      Lines 400-404. Whew, this is pretty cryptic. Expand a bit? 

      We simplified this paragraph (revision lines 411-413): “We speculate that gA and gDR(KV1.8) have different subunit composition: gA may include heteromers of KV1.8 with other subunits that confer rapid inactivation, while gDR(KV1.8) may comprise homomeric KV1.8 channels, given that they do not have N-type inactivation.”

      Line 428. 'importantly different ion channels'. I think I understand what is meant but perhaps say a bit more. 

      Revised (Line 438): “biophysically distinct and functionally different ion channels”.

      Random thought. In addition to impacting Rin and TauRC, do you think the more negative Vrest might also provide a selective advantage by increasing the driving force on K entry from endolymph? 

      When the calyx is perfectly intact, gK,L is predicted to make Vrest less negative than the values we report in our paper, where we have disturbed the calyx to access the hair cell (–80, Govindaraju et al., 2023, vs. –87 mV, here). By enhancing K+ accumulation in the calyceal cleft, the intact calyx shifts EK—and Vrest—positively (Lim et al., 2011), so the effect on driving force may not be as drastic as what you are thinking.
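
      The direction of the EK shift described above follows from the Nernst equation, EK = (RT/zF) ln([K+]out/[K+]in). The sketch below is only illustrative: the intracellular concentration, the two cleft concentrations, and the temperature are hypothetical values chosen to show the trend, not measurements from the paper.

```python
import math

R = 8.314    # gas constant, J/(mol*K)
F = 96485.0  # Faraday constant, C/mol


def nernst_mv(k_out_mm, k_in_mm, temp_k=295.0, z=1):
    """Nernst equilibrium potential in mV for an ion of valence z."""
    return 1000.0 * (R * temp_k) / (z * F) * math.log(k_out_mm / k_in_mm)


# Hypothetical concentrations (mM), for illustration only.
K_IN = 140.0
cleft_low = 5.0    # assumed cleft [K+] without accumulation
cleft_high = 10.0  # assumed cleft [K+] with accumulation

ek_low = nernst_mv(cleft_low, K_IN)
ek_high = nernst_mv(cleft_high, K_IN)
print(f"EK at {cleft_low} mM: {ek_low:.1f} mV")
print(f"EK at {cleft_high} mM: {ek_high:.1f} mV")
print(f"shift: {ek_high - ek_low:+.1f} mV")
```

      Raising the extracellular (cleft) K+ concentration makes EK less negative, which is the sense in which K+ accumulation shifts EK, and with it Vrest, positively.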

      Reviewer #2 (Recommendations For The Authors): 

      (1) Introduction: wouldn't the small initial paragraph stating the main conclusion of the study fit better at the end of the background section, instead of at the beginning? 

      Thank you for this idea, we have tried that and settled on this direct approach to let people know in advance what the goals of the paper are.

      (2) Pg.4: The following sentence is rather confusing "Between P5 and P10, we detected no evidence of a non-gK,L KV1.8-dependent.....". Also, Suppl. Fig 1A seems to show that between P5 and P10 hair cells can display a potassium current having either a hyperpolarised or depolarised Vhalf. Thus, I am not sure I understand the above statement. 

      Thank you for pointing out unclear wording. We used the more common “delayed rectifier” term in our revision (Lines 144-147): “Between P5 and P10, some type I HCs have not yet acquired the physiologically defined conductance, gK,L. No effects of KV1.8 deletion were seen in the delayed rectifier currents of immature type I HCs (Suppl. Fig. 1B), showing that they are not immature forms of the Kv1.8-dependent gK,L channels.”

      (3) For the reduced Cm of hair cells from Kv1.8 knockout mice, could another reason be simply the immature state of the hair cells (i.e. lack of normal growth), rather than less channels in the membrane? 

      There were no other signs to suggest immaturity or abnormal growth in KV1.8–/– hair cells or mice. Importantly, type II HCs did not show the same Cm effect.

      We further discussed the capacitance effect in lines 160-167: “Cm scales with surface area, but soma sizes were unchanged by deletion of KV1.8 (Suppl. Table 2). Instead, Cm may be higher in KV1.8+/+ cells because of gK,L for two reasons. First, highly expressed trans-membrane proteins (see discussion of gK,L channel density in Chen and Eatock, 2000) can affect membrane thickness (Mitra et al., 2004), which is inversely proportional to specific Cm. Second, gK,L could contaminate estimations of capacitive current, which is calculated from the decay time constant of transient current evoked by small voltage steps outside the operating range of any ion channels. gK,L has such a negative operating range that, even for Vm negative to –90 mV, some gK,L channels are voltage-sensitive and could add to capacitive current.”

      (4) Methods: The electrophysiological part states that "For most recordings, we used .....". However, it is not clear what has been used for the other recordings.

      Thanks for catching this error, a holdover from an earlier ms. version.  We have deleted “For most recordings” (revision line 466).

      Also, please provide the sign for the calculated 4 mV liquid junction potential. 

      Done (revision line 476).

      Reviewer #3 (Recommendations For The Authors): 

      (1) Some of the data in panels in Fig. 1 are hard to match up. The voltage protocols shown in A and B show steps from hyperpolarized values to -71mV (A) and -32 mV (B). However, the value from A doesn't seem to correspond with the activation curve in C.

      Thank you for catching this.  We accidentally showed the control I-X curve from a different cell than that in A. We now show the G-V relation for the cell in A.

      Also the Vhalf in D for -/- animals is ~-38 mV, which is similar to the most positive step shown in the protocol.

      The most positive step in Figure 1B is actually –25 mV. The uneven tick labels might have been confusing, so we re-labeled them to be more conventional.

      Were type I cells stepped to more positive potentials to test for the presence of voltage-activated currents at greater depolarizations? This is needed to support the statement on lines 147-148. 

      We added “no additional K+ conductance activated up to +40 mV” (revision line 149-150).  Our standard voltage-clamp protocol iterates up to ~+40 mV in KV1.8–/– hair cells, but in Figure 1 we only showed steps up to –25 mV because K+ accumulation in the synaptic cleft with the calyx distorts the current waveform even for the small residual conductances of the knockouts. KV1.8–/– hair cells have a main KV conductance with a Vhalf of ~–38 mV, as shown in Figure 1, and we did not see an additional KV conductance that activated with a more positive Vhalf up to +40 mV.
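
      For readers unfamiliar with the Vhalf parameter quoted above, a conductance-voltage (G-V) relation is commonly summarized with a single Boltzmann function, G(V) = Gmax / (1 + exp((Vhalf - V)/S)). The sketch below uses the Vhalf of about –38 mV mentioned in the response; the slope factor S, the normalized Gmax, and the function name are assumptions for illustration, and this is not the authors' fitting procedure.

```python
import math


def boltzmann_g(v_mv, g_max=1.0, v_half_mv=-38.0, slope_mv=5.0):
    """Single Boltzmann activation curve: conductance as a function of voltage (mV)."""
    # slope_mv (S) is a hypothetical value; v_half_mv follows the ~-38 mV quoted in the text.
    return g_max / (1.0 + math.exp((v_half_mv - v_mv) / slope_mv))


# Evaluate across the range covered by the voltage protocol (-124 mV up to +40 mV).
for v in range(-120, 41, 20):
    print(f"V = {v:+4d} mV  G/Gmax = {boltzmann_g(v):.3f}")
```

      At V = Vhalf the function returns half of Gmax, and it saturates at Gmax for strong depolarizations, so a second conductance activating at more positive voltages would appear as a further rise beyond this plateau.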

      (2) Line 151 states "While the cells of Kv1.8-/- appeared healthy..." how were epithelia assessed for health? Hair cells arise from support cells and it would be interesting to know if Kv1.8 absence influences supporting cells or neurons. 

      We added our criteria for cell health to lines 477-479: “KV1.8–/– hair cells appeared healthy in that cells had resting potentials negative to –50 mV, cells lasted a long time (20-30 minutes) in ruptured patch recordings, membranes were not fragile, and extensive blebbing was not seen.”

      Supporting cells were not routinely investigated. We characterized calyx electrical activity (passive membrane properties, voltage-gated currents, firing pattern) and didn’t detect differences between +/+, +/–, and –/– recordings (data not shown). KV1.8 was not detected in neural tissue (Lee et al., 2013). 

      (3) Several different K+ channel subtypes were found to contribute to inner hair cell K+ conductances (Dierich et al. 2020) but few additional K+ channel subtypes are considered here in vestibular hair cells. Further comments on calcium-activated conductances (lines 310-317) would be helpful since apamin-sensitive SK conductances are reported in type II hair cells (Poppi et al. 2018) and large iberiotoxin-sensitive BK conductances in type I hair cells (Contini et al. 2020). Were iberiotoxin effects studied at a range of voltages and might calcium-dependent conductances contribute to the enhanced resonance responses shown in Fig. 4? 

      We refer you to lines 310-317 in the original ms (lines 322-329 in the revised ms), where we explain possible reasons for not observing IK(Ca) in this study.

      (4) Similar to GK,L erg (Kv11) channels show significant Cs+-permeability. Were experiments using Cs+ and/or Kv11 antagonists performed to test for Kv11? 

      No. Hurley et al. (2006) used Kv11 antagonists to reveal Kv11 currents in rat utricular type I hair cells with perforated patch, which were also detected in rats with single-cell RT-PCR (Hurley et al. 2006) and in mice with single-cell RNAseq (McInturff et al., 2018).  They likely contribute to hair cell currents, alongside Kv7, Kv1.8, HCN1, and Kir. 

      (5) Mechanosensitive ("MET") channels in hair cells are mentioned on lines 234 and 472 (towards the end of the Discussion), but a sentence or two describing the sensory function of hair cells in terms of MET channels and K+ fluxes would help in the Introduction too. 

      Following this suggestion we have expanded the introduction with the following lines  (78-87): “Hair cells are known for their large outwardly rectifying K+ conductances, which repolarize membrane voltage following a mechanically evoked perturbation and in some cases contribute to sharp electrical tuning of the hair cell membrane.  Because gK,L is unusually large and unusually negatively activated, it strongly attenuates and speeds up the receptor potentials of type I HCs (Correia et al., 1996; Rüsch and Eatock, 1996b). In addition, gK,L augments a novel non-quantal transmission from type I hair cell to afferent calyx by providing open channels for K+ flow into the synaptic cleft (Contini et al., 2012, 2017, 2020; Govindaraju et al., 2023), increasing the speed and linearity of the transmitted signal (Songer and Eatock, 2013).”

      (6) Lines 258-260 state that GKL does not inactivate, but previous literature has documented a slow type of inactivation in mouse crista and utricle type I hair cells (Lim et al. 2011, Rusch and Eatock 1996) which should be considered. 

      Lim et al. (2011) concluded that K+ accumulation in the synaptic cleft can explain much of the apparent inactivation of gK,L. In our paper, we were referring to fast, N-type inactivation. We changed that line to be more specific; new revision lines 269-271: “KV1.8, like most KV1 subunits, does not show fast inactivation as a heterologously expressed homomer (Lang et al., 2000; Ranjan et al., 2019; Dierich et al., 2020), nor do the KV1.8-dependent channels in type I HCs, as we show, and in cochlear inner hair cells (Dierich et al., 2020).”

      (7) Lines 320-321 Zonal differences in inward rectifier conductances were reported previously in bird hair cells (Masetto and Correia 1997) and should be referenced here.

      Zonal differences were reported by Masetto and Correia for type II but not type I avian hair cells, which is why we emphasize that we found a zonal difference in I-H in type I hair cells. We added two citations to direct readers to type II hair cell results (lines 333-334): “The gK,L knockout allowed identification of zonal differences in IH and IKir in type I HCs, previously examined in type II HCs (Masetto and Correia, 1997; Levin and Holt, 2012).”

      Also, Horwitz et al. (2011) showed HCN channels in utricles are needed for normal balance function, so please include this reference (see line 171). 

      Done (line 184).

      (8) Fig 6A. Shows Kv1.4 staining in rat utricle but procedures for rat experiments are not described. These should be added. Also, indicate striola or extrastriola regions (if known). 

      We removed KV1.4 immunostaining from the paper, see above.

      (9) Table 6, ZD7288 is listed -was this reagent used in experiments to block Gh? If not please omit. 

      ZD7288 was used to block gH to produce a clean h-infinity curve in Figure 6, which is described in the legend.

      (10) In supplementary Fig. 5A make clear if the currents are from XE991 subtraction. Also, is the G-V data for single cell or multiple cells in B? It appears to be from 1 cell but ages P11-505 are given in legend. 

      The G-V curve in B is from XE991 subtraction, and average parameters in the figure caption are for all the KV1.8–/–  striolar type I hair cells where we observed this double Boltzmann tail G-V curve. I added detail to the figure caption to explain this better.

      (11) Supplementary Fig. 6A claims a fast activation of inward rectifier K+ channels in type II but not type I cells-not clear what exactly is measured here.

      We use “fast inward rectifier” to indicate the inward current that increases within the first 20 ms after hyperpolarization from rest (IKir, characterized in Levin & Holt, 2012) in contrast to HCN channels, which open over ~100 ms. We added panel C to show that the activation of IKir is visible in type II hair cells but not in the knockout type I hair cells that lack gK,L. IKir was a reliable cue to distinguish type I and type II hair cells in the knockout.

      For our actual measurements in Fig 6B, we quantified the current flowing after 250 ms at –124 mV because we did not pharmacologically separate IKir and IH.

      Could the XE991-sensitive current be activated and contributing?

      The XE991-sensitive current could decay (rapidly) at the onset of the hyperpolarizing step, but was not contributing to our measurement of IKir and IH, made after 250 ms at –124 mV, at which point any low-voltage-activated (LVA) outward rectifiers have deactivated. Additionally, the LVA XE991-sensitive currents were rare (only detected in some striolar type I hair cells) and when present did not compete with fast IKir, which is only found in type II hair cells.

      Also, did the inward rectifier conductances sustain any outward conductance at more depolarized voltage steps? 

      For the KV1.8-null mice specifically, we cannot answer the question because we did not use specific blocking agents for inward rectifiers.  However, we expect that there would only be sustained outward IR currents at voltages between EK and ~-60 mV: the foot of IKir’s I-V relation according to published data from mouse utricular hair cells – e.g., Holt and Eatock 1995, Rusch and Eatock 1996, Rusch et al. 1998, Horwitz et al., 2011, etc.  Thus, any such current would be unlikely to contaminate the residual outward rectifiers in Kv1.8-null animals, which activate positive to ~-60 mV. 

      (I-HCN is also not a problem, because it could only be outward positive to its reversal potential at ~-40 mV, which is significantly positive to its voltage activation range.)

    1. Author response:

      The following is the authors’ response to the original reviews.

      We edited the manuscript for clarity, added information described in new figure panels (below) and corrected typos.

      In figure 1 we corrected a typo.

      In Figure 2 (panel 2H) and Figure S2E, we included a new statistical analysis (mixed-effects linear regression) to compare mutational burden in controls and AD patients.

      In Figure 3 (panels 3E, F) and Figure S4B, we revised the western blot panels to improve the presentation of controls and quantification.

      We corrected typos.

      In figure 5 we removed a panel (former 5D) which did not add useful information.

      In Figure S1A we included information about the sex and age of the controls and patients analyzed. In Figure S2B, we added an analysis of the mutational burden in controls, distinguishing controls with and without cancer.

      We modified Table S1 for completeness of information for all samples analyzed.

      Reviewer #1:

      Weaknesses: 

      Even though the study is overall very convincing, several points could help to connect the seen somatic variants in microglia more with a potential role in disease progression. The connection of P-SNVs in the genes chosen from neurological disorders was not further highlighted by the authors. 

      All P-SNVs are reported in Table S3.

      We observed only two P-SNVs within genes associated with neurological disorders (brain panel in Table S2):

      - SQSTM1 (p.P392L) was identified in blood but not in brain from patient AD48A.

      - OPTN (p.Q467P) was identified in PU.1 from control 25.

      To highlight this point, we modified the first paragraph of the discussion as follows:

      “We report here that microglia from a cohort of 45 AD patients with intermediate-onset sporadic AD (mean age 65 y.o) is enriched for clones carrying pathogenic/oncogenic variants in genes associated with clonal proliferative disorders (Supplementary Table 2) in comparison to 44 controls. Of note, we did not observe microglia P-SNVs within genes reported to be associated with neurological disorders (Supplementary Table 2) in patients, and one such variant was identified in a control (Supplementary Table 3).”

      The authors show in snRNA-seq data that a disease-associated microglia state seems to be enriched in patients with somatic variants in the CBL ring domain, however, this analysis could be deepened. For example, how this knowledge may translate to patient benefits when the relevant cell populations appear concentrated in a single patient sample (Figure 5; AD52) is unclear; increasing the analyzed patient pool for Figure 5 and showcasing the presence of this microglia state of interest in a few more patients with driving mutations for CBL or other MAPK pathway associated mutations would lend their hypotheses further credibility. 

      We acknowledge this limitation, but we respectfully submit that the analysis was performed in 2 patients. AD53 also shows a MAPK-associated inflammatory signature in the microglia clusters associated with mutations.

      We performed the analysis on all FACS-purified PU.1+ nuclei samples that passed QC for single nuclei RNAseq. It should be noted that this analysis is extremely difficult with current technologies because microglia nuclei need to be fixed for PU.1 staining and FACS purification and the clones are small (~1% of microglia).

      A potential connection between P-SNVs in microglia and disease pathology and symptoms was not further explored by the authors. 

      At the population level, Braak/CERAD scores, the presence of Lewy bodies, amyloid angiopathy, tauopathy, or alpha synucleinopathy were not different between AD patients with or without pathogenic microglial clones (Figure S3 and Table S1). Of note, we studied here a homogeneous population of AD patients.

      At the tissue level, the role of mutant microglia, for example in plaques, is being investigated, but we do not have results to present at this time.

      A recent preprint (Huang et al., 2024) connected the occurrence of somatic variants in genes associated with clonal hematopoiesis in microglia in a large cohort of AD patients, this study is not further discussed or compared to the data in this manuscript. 

      This preprint supports the high frequency of detection of oncogenic variants associated with clonal proliferative disorders. The authors hypothesize that the mutations may be associated with microglia, but they only check a few mutations in purified microglia; most of the study is performed in whole brain tissue. It does not really bring new information compared with the other studies we cite in the introduction (and with our manuscript).

      Reviewer #2 (Recommendations For The Authors): 

      Suggestions for improved or additional experiments, data, or analyses: 

      The authors can demonstrate that identified pathological SNVs from their AD cohort also lead to the activation of human microglia-like cells in vitro, but do not provide any data from histological examination of the patient cohort (e.g. accumulation at the plaque site, microglia distribution, and cell number). The study could be further supported by providing a histological examination of patients with and without P-SNVs to identify if microglia response to pathology, microglia accumulation, or phagocytic capacity are altered in these patients. 

      We performed IBA1 staining in brain samples from controls and from AD patients with or without microglial clones, and microglia density was not different between patients with and without mutations. In addition, histological reports from the brain bank (Braak/CERAD scores, Lewy bodies, amyloid angiopathy, tauopathy, or alpha synucleinopathy) did not suggest differences between patients with and without mutations (Figure S3). These results are preliminary and further investigations are ongoing.

      It would have been interesting to see if for example, transgenic AD mice with an introduced somatic mutation in microglia show an altered disease progression with alterations in amyloid pathology or cognition. 

      We agree with the reviewer. We performed an in vivo study with mice expressing a  5xFAD transgene, an inducible microglia Cx3cr1CreERt2 BrafLSL-V600E transgene, or both, and performed survival, behavioral (Y-Maze and Novel Object Recognition), and histological analyses for β-Amyloid, p-Tau and Iba1 staining.

      Microgliosis was increased in the group with the 2 transgenes; however, the phenotype associated with the expression of a BrafV600E allele in microglia (Mass et al., Nature 2017) was strongly dominant over the phenotype of 5xFAD mice, which prevented us from drawing conclusions from the survival and behavioral analyses.

      Other studies with different transgenes are in progress but we have no results yet to include in this revised manuscript.

      To connect the somatic mutations in microglia better to a potential contribution in neurodegeneration or neurotoxicity, the authors could provide further details on how to demonstrate if human microglia-like cells respond differentially to amyloid or induce neurotoxicity in a co-culture or slice culture model. 

      These studies are under way in the laboratory, but unfortunately, we have no results as yet to include in this revised manuscript.

      The number of samples analyzed for hippocampi, especially in the age-matched controls might be underpowered. 

      Unfortunately, despite our best efforts, we were not able to analyze more hippocampi from control individuals. To control for sampling bias and other potential biases in our analysis, we examined whether the statistical analysis of the cohorts should include age as a criterion (age-matched controls), a random-effects structure, and possible confounding factors such as sex, brain bank site, and the samples’ anatomical location (see revised Methods and revised Fig. 2C, F, and H, and S2B).

      We first tested whether the inclusion of age is appropriate in a fixed-effects linear regression using a generalized linear model (GLM) with gaussian distribution. Compared to the baseline model, the model with age had a significantly lower AIC (from -66.6 to -71.9, P = 0.0067 by chi-square test). Therefore, the inclusion of age as a fixed effect is appropriate. We next tested multiple structures of mixed-effects linear modeling. We used donors as random effects, while utilizing age, disease status (neurotypical control vs. AD), or both as fixed effects. Fitting was performed using the lme function implemented in the nlme package with the maximum likelihood (ML) method. The incorporation of age and disease status significantly improved overall model fitting. Both age and AD are associated with a significant increase in SNV burden in this model (P<1x10^-4 and P=1x10^-4, respectively, by likelihood ratio test). The model's total explanatory power is substantial (conditional R^2=0.48). We also asked if the addition of potential confounding factors to the model is justified. Three factors were tested via the two above-mentioned methods: sex, brain bank site, and the anatomical location of the samples. In all cases, the AIC increased, and the P values by likelihood ratio tests were higher than 0.99. Therefore, from a statistical standpoint, the inclusion of these potential confounding factors does not seem to improve overall model fitting.
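      As an illustration of how the AIC comparison and the chi-square test above relate to each other, the following sketch converts the two quoted AIC values into a likelihood-ratio statistic. This is not the authors' code (their analysis used R); it assumes the two models are nested, fitted by maximum likelihood, and differ by exactly one parameter (age). The function names are hypothetical.

```python
import math


def lrt_from_aic(aic_base: float, aic_full: float, extra_params: int = 1) -> float:
    """Likelihood-ratio statistic implied by the AICs of two nested ML fits.

    AIC = 2k - 2 log L, so
    2 * (log L_full - log L_base) = (AIC_base - AIC_full) + 2 * extra_params.
    """
    return (aic_base - aic_full) + 2.0 * extra_params


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # For 1 df the chi-square variable is a squared standard normal,
    # so P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


# AIC values quoted in the response (baseline model vs. model with age).
stat = lrt_from_aic(-66.6, -71.9)
p_value = chi2_sf_1df(stat)
print(f"LRT statistic = {stat:.2f}, p = {p_value:.4f}")
```

      Under these assumptions the statistic is 7.3 and the p-value is about 0.007, in line with the reported P = 0.0067 once rounding of the AIC values is taken into account.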

      Minor corrections to the text and figures: 

      The authors made a great effort to analyze various samples from one individual donor. One can get a bit confused by the sentence that "an average of 2.5 brains samples were analyzed for each donor". Maybe the authors could highlight more in the first paragraph of the results section and in Figure 1A, that there are multiple samples ("technical replicates") from one individual patient across different brain regions used. 

      We removed the ‘2.5’ sentence and rewrote the paragraph for clarity. Sample information is now displayed in Table S1.

      In the method section is a part included "Expression of target genes in microglia", it was very hard to allocate where these data from public data sets were actually used and for which analysis. Maybe the authors could clarify this again. 

      AU response: We apologize and have corrected the paragraph in the Methods (page 6) as follows: “Expression of target genes in microglia. To evaluate the expression levels of the genes identified in this study as targets of somatic variants, we consulted a publicly available database (https://www.proteinatlas.org/), and also plotted their expression as determined by RNAseq in 2 studies (Galatro et al. GSE99074 33, and Gosselin et al. 34) (Table S3 and Figure S2). For data from Galatro et al. (GSE99074) 33, normalized gene expression data and associated clinical information of isolated human microglia (N = 39) and whole brain (N = 16) from healthy controls were downloaded from GEO. For data from Gosselin et al. 34, raw gene expression data and associated clinical information of isolated microglia (N = 3) and whole brain (N = 1) from healthy controls were extracted from the original dataset. Raw counts were normalized using the DESeq2 package in R 35.”
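      For readers unfamiliar with the normalization step mentioned above: DESeq2's default normalization is the median-of-ratios method. The sketch below re-implements that idea in plain Python on made-up counts, purely as an illustration. It is not the authors' pipeline, which used the DESeq2 package in R, and the function names and toy matrix are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one entry per sample).
    """
    n_samples = len(counts[0])
    # Only genes with non-zero counts in every sample contribute,
    # because the geometric mean of a gene with a zero count is zero.
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lgm for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts):
    """Divide each sample's raw counts by that sample's size factor."""
    sf = size_factors(counts)
    return [[c / s for c, s in zip(row, sf)] for row in counts]


# Toy matrix (genes x samples): sample 2 was sequenced twice as deeply as sample 1.
toy_counts = [[10, 20], [100, 200], [50, 100], [0, 3]]
sf = size_factors(toy_counts)
normalized = normalize(toy_counts)
print(sf)
print(normalized[0])
```

      In this toy example the second sample's size factor is twice the first, so the normalized counts of the first three genes become equal across the two samples.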

      Table S3 is very informative, but also very complex. The reader could maybe benefit a lot from this table if it can be structured a bit easier especially when it comes to identifying P-SNVs and in which tissue sample they were found and if this was the same patient. The sorting function on top of the columns helps, but the color coding is a bit unclear. 

      We agree that, despite our best efforts, the table, which contains all sequencing data for all samples, is complex. The color coding (red) only highlights the presence of a pathogenic mutation.

      Reviewer #3 (Recommendations For The Authors): 

      This is a well-done study of an important problem. I present the following minor critiques: 

      At the bottom of Page 4 and into the top of Page 5, the authors state that 66 of the 826 variants identified in their panel sequencing experiment were found in multiple donors. Then the authors proceed to analyze the remaining 760 variants. It seems that the authors concluded that these multi-donor mosaics were artifacts, which is why they were excluded from further analysis. I think this is a reasonable assumption, but it should be stated explicitly so it is clear to the reader. Complicating this assumption, however, the authors later state that one of their CBL variants was found in two donors, and it is treated as a true mosaic. The authors should make it clear whether recurrent variants were filtered out of any given analysis. It remains possible that all recurrent variants are true mosaics that occurred in multiple donors. The authors should do a bit more to characterize these recurrent variants. Are they observed in the human population using a database like gnomAD, which, together with their recurrence, would strongly suggest they are germline variants? Are they in MAPK genes, or otherwise relevant to the study?

      We apologize for the confusion. Our original intent for the ddPCR validation of variants (Figure 1E) was to count only 1 ‘unique’ variant for variants found for example in 1 brain sample and in the blood from the same patient, or in 2 brain regions from one patient, in order to avoid the criticism of overinflating our validation rate. This was notably the case for TET2 and DNMT3 variants. For example, validation of a TET2 variant found in 2 different brain areas and blood of the same donor is counted as 1 and not 3. We did not eliminate these variants from the analysis as they passed the criteria for somatic variants as presented in Methods.

      In contrast, when a specific variant was found and validated in two different donors, we counted it as 2.
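      The counting rule described above can be sketched as follows: a variant validated in several samples from the same donor counts once, whereas the same variant validated in two donors counts twice. This is only an illustration of the rule as stated; the donor identifiers, sample labels, and variant names are placeholders, not data from the study.

```python
def count_unique_validated(calls):
    """Count validated variants once per (donor, variant) pair.

    calls: iterable of (donor, sample, variant) tuples for validated variants.
    """
    return len({(donor, variant) for donor, _sample, variant in calls})


# Placeholder records, not real data.
calls = [
    ("donor_A", "brain_region_1", "TET2:variant_1"),
    ("donor_A", "brain_region_2", "TET2:variant_1"),
    ("donor_A", "blood", "TET2:variant_1"),
    ("donor_B", "brain_region_1", "CBL:variant_2"),
    ("donor_C", "brain_region_1", "CBL:variant_2"),
]
n_unique = count_unique_validated(calls)
print(n_unique)
```

      Here the three donor_A records collapse to one validated variant, while the variant shared by donor_B and donor_C is counted twice.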

      The characterization of variants included multiple parameters and databases, including for example AF and gnomAD, as indicated in Methods and reported in Table S3.

      All ddPCR results can be found at the end of Table S3.

      Figure 2B labels age-matched controls as "C", but Figure 2C labels age-matched controls as AM-C. Labels should be consistent throughout the manuscript. 

      We corrected this in the revised version.

      It is not clear if the "p:0.02" label in Figure 2F is referring to AM-C Cx vs. AD-Cx or AM-C vs. AD. Please clarify. 

      We apologize for the confusion, and we corrected the legend. The calculated p value is for the comparison between Cortex from Controls (age-matched) and the Cortex from AD.

      On Page 7, the authors state, "The allelic frequencies at which MAPK activating variants are detected in brain samples from AD patients range from ~1-6% of microglia (Fig. 3G), which correspond to clones representing 2 to 12% of mutant microglia in these samples, assuming heterozygosity." I understand what the authors mean here but I think it's a bit confusingly stated. I suggest something like "The allelic frequencies at which MAPK activating variants are detected in brain samples from AD patients range from ~1-6% in microglia (Figure 3G), which correspond to mutant clones representing 2 to 12% of all microglia in these samples, assuming heterozygosity." 

      We thank the reviewer for this suggestion and re-wrote that sentence.
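      The arithmetic behind the sentence discussed above is that, for a heterozygous variant in diploid cells, each mutant cell contributes one mutant allele out of two, so the fraction of mutant cells is twice the allelic frequency. A minimal sketch follows, assuming diploidy and no copy-number change at the locus; the function name is hypothetical.

```python
def mutant_cell_fraction(allelic_frequency, mutant_copies=1, ploidy=2):
    """Fraction of cells carrying the variant, given its allelic frequency.

    Assumes every mutant cell carries `mutant_copies` mutant alleles out of
    `ploidy` total alleles (heterozygous diploid by default).
    """
    return allelic_frequency * ploidy / mutant_copies


low = mutant_cell_fraction(0.01)   # lower end of the ~1-6% range quoted above
high = mutant_cell_fraction(0.06)  # upper end of the range
print(f"{low:.0%} to {high:.0%} of microglia")
```

      With the range quoted by the reviewer, this gives mutant clones representing 2 to 12% of microglia.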

      Is there any evidence that the transcriptional regulators mutated in AD microglia (MED12, SETD2, MLL3, DNMT3A, ASXL1, etc.) are involved in regulating MAPK genes? This would tie these mutations into the broader conclusions of the paper. 

      This is a very interesting question, and indeed published studies indicate that some of the transcriptional /epigenetic regulators regulate expression of MAPK genes. However, in the absence of experimental evidence in microglia and patients, the argument may be too speculative to be included.

      Do the authors have any thoughts as to whether germline variants in CBL are linked to AD? If not, why do they think germline mutations in CBL are not relevant to AD? 

      This is also a very interesting question. As indicated in our manuscript, germline mutations in CBL (and other members of the classical MAPK genes, see Figure 3C) cause early-onset (pediatric) and severe developmental diseases known as RASopathies, characterized by multiple developmental defects, and associated with frequent neurological and cognitive deficits.

      It is possible that some other (and more frequent?) germline variants may be associated with a late-onset brain-restricted phenotype, but we did not find germline P-SNVs in our patients. GWAS studies may be more appropriate to test this hypothesis.

      Do any donors show multiple variants? I don't think this is addressed in the text. 

      We do find donors with multiple variants (see Figure 3D and Figure S3); however, at this stage we have not performed single-nuclei genotyping to investigate whether they are part of the same clone.

      Figure S3 appears to be upside down. 

      This was corrected.

      Figure 5C should have some kind of label telling the reader what gene set is being depicted. 

      We added this information above the panel (it was in the corresponding legend).

      At the top of Page 12, Lewy bodies are written as Lewis bodies. 

      This was corrected.

      Many control donors died of cancer (Table S1). Is there any information on which, if any, chemotherapeutics or radiation these patients received? Might this impact the somatic mutation burden? The authors should compare controls with and without cancer or with and without cancer treatments to rule this out. 

      As suggested by the reviewer, we analyzed the mutational load of age-matched controls with and without cancer (revised Figure S2B). As expected, we saw an increase in the mutational load in controls with cancer, particularly in their blood. This information was added to the Results section.

      This is most likely associated with the treatments received as well as possible cancer clones.

      The formatting for Table S3 is odd. Multiple different fonts are used (this is also seen in Table S5). Column Q has no column ID. The word "panel" is spelled "pannel." The word "expressed" is spelled "expressd" in one of the worksheet labels. Columns BG-BN in the ALL-SNV worksheet are blank but seemingly part of the table. 

      We fixed these errors in Table S3.

    2. eLife assessment

      This fundamental study enhances our understanding of how somatic variants in microglia might influence the onset and progression of neurodegenerative diseases such as Alzheimer's. The evidence supporting the conclusions is compelling, with the authors employing a multi-faceted approach to identify an enrichment of potentially pathogenic somatic mutations in Alzheimer's disease microglia. This research will be of significant interest to those investigating somatic mutations, Alzheimer's disease, microglial biology and cell signalling pathways.

    3. Reviewer #1 (Public review):

      In the revised manuscript, Vicario et al. provide new insights into a potential contribution of somatic mutations within the microglia population of the CNS that accelerate microglia activation and disease-associated gene signatures in Alzheimer's disease. Here they especially identified an "enrichment" of pathological SNVs in microglia, but not the peripheral blood, that are associated with clonal proliferative disorders and neurological diseases in a subset of patients with AD. They identified P-SNVs in microglia of AD patients located within the ring domain of CBL, a negative regulator of MAPK signaling. They further provide mechanistic insights into how these variants result in MAPK over-activation and subsequently in a pro-inflammatory phenotype in human microglia-like cells in vitro.

      Overall, this study provides novel evidence from an AD patient cohort pointing to a potential contribution of microglia-specific somatic mutations to disease onset and/or progression in at least a subset of patients with Alzheimer's disease.

      The work within this study is highly relevant and will open new study lines to explore somatic mutations within the microglia compartment and neurodegenerative diseases.

      Strengths:

      As outlined above, the study identified P-SNVs in microglia of AD patients associated with clonal proliferative disorders, and also gives an in-depth analysis of recurring P-SNVs located within the ring domain of CBL, a negative regulator of MAPK signaling. The authors further provide mechanistic insights into how these variants result in MAPK over-activation and subsequently in a pro-inflammatory phenotype in HEK cells, BV2 cells, MAC cells and human microglia-like cells in vitro. The over-activation of the cells in vitro is convincing.

      Great care was taken to identify the limitations of the possible conclusions and to make careful conclusions. For example, they highlight that the pathway proposed to be affected may be an explanation for a subset of AD patients, and emphasize that it is yet unclear whether this accumulation of pathological SNVs is a cause or consequence of disease progression.

      The study supports an enrichment of P-SNVs in several genes associated with clonal proliferative disorders in microglia and nicely separates this from SNVs associated with clonal hematopoiesis in the peripheral blood found in AD patients and controls.

      The authors further acknowledged that several age-matched control patients were diagnosed with cancer or tumor-associated diseases and carefully showed that the SNVs occurring in these patients are not associated with the P-SNVs identified in the microglial compartment of the AD cohort.

      Weaknesses:

      The study is convincing overall and has improved in revision, but some points, especially regarding a clear connection between the observed somatic variants in microglia and a potential role in disease progression, remain unanswered.

      A potential connection between P-SNVs in microglia and disease pathology and symptoms was not further explored by the authors but might be in future work.

      Taking this into account, the title may be somewhat overstated and could be toned down.

    4. Reviewer #2 (Public review):

      Summary:

      In this study, Vicario et al. aimed to quantify and characterize mosaic mutations in human sporadic Alzheimer's disease (AD) brain samples. They focused on three broad classes of brain cells, neurons that express the marker NeuN, microglia that express the marker PU.1, and double-negative cells that presumably comprise all other brain cell types, including astrocytes, oligodendrocytes, oligodendrocyte progenitor cells, and endothelial cells. The authors find an enrichment of potentially pathogenic somatic mutations in AD microglia compared to controls, with MAPK pathway genes being particularly enriched for somatic mutations in those cells. The authors report a striking enrichment for mutations in the gene CBL and use in vitro functional assays to show that these mutations indeed induce MAPK pathway activation.

      The current state of the AD and somatic mutation fields puts this work into context. First, AD is a devastating disease whose prevalence is only increasing as the population of the U.S. is aging, necessitating the investigation of novel features of AD to identify new therapeutic opportunities. Second, microglia have recently come into focus as important players in AD pathogenesis. Many AD risk genes are selectively expressed in microglia, and microglia from AD brain samples show a distinct transcriptional profile indicating an inflammatory phenotype. The authors' previous work shows that a genetic mouse model of mosaic BRAF activation in macrophages (including microglia) displays a neurodegenerative phenotype similar to AD (Mass et al., 2017, doi:10.1038/nature23672). Third, new technological developments have allowed for identifying mosaic mutations present in only a small fraction of or even single cells. Together, these data form a rationale for studying mosaic mutations in microglia in AD. In light of the authors' findings regarding MAPK pathway gene somatic mutations, it is also important to note that MAPK has previously been implicated in AD neuroinflammation in the literature.

      Strengths:

      The study demonstrated several strengths. Firstly, the authors used two methods to identify mosaic mutations: 1) deep (~1,100x) DNA sequencing of a targeted panel of >700 genes they hypothesized might, if mutated somatically, play a role in AD, and 2) deep (400x) whole-exome sequencing (WES) to identify clonal mosaics outside of those genes. A second strength is the agreement between these experiments, where WES found many variants identified in the panel experiment, and both experiments revealed somatic mutations in MAPK pathway genes. Third, the authors demonstrated in several in vitro systems that many mutations they identified in MAPK genes activate MAPK signaling. Finally, the authors showed that in some human brain samples, single-cell gene expression analysis revealed that cells bearing a mosaic MAPK pathway mutation displayed dysregulated inflammatory signaling and dysregulation in other pathways. This single-cell analysis was in agreement with their in vitro analyses.

      Weaknesses:

      The study also showed some weaknesses. The sample size (45 AD donors and 44 controls) is small, reflected in the relatively modest effect sizes and p-values observed. This weakness is partially ameliorated by the authors' extensive molecular and functional validation of mutation candidates. Secondly, as the authors point out, this study cannot conclude whether microglial mosaic mutations cause AD or are an effect of AD. Future studies may shed more light on this important question.

      Conclusions and Impact:

      Considering the study's aims, strengths, and weaknesses, I conclude that the authors achieved their goal of characterizing the role of mosaic mutations in human AD. Their data strongly suggest that mosaic MAPK mutations in microglia are associated with AD. The impacts of this study remain to be seen, but they could include attempts to target CBL or other mutated genes in the treatment of AD. This work also suggests a similar approach to identifying potentially causative somatic mutations in other neurodegenerative diseases.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their constructive reviews.  Taken together, the comments and suggestions from reviewers made it clear that we needed to focus on improving the clarity of the methods and results.  We have revised the manuscript with that in mind.  In particular, we have restructured the results to make the logic of the manuscript clearer and we have added details to the methods section.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The work of Muller and colleagues concerns the question of where we place our feet when passing uneven terrain, in particular how we trade off path length against the steepness of each single step. The authors find that paths are chosen that are consistently less steep and deviate from the straight line more than an average random path, suggesting that participants indeed trade off steepness for path length. They show that this might be related to biomechanical properties, specifically the leg length of the walkers. In addition, they show using a neural network model that participants could choose the footholds based on their sensory (visual) information about depth. 

      Strengths: 

      The work is a natural continuation of some of the researchers' earlier work that related the immediately following steps to gaze [17]. Methodologically, the work is very impressive and presents a further step forward towards understanding real-world locomotion and its interaction with sampling visual information. While some of the results may seem somewhat trivial in hindsight (as always in this kind of study), I still think this is a very important approach to understanding locomotion in the wild better. 

      Weaknesses: 

      The manuscript as it stands has several issues with the reporting of the results and the statistics. In particular, it is hard to assess the inter-individual variability, as some of the data are aggregated across individuals, while in other cases only central tendencies (means or medians) are reported without providing measures of variability; this is critical, in particular as N=9 is a rather small sample size. It would also be helpful to see the actual data for some of the information merely described in the text (e.g., the dependence of \Delta H on path length). When reporting statistical analyses, test statistics and degrees of freedom should be given (or other variants that unambiguously describe the analysis).

      There is only one figure (Figure 6) that shows data pooled over subjects and this is simply to illustrate how the random paths were calculated. The paths actually generated used individual subject data. We don’t draw our conclusions from these histograms; they are instead used to generate bounds for the simulated paths.  We have made clear both in the text and in the figure legends when we have plotted an example subject. Other plots show the individual subject data. We have given the range of subject medians as well as the standard deviation for data illustrated in Figure (random vs chosen). We have also given the details of the statistical test comparing the flatness of the chosen paths versus the randomly generated paths.  We have added two supplemental figures to show individual walker data more directly: (Fig. 14) the per-subject histograms of step parameters, (Fig. 18) the individual subject distributions for straight path slopes and tortuosity.

      The CNN analysis chosen to link the step data to visual sampling (gaze and depth features) should be motivated more clearly, and it should describe how training and test sets were generated and separated for this analysis.

      We have motivated the CNN analysis and moved it earlier in the manuscript to help clarify the logic of the manuscript. Details of the training and test sets are now provided, and the data have been replotted. The values are a little different from the original plot after making a correction in the code, but the conclusions drawn from this analysis are unchanged. This analysis simply shows that there is information in the depth images from the subject’s perspective that a network can use to learn likely footholds. This motivates the subsequent analysis of path flatness.

      There are also some parts of figures, where it is unclear what is shown or where units are missing. The details are listed in the private review section, as I believe that all of these issues can be fixed in principle without additional experiments. 

      Several of the Figures have been replotted to fix these issues.

      Reviewer #2 (Public Review): 

      Summary: 

      This manuscript examines how humans walk over uneven terrain using vision to decide where to step. There is a huge lack of evidence about this because the vast majority of locomotion studies have focused on steady, well-controlled conditions, and not on decisions made in the real world. The author team has already made great advances in this topic, but there has been no practical way to map 3D terrain features in naturalistic environments. They have now developed a way to integrate such measurements along with gaze and step tracking, which allows quantitative evaluation of the proposed trade-offs between stepping vertically onto vs. stepping around obstacles, along with how far people look to decide where to step. 

      Strengths: 

      (1) I am impressed by the overarching outlook of the researchers. They seek to understand human decision-making in real-world locomotion tasks, a topic of obvious relevance to the human condition but not often examined in research. The field has been biased toward well-controlled studies, which have scientific advantages but also serious limitations. A well-controlled study may eliminate human decisions and favor steady or periodic motions in laboratory conditions that facilitate reliable and repeatable data collection. The present study discards all of these usually-favorable factors for rather uncontrolled conditions, yet still finds a way to explore real-world behaviors in a quantitative manner. It is an ambitious and forward-thinking approach, used to tackle an ecologically relevant question. 

      (2) There are serious technical challenges to a study of this kind. It is true that there are existing solutions for motion tracking, eye tracking, and most recently, 3D terrain mapping. However most of the solutions do not have turn-key simplicity and require significant technical expertise. To integrate multiple such solutions together is even more challenging. The authors are to be commended on the technical integration here.

      (3) In the absence of prior studies on this issue, it was necessary to invent new analysis methods to go with the new experimental measures. This is non-trivial and places an added burden on the authors to communicate the new methods. It's harder to be at the forefront in the choice of topic, technical experimental techniques, and analysis methods all at once. 

      Weaknesses: 

      (1) I am predisposed to agree with all of the major conclusions, which seem reasonable and likely to be correct. Ignoring that bias, I was confused by much of the analysis. There is an argument that the chosen paths were not random, based on a comparison of probability distributions that I could not understand. There are plots described as "turn probability vs. X" where the axes are unlabeled and the data range above 1. I hope the authors can provide a clearer description to support the findings. This manuscript stands to be cited well as THE evidence for looking ahead to plan steps, but that is only meaningful if others can understand (and ultimately replicate) the evidence. 

      We have rewritten the manuscript with the goal of clarifying the analyses, and we have re-labelled the offending figure.

      (2) I wish a bit more and simpler data could be provided. It is great that step parameter distributions are shown, but I am left wondering how this compares to level walking.  The distributions also seem to use absolute values for slope and direction, for understandable reasons, but that also probably skews the actual distribution. Presumably, there should be (and is) a peak at zero slope and zero direction, but absolute values mean that non-zero steps may appear approximately doubled in frequency, compared to separate positive and negative. I would hope to see actual distributions, which moreover are likely not independent and probably have a covariance structure. The covariance might help with the argument that steps are not random, and might even be an easy way to suggest the trade-off between turning and stepping vertically. This is not to disregard the present use of absolute values but to suggest some basic summary of the data before taking that step. 

      We have replotted the step parameter distributions without absolute values. Unfortunately, the covariation of step parameters (step direction and step slope) is unlikely to help establish this tradeoff.  Note that the primary conclusion of the manuscript is that walkers make turns to keep step slope low (when possible). Thus, any correlation that might exist between goal direction and step slope would be difficult to interpret without a direct comparison to possible alternative paths (as we have done in this paper). As such, we do not draw our conclusions from these distributions.  We use them primarily to generate plausible random paths for comparison with the chosen paths.  We have added two supplementary figures including distributions (Fig 15) and covariation of all the step parameters discussed in the methods (Fig 16).

      (3) Along these same lines, the manuscript could do more to enable others to digest and go further with the approach, and to facilitate interpretability of results. I like the use of a neural network to demonstrate the predictiveness of stepping, but aside from above-chance probability, what else can inform us about what visual data drives that?

      The CNN analysis simply shows that the information is there in the image from the subject’s viewpoint and is used to motivate the subsequent analysis.  As noted above, we have generally tried to improve the clarity of the methods.

      Similarly, the step distributions and height-turn trade-off curves are somewhat opaque and do not make it easy to envision further efforts by others, for example, people who want to model locomotion. For that, clearer (and perhaps) simpler measures would be helpful. 

      We have clarified the description of these plots in the main text and in the methods.  We have also tried to clarify why we made the choices that we did in measuring the height-turn trade-off and why it is necessary in order to make a fair comparison.

      I am absolutely in support of this manuscript and expect it to have a high impact. I do feel that it could benefit from clarification of the analysis and how it supports the conclusions. 

      Reviewer #3 (Public Review): 

      Summary: 

      The systematic way in which path selection is parametrically investigated is the main contribution. 

      Strengths: 

      The authors have developed an impressive workflow to study gait and gaze in natural terrain. 

      Weaknesses: 

      (1) The training and validation data of the CNN are not explained fully making it unclear if the data tells us anything about the visual features used to guide steering. It is not clear how or on what data the network was trained (training vs. validation vs. un-peeked test data), and justification of the choices made. There is no discussion of possible overfitting. The network could be learning just e.g. specific rock arrangements. If the network is overfitting the "features" it uses could be very artefactual, pixel-level patterns and not the kinds of "features" the human reader immediately has in mind. 

      The CNN analysis has now been moved earlier in the manuscript to help clarify its significance and we have expanded the description of the methods. Briefly, it simply indicates that there is information in the depth structure of the terrain that can be learned by a network. This helps justify the subsequent analyses.  Importantly, the network training and testing sets were separated by terrain to ensure that the model was being tested on “unseen” terrain and avoid the model learning specific arrangements.  This is now clarified in the text.
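
      As an illustration of what a terrain-separated split can look like, the following is a minimal sketch and not the code used in the study. The sample layout (a dict with a `terrain` key), the function name `split_by_terrain`, the 25% test fraction, and the seed are all hypothetical choices made for the example. The point it shows is that whole terrains, rather than individual images, are assigned to either the training or the test set.

```python
import random


def split_by_terrain(samples, test_fraction=0.25, seed=0):
    """Split samples so that no terrain appears in both train and test.

    `samples` is a list of dicts, each with a "terrain" key (hypothetical
    layout). Whole terrains are held out, not individual images.
    """
    terrains = sorted({s["terrain"] for s in samples})
    if len(terrains) < 2:
        raise ValueError("need at least two terrains to hold one out")

    # Shuffle terrain labels reproducibly, then hold out a fraction of them.
    rng = random.Random(seed)
    rng.shuffle(terrains)
    n_test = max(1, round(len(terrains) * test_fraction))
    n_test = min(n_test, len(terrains) - 1)  # keep at least one for training
    test_terrains = set(terrains[:n_test])

    train = [s for s in samples if s["terrain"] not in test_terrains]
    test = [s for s in samples if s["terrain"] in test_terrains]
    return train, test


# Example with made-up data: 4 terrains, 3 depth images each.
samples = [{"terrain": t, "image_id": i} for t in "ABCD" for i in range(3)]
train, test = split_by_terrain(samples)
```

      Splitting on the terrain label instead of on individual images is what keeps consecutive, highly similar frames of the same terrain from landing on both sides of the split.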

      (2) The use of descriptive terminology should be made systematic. 

      Specifically, the following terms are used without giving a single, clear definition for them: path, step, step location, foot plant, foothold, future foothold, foot location, future foot location, foot position. I think some terms are being used interchangeably. I would really highly recommend a diagrammatic cartoon sketch, showing the definitions of all these terms in a single figure, and then sticking to them in the main text. 

      We have made the language more systematic and clarified the definition of each term (see Methods). Path refers to the sequence of 5 steps. Foothold is where the foot was placed in the environment. A step is the transition from one foothold to the next.

      (3) More coverage of different interpretations / less interpretation in the abstract/introduction would be prudent.  The authors discuss the path selection very much on the basis of energetic costs and gait stability. At least mention should be given to other plausible parameters the participants might be optimizing (or that indeed they may be just satisficing). That is, it is taken as "given" that energetic cost is the major driver of path selection in your task, and that the relevant perception relies on internal models. Neither of these is a priori obvious nor is it as far as I can tell shown by the data (optimizing other variables, satisficing behavior, or online "direct perception" cannot be ruled out). 

      The abstract has been substantially rewritten.  We have adjusted our language in the introduction/discussion to try to address this concern.

      Recommendations for the authors:

      Reviewing Editor comments 

      You will find a full summary of all 3 reviews below. In addition to these reviews, I'd like to highlight a few points from the discussion among reviewers. 

      All reviewers are in agreement that this study has the potential to be a fundamental study with far-reaching empirical and practical implications. The reviewers also appreciate the technical achievements of this study. 

      At the same time, all reviewers are concerned with the overall lack of clarity in how the results are presented. There are a considerable number of figures that need better labeling, text parts that require clearer definitions, and the description of data collection and analysis (esp. with regard to the CNN) requires more care. Please pay close attention to all comments related to this, as this was the main concern that all reviewers shared. 

      At a more specific level, the reviewers discussed the finding around leg length, and admittedly, found it hard to believe, in short: "extraordinary claims need strong evidence". It would be important to strengthen this analysis by considering possible confounds, and by including a discussion of the degree of conviction. 

      We have weakened the discussion of this finding and provided an additional analysis in a supplemental figure (Figure 17) to help clarify the finding.

      Reviewer #1 (Recommendations For The Authors): 

      First, let me apologize for the long delay with this review. Despite my generally positive evaluation (see public review), I have some concerns about the way the data are presented and questions about methodological details. 

      (1) Representation of results: I find it hard to decipher how much variability arises within an individual and how much across individuals. For example, Figure 7b seems to aggregate across all individuals, while the analysis is (correctly) based on the subject medians.

      Figure 7b showed data from just one subject. This is now clarified.

      It would be good to see the distribution of all individuals (maybe use violin plots for each observer with the true data on one side and the baseline data on the other, or simple histograms for each). To get a feeling for inter-individual and intra-individual variability is crucial, as obviously (see the leg-length analysis) there are larger inter-individual differences and representations like these would be important to appreciate whether there is just a scaling of more or less the same effect or whether there are qualitative differences (especially in the light of N=9 being not a terribly huge sample size). 

      The medians for the individual subjects are now provided with the standard deviations between subjects to indicate the extent of individual differences. Note that the random paths were chosen from the distribution of actual step slopes for that subject as one of the constraints. This makes the random paths statistically similar to the chosen paths with the differences only being generated by the particular visual context. Thus, the test for a difference between chosen and random is quite conservative.

      Similarly, seeing \DeltaH plotted as a function of steps in the path as a figure rather than just having the verbal description would also help. 

      To simplify the discussion of our methods/results we have removed the analyses that examine mean slope as a function of steps.  Because of the central limit theorem, the slopes of the chosen paths remain largely unchanged regardless of the choice of path length.  The slopes of the simulated paths are always larger irrespective of the choice of path length.

      (2) Reporting the statistical analyses: This is related to my previous issue: I would appreciate it if the test statistics and degrees-of-freedom of the statistical tests were given along with the p-values, instead of only the p-values. This at some points would also clarify how the statistics were computed exactly (e.g., "All subjects showed comparable difference and the difference in medians evaluated across subjects was highly significant (p<<0.0001).", p.10, is ambiguous to me). 

      Details have been added as requested.

      (3) Why is the lower half ("tortuosity less than the median tortuosity") of paths used as "straight" rather than simply the minimum of all viable paths?

      The benchmark for a straight path is somewhat arbitrary. Using the lower half rather than the minimum length path is more conservative.

      (4) For the CNN analysis, I failed to understand what was training and what was test set. I understand that the goal is to predict for all pixels whether they are a potential foothold or not, and the AUC is a measure of how well they can be discriminated based on depth information and then this is done for each image and the median over all images taken. But on which data is the CNN trained, and on which is it tested? Is this leave-n-out within the same participant? If so, how do you deal with dependencies between subsequent images? Or is it leave-1-out across participants? If so, this would be more convincing, but again, the same image might appear in training and test. If the authors just want to ask how well depth features can discriminate footholds from non-footholds, I do not see the benefit of a supervised method, which leaves the details of the feature combinations inside a black box. Rather than defining the "negative set" (i.e., the non-foothold pixels) randomly, the simulated paths could also be used, instead. If performance (AUC) gets lower than for random pixels, this would confirm that the choice of parameters to define a "viable path" is well-chosen. 

      This has been clarified as described above.

      Minor issues: 

      (5) A higher tortuosity would also lead a participant to require more steps in total than a lower tortuosity. Could this partly explain the correlation between the leg length and the slope/tortuosity correlation? (Longer legs need fewer steps in total, thus there might be less tradeoff between \Delta H and keeping the path straight (i.e., saving steps)). To assess this, you could give the total number of steps per (straight) distance covered for leg length and compare this to a flat surface.

      The calculations are done on an individual subject basis and the first and last step locations are chosen from the actual foot placements, then the random paths are generated between those endpoints. The consequence of this is that the number of steps is held constant for the analysis.  We have clarified the methods for this analysis to try to make this more clear.

      (6) As far as I understand, steps happen alternatingly with the two feet. That is, even on a flat surface, one would not reach 0 tortuosity. In other words, does the lateral displacement of the feet play a role (in particular, if paths with even and paths with odd number of steps were to be compared), and if so, is it negligible for the leg-length correlation? 

      All the comparisons here are done for 5 step sequences so this potential issue should not affect the slope of the regression lines or the leg length correlation.

      (7) Is there any way to quantify the quality of the depth estimates? Maybe by taking an actual depth image (e.g., by LIDAR or similar) for a small portion of the terrain and comparing the results to the estimate? If this has been done for similar terrain, can a quantification be given? If errors would be similar to human errors, this would also be interesting for the interpretation of the visual sampling data.

      Unfortunately, we do not have the ground truth depth image from LIDAR.  When these data were originally collected, we had not imagined being able to reconstruct the terrain.  However, we agree with the reviewers that this would be a good analysis to do. We plan to collect LIDAR in future experiments. 

      To provide an assessment of quality for these data in the absence of a ground truth depth image, we have performed an evaluation of the reliability of the terrain reconstruction across repeats of the same terrain both between and within participants.  We have expanded the discussion of these reliability analyses in the results section entitled “Evaluating Terrain Reconstruction”, as well as in the corresponding methods section (see Figure 10).

      (8) The figures are sometimes confusing and a bit sloppy. For example, in Figure 7a, the red, cyan, and green paths are not mentioned in the caption, in Figure 8 units on the axes would be helpful, in Figure 9 it should probably be "tortuosity" where it now states "curviness". 

      These details have been fixed.

      (9) I think the statement "The maximum median AUC of 0.79 indicates that the 0.79 is the median proportion of pixels in the circular..." is not an appropriate characterization of the AUC, as the number of correctly classified pixels will not only depend on the ROC (and thus the AUC), but also on the operating point chosen on the ROC (which is not specified by the AUC alone). I would avoid any complications at this point and just characterize the AUC as a measure of discriminability between footholds and non-footholds based on depth features. 

      This has been fixed.
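
      The reviewer's distinction between AUC and the proportion of correctly classified pixels can be made concrete with a small sketch. The scores below are made up for illustration and the function names are hypothetical. AUC is computed here in its rank form, as the probability that a randomly drawn foothold pixel scores higher than a randomly drawn non-foothold pixel, so it does not depend on a threshold, whereas accuracy changes with the operating point.

```python
def auc(pos_scores, neg_scores):
    """Rank-based AUC: P(score of a positive > score of a negative).

    Ties count as half a win (equivalent to the Mann-Whitney U statistic
    divided by the number of positive/negative pairs).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def accuracy(pos_scores, neg_scores, threshold):
    """Fraction of pixels classified correctly at a given threshold."""
    correct = sum(p >= threshold for p in pos_scores)
    correct += sum(n < threshold for n in neg_scores)
    return correct / (len(pos_scores) + len(neg_scores))


# Made-up network outputs for foothold and non-foothold pixels.
foothold = [0.9, 0.8, 0.4]
non_foothold = [0.7, 0.3, 0.2]

print(auc(foothold, non_foothold))             # one value, no threshold involved
print(accuracy(foothold, non_foothold, 0.5))   # depends on the threshold
print(accuracy(foothold, non_foothold, 0.35))  # a different operating point
```

      With the same scores, the two thresholds give different accuracies while the AUC stays fixed, which is why the AUC is best described as a measure of discriminability.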

      (10) Ref. [16] is probably the wrong Hart paper (I assume their 2012 Exp. Brain Res. [https://doi.org/10.1007/s00221-012-3254-x] paper is meant at this point) 

      Fixed

      Typos (not checked systematically, just incidental discoveries): 

      (11) "While there substantial overlap" (p.10) 

      (12) "field.." (p.25) 

      (13) "Introduction", "General Discussion" and "Methods" as well as some subheadings are numbered, while the other headings (e.g., Results) are not. 

      Fixed

      Reviewer #2 (Recommendations For The Authors): 

      The major suggestions have been made in the Public Review. The following are either minor comments or go into more detail about the major suggestions. All of these comments are meant to be constructive, not obstructive. 

      Abstract. This is well written, but the main conclusions "Walkers avoid...This trade off is related...5 steps ahead" sound quite qualitative. They could be strengthened by more specificity (NOT p-values), e.g. "positive correlation between the unevenness of the path straight ahead and the probability that people turned off that path." 

      The abstract has been substantially rewritten.

      P. 5 "pinning the head position estimated from the IMU to the Meshroom estimates" sounds like there are two estimates. But it does not sound like both were used. Clarify, e.g. the Meshroom estimate of head position was used in place of IMU? 

      Yes that’s correct.  We have clarified this in the text.

      Figure 5. I was confused by this. First, is a person walking left to right? When the gaze position is shown, where was the eye at the time of that gaze? There are straight lines attached to the blue dots, what do they represent? The caption says gaze is directed further along the path, which made me guess the person is walking right to left, and the line originates at the eye. Except the origins do not lie on or close to the head locations. There's also no scale shown, so maybe I am completely misinterpreting. If the eye locations were connected to gaze locations, it would help to support the finding that people look five steps ahead of where they step. 

      We have updated the figure and clarified the caption to remove these confusions.  There was a mistake in the original figure (where the yellow indicated head locations, we had plotted the center of mass and the choice of projection gave the incorrect impression that the fixations off the path, in blue, were separated from the head).

      The view of the data is now presented so the person is walking left to right and with a projection of the head location (orange), gaze locations (blue or green) and feet (pink).

      Figure 6. As stated in the major comments, the step distributions would be expected to have a covariance structure (in terms of raw data before taking absolute values). It would be helpful to report the covariances (6 numbers). As an example of a simple statistical analysis, a PCA (also based on a data covariance) would show how certain combinations of slope/distance/direction are favored over others. Such information would be a simple way to argue that the data are not completely random, and may even show a height-turn trade-off immediately. (By the way, I am assuming absolute values are used because the slopes and directions are only positive, but it wasn't clear if this was the definition.) A reason why covariances and PCA are helpful is that such data would be helpful to compute a better random walk, generated from dynamics. I believe the argument that steps are not random is not served by showing the different histograms in Figure 7, because I feel the random paths are not fairly produced. A better argument might draw randomly from the same distribution as the data (or drive a dynamical random walk), and compare with actual data. There may be correlations present in the actual data that differ from random. I could be mistaken, because it is difficult or impossible to draw conclusions from distributions of absolute values, or maybe I am only confused. In any case, I suspect other readers will also have difficulty with this section. 

      This has been addressed above in the major comments.

      p. 9, "average step slope" I think I understand the definition, but I suggest a diagram might be helpful to illustrate this.

      There is a diagram of a single step slope in Figure 6 and a diagram of the average step slope for a path segment in Figure 12.
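
      For readers without the figures at hand, one plausible reading of these two quantities is sketched below. This is an assumption made for illustration, not the definition confirmed in the manuscript: the slope of a step is taken as the angle between the foothold-to-foothold vector and the horizontal plane, and the average step slope of a path segment is the mean of the absolute step slopes (absolute values are assumed here so that uphill and downhill steps do not cancel). Function names and coordinates are hypothetical.

```python
import math


def step_slope_deg(a, b):
    """Signed slope (degrees) of the step from foothold a to foothold b.

    Footholds are (x, y, z) tuples with z as height (assumed convention).
    """
    horizontal = math.hypot(b[0] - a[0], b[1] - a[1])
    return math.degrees(math.atan2(b[2] - a[2], horizontal))


def mean_step_slope_deg(footholds):
    """Mean absolute step slope (degrees) over a sequence of footholds."""
    if len(footholds) < 2:
        raise ValueError("need at least two footholds to form a step")
    slopes = [abs(step_slope_deg(a, b))
              for a, b in zip(footholds, footholds[1:])]
    return sum(slopes) / len(slopes)


# Made-up segment: one step up and one step down, each 1 m along and 1 m high.
segment = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (2.0, 0.0, 0.0)]
print(mean_step_slope_deg(segment))
```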

      Incidentally, the "straight path slope" is not clearly defined. I suspect "straight" is the view from above, i.e. ignoring height changes. 

      Clarified

      p. 11 The tortuosity metric could use a clearer definition. Should I interpret "length of the chosen path relative to a straight path" as the numerator and denominator? Here does "length" also refer to the view from above? Why is tortuosity defined differently from step slope? Couldn't there be an analogue to step slope, except summing absolute values of direction changes? Or an analogue to tortuosity, meaning the length as viewed from the side, divided by the length of the straight path? 

      We followed the literature in the definition of tortuosity.  We have clarified the definition of tortuosity in the methods, but yes, you can interpret the length of the chosen path relative to a straight path, as the numerator and denominator, and length refers to 3D length.  We agree that there are many interesting ways to look at the data but for clarity we have limited the discussion to a single definition of tortuosity in this paper.
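
      The ratio described above can be written down as a short sketch. It assumes that the "straight path" is the 3D straight line joining the first and last footholds of the segment; that choice, the function names, and the example coordinates are assumptions for illustration and not taken from the manuscript.

```python
import math


def path_length_3d(footholds):
    """Sum of 3D distances between consecutive footholds (x, y, z)."""
    return sum(math.dist(a, b) for a, b in zip(footholds, footholds[1:]))


def tortuosity(footholds):
    """3D length of the chosen path divided by the 3D straight-line
    distance between its first and last footholds (assumed definition)."""
    straight = math.dist(footholds[0], footholds[-1])
    if straight == 0.0:
        raise ValueError("first and last footholds coincide")
    return path_length_3d(footholds) / straight


# A perfectly straight segment has tortuosity 1; detours increase it.
straight_segment = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
detour_segment = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
print(tortuosity(straight_segment))
print(tortuosity(detour_segment))
```

      Under this definition the value is bounded below by 1, so a tortuosity near 1 corresponds to a path that barely deviates from the direct route.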

      Figure 8 could use better labeling. On the left, there is a straight path and a more tortuous path, why not report the metrics for these? On the right, there are nine unlabeled plots. The caption says "turn probability vs. straight path slope" but the vertical axis is clearly not a probability. Perhaps the axis is tortuosity? I presume the horizontal axis is a straight path slope in degrees, but this is not explained. Why are there nine plots, is each one a subject? I would prefer to be informed directly instead of guessing. (As a side note, I like the correlations as a function of leg length, it is interesting, even if slightly unbelievable. I go hiking with people quite a bit shorter and quite a lot taller than me, and anecdotally I don't think they differ so much from each other.) 

      We have fixed Figure 8 which shows the average “mean slope” as a function of tortuosity.  We have added a supplemental figure which shows a scatter plot of the raw data (mean slope vs. tortuosity for each path segment).  

      Note that when walking with friends other factors (e.g. social) will contribute to the cost function. As a very short person my experience is that it is a problem. In any case, the data are the data, whatever the underlying reasons. It does not seem so surprising that people of different heights make different tradeoffs. We know that the preferred gait depends on an individual’s passive dynamics as described in the paper, and the terrain will change what is energetically optimal as described in the Darici and Kuo paper.

      Figure 9 presumably shows one data point per subject, but this isn't clear. 

      The correlations are reported per subject, and this has been clarified. 

      p. 13 CNN. I like this analysis, but only sort of. It is convincing that there is SOME sort of systematic decision-making about footholds, better than chance. What it lacks is insight. I wonder what drives peoples' decisions. As an idle suggestion, the AlexNet (arXiv: Krizhevsky et al.; see also A. Karpathy's ConvNETJS demo with CIFAR-10) showed some convolutional kernels to give an idea of what the layers learned. 

      Further exploration of CNNs would definitely be interesting, but it is outside the scope of the paper. We use it simply to make a modest point, as described above.

      p. 15 What is the definition of stability cost? I understand energy cost, but it is unclear how circuitous paths have a higher stability cost. One possible definition is an energetic cost having to do with going around and turning. But if not an energy cost, what is it? 

      We meant to say that the longer and flatter paths are presumably more stable because of the smaller height changes. You are correct that we can’t say what the stability cost is and we have clarified this in the discussion.

      p. 16 "in other data" is not explained or referenced.

      Deleted 

      p. 10 5 step paths and p. 17 "over the next 5 steps". I feel there is very little information to really support the 5 steps. A p-value only states the significance, not the amount of difference. This could be strengthened by plotting some measures vs. the number of steps ahead. For example, does a CNN looking 1-5 steps ahead predict better than one looking N<5 steps ahead? I am of course inclined to believe the 5 steps, but I do not see/understand strong quantitative evidence here. 

      We have weakened the statements about evidence for planning 5 steps ahead.

      p. 25 CNN. I did not understand the CNN. The list of layers seems incomplete, it only shows four layers. The convolutional-deconvolutional architecture is mentioned as if that is a common term, which I am unfamiliar with but choose to interpret as akin to encoder-decoder. However, the architecture does not seem to have much of a bottleneck (25x25x8 is not greatly smaller than 100x100x4), so what is the driving principle? It's also unclear how the decoder culminates, does it produce some m x m array of probabilities of stepping, where m is some lower dimension than the images? It might be helpful also to illustrate the predictions, for example, show a photo of the terrain view, along with a probability map for that view. I would expect that the reader can immediately say yes, I would likely step THERE but not there. 

      We have clarified the description of the CNN. An illustration is shown in Figure 11.

      Reviewer #3 (Recommendations For The Authors): 

      (This section expands on the points already contained in the Public Review). 

      Major issues 

      (1) The training and validation data of the CNN are not explained fully making it unclear if the data tells us anything about the visual features used to guide steering. A CNN was used on the depth scenes to identify foothold locations in the images. This is the bit of the methods and the results that remains ambiguous, and the authors may need to revisit the methods/results. It is not clear how or on what data the network was trained (training vs. validation vs. un-peeked test data), and justification of the choices made. There is no discussion of possible overfitting. The network could be learning just for example specific rock arrangements in the particular place you experimented. Training the network on data from one location and then making it generalize to another location would of course be ideal. Your network probably cannot do this (as far as I can tell this was not tried), and so the meaning of the CNN results cannot really be interpreted. 

      I really like the idea, of getting actual retinotopic depth field approximations. But then the question would be: what features in this information are relevant and useful for visual guidance (of foot placement)? But this question is not answered by your method. 

      "If a CNN can predict these locations above chance using depth information, this would indicate that depth features can be used to explain some variation in foothold selection." But there is no analysis of what features they are. If the network is overfitting they could be very artefactual, pixel-level patterns and not the kinds of "features" the human reader immediately has in mind. As you say "CNN analysis shows that subject perspective depth features are predictive of foothold locations", well, yes, with 50,000 odd parameters the foothold coordinates can be associated with the 3D pixel maps, but what does this tell us? 

      See previous discussion of these issues.

      It is true that we do not know the precise depth features used. We established that information about height changes was being used, but further work is needed to specify how the visual system does this. This is mentioned in the Discussion.

      You open the introduction with a motivation to understand the visual features guiding path selection, but what features the CNN finds/uses or indeed what features are there is not much discussed. You would need to bolster this, or down-emphasize this aspect in the Introduction if you cannot address it. 

      "These depth image features may or may not overlap with the step slope features shown to be predictive in the previous analysis, although this analysis better approximates how subjects might use such information." I do not think you can say this. It may be better to approximate the kind of (egocentric) environment the subjects have available, but as it is I do not see how you can say anything about how the subject uses it. (The results on the path selection with respect to the terrain features, viewpoint-independent allocentric properties of the previous analyses, are enough in themselves!) 

      We have rewritten the section on the CNN to make clearer what it can and cannot do and its role in the manuscript. See previous discussion.

      (2) The use of descriptive terminology should be made systematic. Overall the rest of the methodology is well explained, and the workflow is impressive. However, to interpret the results the introduction and discussion seem to use terminology somewhat inconsistently. You need to dig into the methods to figure out the exact operationalizations, and even then you cannot be quite sure what a particular term refers to. Specifically, you use the following terms without giving a single, clear definition for them (my interpretation in parentheses): 

      foothold (a possible foot plant location where there is an "affordance"? or a foot plant location you actually observe for this individual? or in the sample?) 

      step (foot trajectory between successive step locations) 

      step location (the location where the feet are placed) 

      path (are they lines projected on the ground, or are they sequences of foot plants? The figure suggests lines but you define a path in terms of five steps.) 

      foot plant (occurs when the foot comes in contact with step location?) 

      future foothold (?) 

      foot location (?) 

      future foot location (?) 

      foot position (?) 

      I think some terms are being used interchangeably here? I would really highly recommend a diagrammatic cartoon sketch, showing the definitions of all these terms in a single figure, and then sticking to them in the main text. Also, are "gaze location" and "fixation" the same? I.e. is every gaze-ground intersection a "gaze location" (I take it it is not a "fixation", which you define by event identification by speed and acceleration thresholds in the methods)? 

      We have cleaned up the language. A foothold is the location in the terrain representation (mesh) where the foot was placed. A step is the transition from one foothold to the next. A path is a sequence of 5 steps. The lines simply illustrate the path in the Figures. A gaze location is the location in the terrain representation where the walker is holding gaze still (the act of fixating). See Muller et al. (2023) for further explanation.

      (3) More coverage of different interpretations / less interpretation in the abstract/introduction would be prudent. You discuss the path selection very much on the basis of energetic costs and gait stability. At least mention should be given to other plausible parameters the participants might be optimizing (or that indeed they may be just satisficing). Temporal cost (more circuitous route takes longer) and uncertainty (the more step locations you sample the more chance that some of them will not be stable) seem equally reasonable, given the task ecology / the type of environment you are considering. I do not know if there is literature on these in the gait-scene, but even if not then saying you are focusing on just one explanation because that's where there is literature to fall back on would be the thing to do. 

      Also in the abstract and introduction you seem to take some of this "for granted". E.g. you end the abstract saying "are planning routes as well as particular footplants. Such planning ahead allows the minimization of energetic costs. Thus locomotor behavior in natural environments is controlled by decision mechanisms that optimize for multiple factors in the context of well-calibrated sensory and motor internal models". This is too speculative to be in the abstract, in my opinion. That is, you take as "given" that energetic cost is the major driver of path selection in your task, and that the relevant perception relies on internal models. Neither of these is a priori obvious nor is it as far as I can tell shown by your data (optimizing other variables, satisficing behavior, or online "direct perception" cannot be ruled out). 

      We have rewritten the abstract and Discussion with these concerns in mind.

      You should probably also reference: 

      Warren, W. H. (1984). Perceiving affordances: Visual guidance of stair climbing. Journal of Experimental Psychology: Human Perception and Performance, 10(5), 683-703. https://doi.org/10.1037/0096-1523.10.5.683 

      Warren WH Jr, Young DS, Lee DN. Visual control of step length during running over irregular terrain. J Exp Psychol Hum Percept Perform. 1986 Aug;12(3):259-66. doi: 10.1037//0096-1523.12.3.259. PMID: 2943854. 

      We have added these references to the introduction.

      Minor point 

      Related to (2) above, the path selection results are sometimes expressed a bit convolutedly, and the gist can get lost in the technical vocabulary. The generation of alternative "paths" and comparison of their slope and tortuousness parameters show that the participants preferred smaller slope/shorter paths. So, as far as I can tell, what this says is that in rugged terrain people like paths that are as "flat" as possible. This is common sense so hardly surprising. Do not be afraid to say so, and to express the result in plain non-technical terms. That an apple falls from a tree is common sense and hardly surprising. Yet quantifying the phenomenon, and carefully assessing the parameters of the path that the apple takes, turned out to be scientifically valuable - even if the observation itself lacked "novelty". 

      Thanks. We have tried to clarify the methods/results with this in mind.

    2. eLife assessment

      This fundamental study has the potential to substantially advance our understanding of human locomotion in complex real-world settings and opens up new approaches to studying (visually guided) behavior in natural settings outside the lab. The evidence supporting the conclusions is overall compelling. Whereas detailed analyses represent multiple ways to visualize and quantify the rich and complex natural behavior, some of the specific conclusions remain more suggestive at this point. The work will be of interest to neuroscientists, kinesiologists, computer scientists, and engineers working on human locomotion.

    3. Reviewer #1 (Public review):

      Summary:

      The work of Muller and colleagues concerns the question of where we place our feet when traversing uneven terrain, in particular how we trade off path length against the steepness of each single step. The authors find that paths are chosen that are consistently less steep and deviate from the straight line more than an average random path, suggesting that participants indeed trade off steepness for path length. They show that this might be related to biomechanical properties, specifically the leg length of the walkers. In addition, they show using a neural network model that participants could choose the footholds based on their sensory (visual) information about depth.

      Strengths:

      The work is a natural continuation of some of the researchers' earlier work that related the immediately following steps to gaze. Methodologically, the work is very impressive and presents a further step forward towards understanding real-world locomotion and its interaction with sampling visual information. While some of the results may seem somewhat trivial in hindsight (as always in this kind of study), I still think this is a very important approach for better understanding locomotion in the wild.

      Weaknesses:

      The concerns I had regarding the initial version of the manuscript have all been fixed in the current one.

    4. Reviewer #2 (Public review):

      This manuscript examines how humans walk over uneven terrain and use vision to decide where to step. There is a huge lack of evidence about this because the vast majority of locomotion studies have focused on steady, well-controlled conditions, and not on decisions made in the real world. The author team has already made great advances in this topic by pioneering gaze recordings during locomotion, but there has been no practical way to map the gaze targets, specifically the 3D terrain features in naturalistic environments. The team has now developed a way to integrate such measurements along with gaze and step tracking. This allows quantitative evaluation of the proposed trade-offs between stepping vertically onto vs. stepping around obstacles, along with how far people look to decide where to step. The team also introduces several new analysis techniques to accompany these measurements. They use machine learning techniques to examine whether retinocentric depth helps predict footholds and develop simulations to assess possible alternative footholds and walking paths. The technical achievement is impressive.

      This study addresses several real-world questions not normally examined in the laboratory. First, do humans elect to walk around steeper footholds rather than over them? Second, is there a quantifiable benefit to walking around, such as allowing for a flatter path? Third, does visual depth of terrain contribute to selection of footholds? Fourth, are there scale effects, where for example a tall adult can easily walk over an obstacle that a toddler must walk around? One might superficially answer yes to all of these questions, but it is highly nontrivial to answer them quantitatively. As for the conclusions, my feelings are mixed. I find strengths in answers to two of the questions, and weaknesses in the other two.

      Strengths:

      I consider the evidence strongest for the first of the main questions. The results show subjects walking with more laterally deviating paths, measured by a quantity called "tortuosity," when the direct straight-ahead paths appear to have steeper ups and downs (Fig. 9). The measure of straight-ahead steepness is fairly complicated (discussed below), but is shown to be well correlated with tortuosity, effectively predicting when subjects will not walk straight ahead.

      There is also good evidence for the third question, showing that retinocentric depth is predictive of chosen footholds. Retinocentric depth was computed by a series of steps, starting with scene capture to determine a 3D terrain mesh, projecting that mesh into the eye's perspective, and then discarding all but the depth information. This highly involved process is only the beginning, because the depth was then used to train a neural network classifier with chosen footholds. That network was found to predict footholds better than chance, using a test set independent from the training set, each using half the recorded data. The results are strong and are best interpreted along with a previous study (Bonnen et al. 2021) showing that subjects gaze nearer ahead on rougher terrain, and slightly more so when binocular vision was disrupted. Depth information seems important for foothold selection.

      As an aside, humans presumably also select footholds and estimate depth from a number of monocular visual cues, such as shading, shadows, color, and self-motion information. Interestingly, the terrain mesh and depth data here were computed from monocular images, suggesting that monocular vision can in principle be predictive of both depth and footholds. Binocular human vision presumably improves on monocular depth estimation, and so it would be interesting to see whether binocular scene cameras would predict footholds better. In an earlier review, I had suggested other avenues for exploration, but these are not weaknesses so much as opportunities not yet taken. I believe much could be learned from deeper analysis of the neural network, and future experiments using variations of this technique.

      There is much to be appreciated about this study. I was impressed by the overarching outlook and ambitiousness of the team. They seek to understand human decision-making in real-world locomotion tasks, a topic of obvious relevance to the human condition but not often examined in research. The field has been biased toward well-controlled, laboratory studies, which have undeniable scientific advantages but are also quite different from the real world. The present study discards all of the usual advantages of the laboratory, yet still finds a way to explore real-world behaviors in a quantitative manner. It is an exciting and forward-thinking approach, used to tackle an ecologically relevant question.

      I also appreciate the numerous technical challenges of this study. The state of the art in real-world locomotion studies has largely been limited to kinematic motion capture. This team managed to collect and analyze an unprecedented, one-of-a-kind dataset. They applied a number of non-trivial methods to assess retinocentric depth, simulate would-be walking paths and steepness, and predict footholds from a neural network. Any of these could and probably will merit individual papers, and to assemble them all at once is quite beyond other studies I am aware of. I hope this study will spur more inquiries of this type, leveraging mobile electronics and modern machine learning techniques to answer questions that were previously only addressable qualitatively.

      Weaknesses:

      Although I am highly enthusiastic about this study, I was not entirely convinced by the evidence for the second and fourth questions. Some of this is because I was confused by aspects of the analysis, limiting my understanding of the evidence. But I also question some of the basic conclusions, whether the authors indeed proved that (from Abstract, emphasis mine) "[walkers] change direction TO AVOID taking steeper steps that involve large height changes, instead of [sic] choosing more circuitous, RELATIVELY FLAT paths." (I interpret the "of" as a typo that should have been omitted.) I think it is more objective to say, "walkers changed direction more when straight-ahead paths seemed to have steeper height changes."

      I say "seemed" because it is unknown whether humans would have experienced greater height changes if they walked straight ahead (the second main question). The comparison shown is between the tortuous paths humans took and simulated straight-ahead paths never experienced by humans. Ignoring questions about the simulations for now (discussed below), it is not an apples-to-apples comparison, say between the tortuous paths humans preferred and straight-ahead paths they didn't. The authors determined a measure of steepness, "straight path slope" (Fig. 9), that predicts when humans walk circuitously, but that is not the same as the steepness that humans would actually experience if they had walked straight ahead. That could have been measured with an appropriate control condition, for example asking subjects to walk as straight ahead as they can manage. That also would have eliminated the need for simulations, because the slope of each step actually taken could simply have been measured and compared between conditions. Instead, two different kinds of simulations are compared, where steeper paths are fully simulated, and the circuitous paths are partially simulated but partially based on data. It seems that every fifth circuitous step coincides with a human foothold, but the intervening ones are somewhat random. I don't find this especially strong evidence that the chosen paths were indeed relatively flatter. I would prefer to be convinced by hard data than by unequal simulations.

      I also have trouble accepting "TO AVOID" because it implies a degree of intent not evident in the data. I suppose conscious intent could be assessed subjectively by questionnaire, but I don't know how unconscious intent could be tested objectively. I believe my suggested interpretation above is better supported by evidence.

      My limited acceptance is due in part to confusion about the simulations. I was especially confused about the connection between feasible steps drawn from the distribution in Figure 7, and the histograms of Figure 8. The feasible steps have clear peaks near zero slope, unity step length, and zero step direction (let's call them Flat). If 5-step simulations of Figure 8 draw from that distribution, why is there zero probability for the 0-3 deg bin (which is within ±3 deg due to absolute values)? It seems to me that Flat steps were eminently available, so why were they completely avoided? It seems that the simulations were probabilistic (and not just figurative) random walks, which implies they should have had about the same mean as Figure 7 but a wider variance, and then passed through absolute value. They look like something else that I cannot understand. This is important because the RELATIVELY FLAT conclusion is based on the chosen walks apparently being skewed flatter than random simulated walks. I have trouble accepting those distributions because Flat steps were unaccountably never taken by either simulation or human. (This issue is less concerning for Figure 9, because one can accept that some simulation measure is predictive of tortuosity even if the measure is hard to understand.)

      I was also confused why Figure 7 distances and directions are nearly normally distributed and not more uniform. The methods only mention constraints to eliminate steps, which to me suggests a truncated uniform distribution. It is not clear to me why the terrain should have a high peak at unity step length, which implies that the only feasible footholds were almost exclusively straight ahead and one step length away. It is possible that the "feasible" footholds are themselves drawn from a "likely" normal distribution, perhaps based on level walking data. It could be argued that simulated steps should be performed by drawing from typical step distributions for level ground, eliminating non-viable footholds, and then repeating that across multiple steps. That would explain the normality, but it is not stated in the Methods, and even if they were "feasible and likely" it would not explain the distributions of Figure 8.

      I had some misgivings about the fourth question, where Figure 10 suggests that shorter subjects had greater correlation between straight-path slope and tortuosity than taller ones, who tended to walk straighter ahead. I agree with the authors' rebuttal to my previous review that "the data are the data" but I still have doubts. Now supplied as suggested by another reviewer, Figure 18 provides more detail of the underlying data, with considerably lower correlations. I now suspect that Figure 10 benefits from some statistical artifacts due to binning and other operations, and the weaker correlations of Fig. 18A are closer to reality. I am rather suspicious of correlations of correlations (Figure 18B), which lose some statistical grounding because the second correlation treats all data on equal footing, effectively whitewashing the first correlations of their varying significance (p-values 0.008 to 1e-9).

      Furthermore, I am also unsure about Figure 10's comparison of tortuosity vs. straight path slope against leg length. Both tortuosity and straight path slope are already effectively dimensionless and therefore already seem to eliminate scale. It is my understanding that the simulated paths were recomputed for each subject's parameters, and the horizontal axis, slope, is already an angular measure that should affect short and tall people similarly. Shouldn't all subjects equally avoid steep angles, regardless of their dimensional height? If there is indeed a scale effect, then I would expect it to be demonstrated with a dimensional measure (vertical axis) that depends on leg length.

      I certainly agree with the hypothetical prior that tall adults walk straight over obstacles that shorter adults (or children) walk around. But I feel that simpler tests would provide better evidence, perhaps in future work. Did shorter subjects walk with greater tortuosity than taller ones on the same terrain? Did shorter subjects take relatively more steps even after normalizing for leg length? A possible comparison would be (number of steps)*(leg length)/(start to end distance). I feel that the evidence from this study is not that strong.
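
      The comparison proposed above is simple enough to sketch. The following is only a minimal illustration and not part of the study's analysis: the function name `normalized_step_count` and its inputs are hypothetical, and footholds are assumed to be horizontal (x, y) positions expressed in the same length unit as leg length.

```python
import math


def normalized_step_count(footholds, leg_length):
    """Return (number of steps) * (leg length) / (start-to-end distance).

    footholds: ordered list of (x, y) foothold positions for one traverse.
    leg_length: the walker's leg length, in the same unit as the positions.
    The result is dimensionless, so walkers of different sizes can be compared.
    """
    if len(footholds) < 2:
        raise ValueError("need at least two footholds to define a step")
    n_steps = len(footholds) - 1  # a step is a transition between footholds
    (x0, y0), (x1, y1) = footholds[0], footholds[-1]
    distance = math.hypot(x1 - x0, y1 - y0)  # straight-line start-to-end distance
    if distance == 0:
        raise ValueError("start and end footholds coincide")
    return n_steps * leg_length / distance


# Hypothetical traverse: four steps covering 2.0 m, leg length 0.9 m.
example_path = [(0.0, 0.0), (0.5, 0.1), (1.0, 0.0), (1.5, -0.1), (2.0, 0.0)]
example_value = normalized_step_count(example_path, leg_length=0.9)
```

      Under these assumptions, a larger value for shorter subjects on the same terrain would indicate relatively more steps even after normalizing for leg length.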

      Although it is a strength of this study that so much can be learned from pure observation, that does not mean controlled conditions are not scientifically helpful. As mentioned earlier, a helpful control could have been to ask subjects to walk straighter but less preferred paths on the same terrain, treating human paths as an independent variable. Another would be to treat terrain as an independent variable, by using level ground and intermediate terrain conditions. This would make it easier to test whether taller subjects walk straighter ahead on more uneven terrain than shorter subjects. Indeed, the data set already includes some patches of flatter terrain, not included here. Additional and simpler tests might be possible based on existing data.

      Conclusion

      This is an ambitious undertaking, presenting a wealth of unprecedented data to quantitatively test basic ecological questions that have long been unanswered. There are a number of considerable strengths that merit appreciation, especially the ability to quantitatively predict when humans will walk more circuitously. The weaknesses are about limitations in the conclusions that can be drawn thus far rather than the correctness of the study. I consider this to be a first step that will hopefully enable and inspire a long line of future work that will address these questions more in depth.

    5. Reviewer #3 (Public review):

      Summary:

      The systematic way in which path selection is parametrically investigated is the main contribution.

      Strengths:

      The authors have developed an impressive workflow to study gait and gaze in natural terrain. They are able to determine footholds and gaze points in the 3D world, and explore different path selections in the terrain.

      Weaknesses:

      The finding that walkers prefer less tortuous, demanding paths is hardly surprising, and from the data it is still not clear what actual visual features are used to choose among alternative routes or what the nature of the decision process is. The authors discuss energetic cost and other "factors" that might influence path selection, but as yet there is no way to express these ideas rigorously in such complex natural settings.

    1. Reviewer #1 (Public review):

      In this manuscript, the authors use a large dataset of neuroscience publications to elucidate the nature of self-citation within the neuroscience literature. The authors initially present descriptive measures of self-citation across time and author characteristics; they then produce an inclusive model to tease apart the potential role of various article and author features in shaping self-citation behavior. This is a valuable area of study, and the authors approach it with a rich dataset and solid methodology.

      The revisions made by the authors in this version have greatly improved the validity and clarity of the statistical techniques, and as a result the paper's findings are more convincing.

      This paper's primary strengths are: 1) its comprehensive dataset that allows for a snapshot of the dynamics of several related fields; 2) its thorough exploration of how self-citation behavior relates to characteristics of research and researchers.

      Its primary weakness is that the study stops short of digging into potential mechanisms in areas where it is potentially feasible to do so - for example, studying international dynamics by identifying and studying researchers who move between countries, or quantifying more or less 'appropriate' self-citations via measures of abstract text similarity.

      Yet while these types of questions were not determined to be in scope for this paper, the study is quite effective at laying the important groundwork for further study of mechanisms and motivations, and will be a highly valuable resource for both scientists within the field and those studying it.

    2. Reviewer #2 (Public review):

      The study presents valuable findings on self-citation rates in the field of Neuroscience, shedding light on potential strategic manipulation of citation metrics by first authors, regional variations in citation practices across continents, gender differences in early-career self-citation rates, and the influence of research specialization on self-citation rates in different subfields of Neuroscience. While some of the evidence supporting the claims of the authors is solid, some of the analysis seems incomplete and would benefit from more rigorous approaches.

    3. Reviewer #3 (Public review):

      This paper analyses self-citation rates in the field of Neuroscience, comprising in this case Neurology, Neuroscience and Psychiatry. Based on data from Scopus, the authors identify self-citations, that is, whether references from a paper by some authors cite work that is written by one of the same authors. They separately analyse this in terms of first-author self-citations and last-author self-citations. The analysis is well executed, and both the analysis and the results are written down clearly. The interpretation of some of the results might prove more challenging. That is, it is not always clear what is being estimated.

      This issue of interpretability was already raised in my review of the previous revision, where I argued that the authors should take a more explicit causal framework. The authors have now revised some of the language in this revision, in order to downplay causal language. Although this is perfectly fine, this misses the broader point, namely that it is not clear what is being estimated. Perhaps it is best to refer to Lundberg et al. (2021) and ask the authors to clarify "What is your Estimand?" In my view, the theoretical estimands the authors are interested in are causal in nature. Perhaps the authors would argue that their estimands are descriptive. In either case, it would be good if the authors could clarify that theoretical estimand.

      Finally, in my previous review, I raised the issue of when self-citations become "problematic". The authors have addressed this issue satisfactorily, I believe, and now formulate their conclusions more carefully.

      Lundberg, I., Johnson, R., & Stewart, B. M. (2021). What Is Your Estimand? Defining the Target Quantity Connects Statistical Evidence to Theory. American Sociological Review, 86(3), 532-565. https://doi.org/10.1177/00031224211004187

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript, the authors use a large dataset of neuroscience publications to elucidate the nature of self-citation within the neuroscience literature. The authors initially present descriptive measures of self-citation across time and author characteristics; they then produce an inclusive model to tease apart the potential role of various article and author features in shaping self-citation behavior. This is a valuable area of study, and the authors approach it with an appropriate and well-structured dataset.

      The study's descriptive analyses and figures are useful and will be of interest to the neuroscience community. However, with regard to the statistical comparisons and regression models, I believe that there are methodological flaws that may limit the validity of the presented results. These issues primarily affect the uncertainty of estimates and the statistical inference made on comparisons and model estimates - the fundamental direction and magnitude of the results are unlikely to change in most cases. I have included detailed statistical comments below for reference.

      Conceptually, I think this study will be very effective at providing context and empirical evidence for a broader conversation around self-citation. And while I believe that there is room for a deeper quantitative dive into some finer-grained questions, this paper will be a valuable catalyst for new areas of inquiry around citation behavior - e.g., do authors change self-citation behavior when they move to more or less prestigious institutions? do self-citations in neuroscience benefit downstream citation accumulation? do journals' reference list policies increase or decrease self-citation? - that I hope that the authors (or others) consider exploring in future work.

      Thank you for your suggestions and your generally positive view of our work. As described below, we have made the statistical improvements that you suggested.

      Statistical comments:

      (1) Throughout the paper, the nested nature of the data does not seem to be appropriately handled in the bootstrapping, permutation inference, and regression models. This is likely to lead to inappropriately narrow confidence bands and overly generous statistical inference.

      We apologize for this error. We have now included nested bootstrapping and permutation tests. We defined an “exchangeability block” as a co-authorship group of authors. In this dataset, that meant any authors who published together (among the articles in this dataset) as a First Author / Last Author pairing were assigned to the same exchangeability block. It is not realistic to check for overlapping middle authors in all papers because of the collaborative nature of the field. In addition, we believe that self-citations are primarily controlled by first and last authors, so we can assume that middle authors do not control self-citation habits. We then performed bootstrapping and permutation tests in the constraints of the exchangeability blocks.
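
      To make the block definition concrete, here is a minimal sketch of one way such blocks could be formed. It is an illustration under stated assumptions rather than the code used for the paper: the function name and the author IDs are hypothetical, and it assumes that blocks merge transitively (if A publishes with B and B with C, all three share a block), which is implemented with a union-find structure.

```python
def exchangeability_blocks(articles):
    """Group authors into blocks linked by First/Last Author pairings.

    articles: iterable of (first_author_id, last_author_id) pairs.
    Returns a dict mapping each author ID to an integer block ID.
    Assumption: pairings are merged transitively (connected components).
    """
    parent = {}

    def find(author):
        parent.setdefault(author, author)
        while parent[author] != author:
            parent[author] = parent[parent[author]]  # path halving
            author = parent[author]
        return author

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for first_author, last_author in articles:
        union(first_author, last_author)

    block_ids = {}
    return {
        author: block_ids.setdefault(find(author), len(block_ids))
        for author in list(parent)
    }


# Hypothetical author IDs: A-B-C are linked through B; D-E form a second block;
# F is a single-author paper and forms its own block.
example_blocks = exchangeability_blocks(
    [("A", "B"), ("B", "C"), ("D", "E"), ("F", "F")]
)
```

      Middle authors are deliberately ignored in this sketch, matching the restriction to First Author / Last Author pairings described above.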

      We first describe this in the results (page 3, line 110):

      “Importantly, we accounted for the nested structure of the data in bootstrapping and permutation tests by forming co-authorship exchangeability blocks.”

      We also describe this in 4.8 Confidence Intervals (page 21, line 725):

      “Confidence intervals were computed with 1000 iterations of bootstrap resampling at the article level. For example, of the 100,347 articles in the dataset, we resampled articles with replacement and recomputed all results. The 95% confidence interval was reported as the 2.5 and 97.5 percentiles of the bootstrapped values.

      We grouped data into exchangeability blocks to avoid overly narrow confidence intervals or overly optimistic statistical inference. Each exchangeability block comprised any authors who published together as a First Author / Last Author pairing in our dataset. We only considered shared First/Last Author publications because we believe that these authors primarily control self-citations, and otherwise exchangeability blocks would grow too large due to the highly collaborative nature of the field. Furthermore, the exchangeability blocks do not account for co-authorship in other journals or prior to 2000. A distribution of the sizes of exchangeability blocks is presented in Figure S15.”
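
      A minimal sketch of a percentile bootstrap that resamples whole exchangeability blocks is given below. It is an illustration under stated assumptions rather than the analysis code: the data layout (one list of per-article values per block), the function name, and the nearest-rank percentile rule are all hypothetical choices.

```python
import random


def block_bootstrap_ci(blocks, statistic, n_boot=1000, alpha=0.05, seed=0):
    """Percentile confidence interval from resampling blocks with replacement.

    blocks: list of lists, one inner list of observations per block.
    statistic: function mapping a flat list of observations to a number.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = []
        for _draw in range(len(blocks)):
            # Draw an entire block so that dependent articles stay together.
            sample.extend(rng.choice(blocks))
        estimates.append(statistic(sample))
    estimates.sort()
    lower = estimates[int((alpha / 2) * (n_boot - 1))]
    upper = estimates[int((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper


def mean(values):
    return sum(values) / len(values)


# Hypothetical self-citation rates grouped into three blocks.
example_blocks = [[0.10, 0.20], [0.30], [0.05, 0.15, 0.25]]
print(block_bootstrap_ci(example_blocks, mean))
```

      Because blocks rather than single articles are drawn, the spread of the bootstrap estimates reflects between-block variation, which is what guards against overly narrow intervals.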

      In describing permutation tests, we also write (page 21, line 739):

      “4.9 P values

      P values were computed with permutation testing using 10,000 permutations, with the exception of regression P values and P values from model coefficients. For comparing different fields (e.g., Neuroscience and Psychiatry) and comparing self-citation rates of men and women, the labels were randomly permuted by exchangeability block to obtain null distributions. For comparing self-citation rates between First and Last Authors, the first and last authorship was swapped in 50% of exchangeability blocks.”
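
      The block-level label permutation described above can be sketched as follows. This is a simplified illustration, not the study's implementation: it assumes each exchangeability block carries a single group label ("A" or "B"), uses the difference in means as the test statistic, and adds one to the numerator and denominator so that the P value is never exactly zero. All names are hypothetical.

```python
import random


def block_permutation_pvalue(block_values, block_labels, n_perm=10000, seed=0):
    """Two-sided permutation P value with labels shuffled across blocks.

    block_values: list of lists of observations, one inner list per block.
    block_labels: list of "A"/"B" labels, one per block (simplifying assumption).
    """
    rng = random.Random(seed)

    def mean_difference(labels):
        group_a = [v for vals, lab in zip(block_values, labels) if lab == "A" for v in vals]
        group_b = [v for vals, lab in zip(block_values, labels) if lab == "B" for v in vals]
        return sum(group_a) / len(group_a) - sum(group_b) / len(group_b)

    observed = mean_difference(block_labels)
    labels = list(block_labels)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # whole blocks swap groups; articles within a block stay together
        if abs(mean_difference(labels)) >= abs(observed):
            n_extreme += 1
    return (n_extreme + 1) / (n_perm + 1)


# Hypothetical data: three blocks per group with clearly different values.
values = [[1.0], [1.0], [1.0], [5.0], [5.0], [5.0]]
labels = ["A", "A", "A", "B", "B", "B"]
print(block_permutation_pvalue(values, labels, n_perm=2000))
```

      With only six blocks there are few distinct relabelings, so the smallest attainable P value is limited by the number of blocks rather than the number of articles.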

      For modeling, we considered doing a mixed effects model but found difficulties due to computational power. For example, with our previous model, there were hundreds of thousands of levels for the paper random effect, and tens of thousands of levels for the author random effect. Even when subsampling or using packages designed for large datasets (e.g., mgcv’s bam function: https://www.rdocumentation.org/packages/mgcv/versions/1.9-1/topics/bam), we found computational difficulties.

      As a result, we switched to modeling results at the paper level (e.g., self-citation count or rate). We found that results could be unstable when including author-level random effects because in many cases there was only one author per group. Instead, to avoid inappropriately narrow confidence bands, we resampled the dataset such that each author was only represented once. For example, if Author A had five papers in this dataset, then one of their five papers was randomly selected. We updated our description of our models in the Methods section (page 21, line 754):

      “4.10 Exploring effects of covariates with generalized additive models

      For these analyses, we used the full dataset size separately for First and Last Authors (Table S2). This included 115,205 articles and 5,794,926 citations for First Authors, and 114,622 articles and 5,801,367 citations for Last Authors. We modeled self-citation counts, self-citation rates, and number of previous papers for First Authors and Last Authors separately, resulting in six total models.

      We found that models could be computationally intensive and unstable when including author-level random effects because in many cases there was only one author per group. Instead, to avoid inappropriately narrow confidence bands, we resampled the dataset such that each author was only represented once. For example, if Author A had five papers in this dataset, then one of their five papers was randomly selected. The random resampling was repeated 100 times as a sensitivity analysis (Figure S12).

      For our models, we used generalized additive models from mgcv’s “gam” function in R 49. The smooth terms included all the continuous variables: number of previous papers, academic age, year, time lag, number of authors, number of references, and journal impact factor. The linear terms included all the categorical variables: field, gender, affiliation country LMIC status, and document type. We empirically selected a Tweedie distribution 50 with a log link function and p=1.2. The p parameter indicates that the variance is proportional to the mean to the p power 49. The p parameter ranges from 1 to 2, with p=1 equivalent to the Poisson distribution and p=2 equivalent to the gamma distribution. For all fitted models, we simulated the residuals with the DHARMa package, as standard residual plots may not be appropriate for GAMs 51. DHARMa scales the residuals between 0 and 1 with a simulation-based approach 51. We also tested for deviation from uniformity, dispersion, outliers, and zero inflation with DHARMa. Non-uniformity, dispersion, outliers, and zero inflation were significant due to the large sample size, but small in effect size in most cases. The simulated quantile-quantile plots from DHARMa suggested that the observed and simulated distributions were generally aligned, with the exception of slight misalignment in the models for the number of previous papers. These analyses are presented in Figure S11 and Table S7.

      In addition, we tested for inadequate basis functions using mgcv’s “gam.check()” function 49. Across all smooth predictors and models, we ultimately selected between 10-20 basis functions depending on the variable and outcome measure (counts, rates, papers). We further checked the concurvity of the models and ensured that the worst-case concurvity for all smooth predictors was about 0.8 or less.”
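
      To make the role of the Tweedie p parameter concrete, the variance relationship stated above (variance proportional to the mean raised to the power p) can be written as a one-line function. This is a small worked illustration only; the dispersion parameter phi and the function name are assumptions introduced here, and the models themselves were fitted in R with mgcv.

```python
def tweedie_variance(mu, p=1.2, phi=1.0):
    """Variance implied by a Tweedie model: phi * mu ** p.

    mu: mean of the response (must be positive).
    p: variance power (1 is Poisson-like, 2 is gamma-like).
    phi: dispersion parameter (hypothetical default of 1.0).
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    return phi * mu ** p


# For a mean of 4 self-citations, compare the three variance powers.
for power in (1.0, 1.2, 2.0):
    print(power, tweedie_variance(4.0, p=power))
```

      With p=1.2 the implied variance grows only slightly faster than the mean, which sits much closer to the Poisson end of the range than to the gamma end.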

      The direction of our results primarily stayed the same, with the exception of gender results. Men tended to self-cite slightly less (or equal self-citation rates) after accounting for numerous covariates. As such, we also modeled the number of previous papers to explain the discrepancy between our raw data and the modeled gender results. Please find the updated results text below (page 11, line 316):

      “2.9 Exploring effects of covariates with generalized additive models

      Investigating the raw trends and group differences in self-citation rates is important, but several confounding factors may explain some of the differences reported in previous sections. For instance, gender differences in self-citation were previously attributed to men having a greater number of prior papers available to self-cite 7,20,21. As such, covarying for various author- and article-level characteristics can improve the interpretability of self-citation rate trends. To allow for inclusion of author-level characteristics, we only consider First Author and Last Author self-citation in these models.

      We used generalized additive models (GAMs) to model the number and rate of self-citations for First Authors and Last Authors separately. The data were randomly subsampled so that each author only appeared in one paper. The terms of the model included several article characteristics (article year, average time lag between article and all cited articles, document type, number of references, field, journal impact factor, and number of authors), as well as author characteristics (academic age, number of previous papers, gender, and whether their affiliated institution is in a low- and middle-income country). Model performance (adjusted R2) and coefficients for parametric predictors are shown in Table 2. Plots of smooth predictors are presented in Figure 6.

      First, we considered several career and temporal variables. Consistent with prior works 20,21, self-citation rates and counts were higher for authors with a greater number of previous papers. Self-citation counts and rates increased rapidly among the first 25 published papers but then more gradually increased. Early in the career, increasing academic age was related to greater self-citation. There was a small peak at about five years, followed by a small decrease and a plateau. We found an inverted U-shaped trend for average time lag and self-citations, with self-citations peaking approximately three years after initial publication. In addition, self-citations have generally been decreasing since 2000. The smooth predictors showed larger decreases in the First Author model relative to the Last Author model (Figure 6).

      Then, we considered whether authors were affiliated with an institution in a low- and middle-income country (LMIC). LMIC status was determined by the Organisation for Economic Co-operation and Development. We opted to use LMIC instead of affiliation country or continent to reduce the number of model terms. We found that papers from LMIC institutions had significantly lower self-citation counts (-0.138 for First Authors, -0.184 for Last Authors) and rates (-12.7% for First Authors, -23.7% for Last Authors) compared to non-LMIC institutions. Additional results with affiliation continent are presented in Table S5. Relative to the reference level of Asia, higher self-citations were associated with Africa (only three of four models), the Americas, Europe, and Oceania.

      Among paper characteristics, a greater number of references was associated with higher self-citation counts and lower self-citation rates (Figure 6). Interestingly, self-citations were greater for a small number of authors, though the effect diminished after about five authors. Review articles were associated with lower self-citation counts and rates. No clear trend emerged between self-citations and journal impact factor. In an analysis by field, despite the raw results suggesting that self-citation rates were lower in Neuroscience, GAM-derived self-citations were greater in Neuroscience than in Psychiatry or Neurology.

      Finally, our results aligned with previous findings of nearly equivalent self-citation rates for men and women after including covariates, even showing slightly higher self-citation rates in women. Since raw data showed evidence of a gender difference in self-citation that emerges early in the career but dissipates with seniority, we incorporated two interaction terms: one between gender and academic age and a second between gender and the number of previous papers. Results remained largely unchanged with the interaction terms (Table S6).

      2.10 Reconciling differences between raw data and models

      The raw and GAM-derived data exhibited some conflicting results, such as for gender and field of research. To further study covariates associated with this discrepancy, we modeled the publication history for each author (at the time of publication) in our dataset (Table 2). The model terms included academic age, article year, journal impact factor, field, LMIC status, gender, and document type. Notably, Neuroscience was associated with the fewest number of papers per author. This explains how authors in Neuroscience could have the lowest raw self-citation rates but the highest self-citation rates after including covariates in a model. In addition, being a man was associated with about 0.25 more papers. Thus, gender differences in self-citation likely emerged from differences in the number of papers, not in any self-citation practices.”

      (2) The discussion of the data structure used in the regression models is somewhat opaque, both in the main text and the supplement. From what I gather, these models likely have each citation included in the model at least once (perhaps twice, once for first-author status and once for last-author status), with citations nested within citing papers, cited papers, and authors. Without inclusion of random effects, the interpretation and inference of the estimates may be misleading.

      Please see our response to point (1) to address random effects. We have also switched to GAMs (see point #3 below) and provided more detail in the methods. Notably, we decided against using author-level effects due to poor model stability, as there can be as few as one author per group. Instead, we subsampled the dataset such that only one paper appeared from each author.
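
      The one-paper-per-author subsampling can be sketched as follows. This is an illustrative sketch rather than the code used for the analysis; the input format of (author_id, paper_id) tuples and the function name are hypothetical.

```python
import random
from collections import defaultdict


def one_paper_per_author(author_papers, seed=0):
    """Randomly keep a single paper for each author.

    author_papers: iterable of (author_id, paper_id) tuples.
    Returns a dict mapping each author_id to the one selected paper_id.
    """
    rng = random.Random(seed)
    papers_by_author = defaultdict(list)
    for author_id, paper_id in author_papers:
        papers_by_author[author_id].append(paper_id)
    return {author: rng.choice(papers) for author, papers in papers_by_author.items()}


# Hypothetical records: Author A has five papers, Author B has one.
records = [("A", f"paper{i}") for i in range(1, 6)] + [("B", "paper6")]

# Repeating the draw with different seeds mirrors a sensitivity analysis.
subsamples = [one_paper_per_author(records, seed=s) for s in range(100)]
print(subsamples[0])
```

      Refitting a model on each of the repeated subsamples shows how much the estimates depend on which paper happened to be drawn for each author.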

      (3) I am concerned that the use of the inverse hyperbolic sine transform is a bit too prescriptive, and may be producing poor fits to the true predictor-outcome relationships. For example, in a figure like Fig S8, it is hard to know to what extent the sharp drop and sign reversal are true reflections of the data, and to what extent they are artifacts of the transformed fit.

      Thank you for raising this point. We have now switched to using generalized additive models (GAMs). GAMs provide a flexible approach to modeling that does not require transformations. We described this in detail in point (1) above and in Methods 4.10 Exploring effects of covariates with generalized additive models (page 21, line 754).

      “4.10 Exploring effects of covariates with generalized additive models

      For these analyses, we used the full dataset size separately for First and Last Authors (Table S2). This included 115,205 articles and 5,794,926 citations for First Authors, and 114,622 articles and 5,801,367 citations for Last Authors. We modeled self-citation counts, self-citation rates, and number of previous papers for First Authors and Last Authors separately, resulting in six total models.

      We found that models could be computationally intensive and unstable when including author-level random effects because in many cases there was only one author per group. Instead, to avoid inappropriately narrow confidence bands, we resampled the dataset such that each author was only represented once. For example, if Author A had five papers in this dataset, then one of their five papers was randomly selected. The random resampling was repeated 100 times as a sensitivity analysis (Figure S12).

      For our models, we used generalized additive models from mgcv’s “gam” function in R 48. The smooth terms included all the continuous variables: number of previous papers, academic age, year, time lag, number of authors, number of references, and journal impact factor. The linear terms included all the categorical variables: field, gender, affiliation country LMIC status, and document type. We empirically selected a Tweedie distribution 49 with a log link function and p=1.2. The p parameter indicates that the variance is proportional to the mean to the p power 48. The p parameter ranges from 1 to 2, with p=1 equivalent to the Poisson distribution and p=2 equivalent to the gamma distribution. For all fitted models, we simulated the residuals with the DHARMa package, as standard residual plots may not be appropriate for GAMs 50. DHARMa scales the residuals between 0 and 1 with a simulation-based approach 50. We also tested for deviation from uniformity, dispersion, outliers, and zero inflation with DHARMa. Non-uniformity, dispersion, outliers, and zero inflation were significant due to the large sample size, but small in effect size in most cases. The simulated quantile-quantile plots from DHARMa suggested that the observed and simulated distributions were generally aligned, with the exception of slight misalignment in the models for the number of previous papers. These analyses are presented in Figure S11 and Table S7.

      In addition, we tested for inadequate basis functions using mgcv’s “gam.check()” function 48. Across all smooth predictors and models, we ultimately selected between 10-20 basis functions depending on the variable and outcome measure (counts, rates, papers). We further checked the concurvity of the models and ensured that the worst-case concurvity for all smooth predictors was about 0.8 or less.”

      (4) It seems there are several points in the analysis where papers may have been dropped for missing data (e.g., missing author IDs and/or initials, missing affiliations, low-confidence gender assessment). It would be beneficial for the reader to know what % of the data was dropped for each analysis, and for comparisons across countries it would be important for the authors to make sure that there is not differential missing data that could affect the interpretation of the results (e.g., differences in self-citation being due to differences in Scopus ID coverage).

      Thank you for raising this important point. In the methods section, we describe how the data are missing (page 18, line 623):

      “4.3 Data exclusions and missingness

      Data were excluded across several criteria: missing covariates, missing citation data, out-of-range values at the citation pair level, and out-of-range values at the article level (Table 3). After downloading the data, our dataset included 157,287 articles and 8,438,733 citations. We excluded any articles with missing covariates (document type, field, year, number of authors, number of references, academic age, number of previous papers, affiliation country, gender, and journal). Of the remaining articles, we dropped any for missing citation data (e.g., cannot identify whether a self-citation is present due to lack of data). Then, we removed citations with unrealistic or extreme values. These included an academic age of less than zero or above 38/44 for First/Last Authors (99th percentile); greater than 266/522 papers for First/Last Authors (99th percentile); and a cited year before 1500 or after 2023. Subsequently, we dropped articles with extreme values that could contribute to poor model stability. These included greater than 30 authors; fewer than 10 references or greater than 250 references; and a time lag of greater than 17 years. These values were selected to ensure that GAMs were stable and not influenced by a small number of extreme values.

      In addition, we evaluated whether the data were not missing at random (Table S8). Data were more likely to be missing for reviews relative to articles, for Neurology relative to Neuroscience or Psychiatry, in works from Africa relative to the other continents, and for men relative to women. Scopus ID coverage contributed in part to differential missingness. However, our exclusion criteria also contribute. For example, Last Authors with more than 522 papers were excluded to help stabilize our GAMs. More men fit this exclusion criterion than women.”
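
      The out-of-range exclusion rules quoted above can be expressed as two simple filters, one at the citation-pair level and one at the article level. The thresholds are taken from the quoted text; the function names, argument names, and the "first"/"last" role keys are hypothetical, and this sketch is not the study's code.

```python
# Role-specific 99th-percentile cutoffs quoted in Section 4.3.
LIMITS = {
    "first": {"max_academic_age": 38, "max_previous_papers": 266},
    "last": {"max_academic_age": 44, "max_previous_papers": 522},
}


def keep_citation(academic_age, previous_papers, cited_year, role):
    """Return True if a citation pair passes the citation-level range checks."""
    limits = LIMITS[role]
    return (
        0 <= academic_age <= limits["max_academic_age"]
        and previous_papers <= limits["max_previous_papers"]
        and 1500 <= cited_year <= 2023
    )


def keep_article(n_authors, n_references, time_lag_years):
    """Return True if an article passes the article-level range checks."""
    return (
        n_authors <= 30
        and 10 <= n_references <= 250
        and time_lag_years <= 17
    )


# A Last Author with 530 previous papers is excluded; one with 500 is kept.
print(keep_citation(30, 530, 2010, "last"), keep_citation(30, 500, 2010, "last"))
```

      Tabulating how many records fail each check, split by group, is one way to see where differential missingness arises.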

      Due to differential missingness, we wrote in the limitations (page 16, line 529):

      “Ninth, data were differentially missing (Table S8) due to Scopus coverage and gender estimation. Differential missingness could bias certain results in the paper, but we hope that the dataset is large enough to reduce any potential biases.”

      Reviewer #2 (Public Review):

      The authors provide a comprehensive investigation of self-citation rates in the field of Neuroscience, filling a significant gap in existing research. They analyze a large dataset of over 150,000 articles and eight million citations from 63 journals published between 2000 and 2020. The study reveals several findings. First, they state that there is an increasing trend of self-citation rates among first authors compared to last authors, indicating potential strategic manipulation of citation metrics. Second, they find that the Americas show higher odds of self-citation rates compared to other continents, suggesting regional variations in citation practices. Third, they show that there are gender differences in early-career self-citation rates, with men exhibiting higher rates than women. Lastly, they find that self-citation rates vary across different subfields of Neuroscience, highlighting the influence of research specialization. They believe that these findings have implications for the perception of author influence, research focus, and career trajectories in Neuroscience.

      Overall, this paper is well written, and the breadth of analysis conducted by authors, with various interactions between variables (e.g., gender vs. seniority), shows that the authors have spent a lot of time thinking about different angles. The discussion section is also quite thorough. The authors should also be commended for their efforts in the provision of code for the public to evaluate their own self-citations. That said, here are some concerns and comments that, if addressed, could potentially enhance the paper:

      Thank you for your review and your generally positive view of our work.

      (1) There are concerns regarding the data used in this study, specifically its bias towards top journals in Neuroscience, which limits the generalizability of the findings to the broader field. More specifically, the top 63 journals in neuroscience are based on impact factor (IF), which raises a potential issue of selection bias. While the paper acknowledges this as a limitation, it lacks a clear justification for why authors made this choice. It is also unclear how the "top" journals were identified: was the selection based on the top 5% in terms of impact factor, the top 10%, or some other metric? The authors also do not provide the (computed) impact factors of the journals in the supplementary.

      We apologize for the lack of clarity about our selection of journals. We agree that there are limitations to selecting higher impact journals. However, we needed to apply some form of selection in order to make the analysis manageable. For instance, even these 63 journals include over five million citations. We better describe our rationale behind the approach as follows (page 17, line 578):

      “We collected data from the 25 journals with the highest impact factors, based on Web of Science impact factors, in each of Neurology, Neuroscience, and Psychiatry. Some journals appeared in the top 25 list of multiple fields (e.g., both Neurology and Neuroscience), so 63 journals were ultimately included in our analysis. We recognize that limiting the journals to the top 25 in each field also limits the generalizability of the results. However, there are tradeoffs between breadth of journals and depth of information. For example, by limiting the journals to these 63, we were able to look at 21 years of data (2000-2020). In addition, the definition of fields is somewhat arbitrary. By restricting the journals to a set of 63 well-known journals, we ensured that the journals belonged to Neurology, Neuroscience, or Psychiatry research. It is also important to note that the impact factor of these journals has not necessarily always been high. For example, Acta Neuropathologica had an impact factor of 17.09 in 2020 but 2.45 in 2000. To further recognize the effects of impact factor, we decided to include an impact factor term in our models.”

      In addition, we have now provided the 2020 impact factors in Table S1.

      By exclusively focusing on high impact journals, your analysis may not be representative of the broader landscape of self-citation patterns across the neuroscience literature, which is what the title of the article claims to do.

      We agree that this article is not indicative of all neuroscience literature, but rather the top journals. Thus, we have changed the title to: “Trends in Self-citation Rates in High-impact Neurology, Neuroscience, and Psychiatry Journals”. We would also like to note that compared to previous bibliometrics works in neuroscience (Bertolero et al. 2020; Dworkin et al. 2020; Fulvio et al. 2021), this article includes a wider range of data.

      (2) One other concern pertains to the possibility that a significant number of authors involved in the paper may not be neuroscientists. It is plausible that the paper is a product of interdisciplinary collaboration involving scientists from diverse disciplines. Neuroscientists amongst the authors should be identified.

      In our opinion, neuroscience is a broad, interdisciplinary field. Individuals performing neuroscience research may have a neuroscience background. Yet, they may come from many backgrounds, such as physics, mathematics, biology, chemistry, or engineering. As such, we do not believe that it is feasible to characterize whether each author considers themselves a neuroscientist or not. We have added the following to the limitations section (page 16, line 528):

      “Eighth, authors included in this work may not be neurologists, neuroscientists, or psychiatrists. However, they still publish in journals from these fields.”

      (3) When calculating self-citation rate, it is important to consider the number of papers the authors have published to date. One plausible explanation for the lower self-citation rates among first authors could be attributed to their relatively junior status and short publication record. As such, it would also be beneficial to assess self-citation rate as a percentage relative to the author's publication history. This number would be more accurate if we look at it as a percentage of their publication history. My suspicion is that first authors (who are more junior) might be more likely to self-cite than their senior counterparts. My suspicion was further raised by looking at Figures 2a and 3. Considering the nature of the self-citation metric employed in the study, it is expected that authors with a higher level of seniority would have a greater number of publications. Consequently, these senior authors' papers are more likely to be included in the pool of references cited within the paper, hence the higher rate.

      While the authors acknowledge the importance of the number of past publications in their gender analysis, it is just as important to include the interplay of seniority in (1) their first and last author self-citation rates and (2) their geographic analysis.

      Thank you for this thoughtful comment. We agree that seniority and prior publication history play an important role in self-citation rates.

      For comparing First/Last Author self-citation rates, we have now included a plot similar to Figure 2a, where self-citation as a percentage of prior publication history is plotted.

      (page 4, line 161): “Analyzing self-citations as a fraction of publication history exhibited a similar trend (Figure S3). Notably, First Authors were more likely than Last Authors to self-cite when normalized by prior publication history.”

      For the geographic analysis, we made two new maps: 1) that of the number of previous papers, and 2) that of the journal impact factor (see response to point #4 below).

      (page 5, line 185): “We also investigated the distribution of the number of previous papers and journal impact factor across countries (Figure S4). Self-citation maps by country were highly correlated with maps of the number of previous papers (Spearman’s r=0.576, P=4.1e-4; 0.654, P=1.8e-5 for First and Last Authors). They were significantly correlated with maps of average impact factor for Last Authors (0.428, P=0.014) but not Last Authors (Spearman’s r=0.157, P=0.424). Thus, further investigation is necessary with these covariates in a comprehensive model.”
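
      For readers less familiar with the statistic, Spearman's correlation between two country-level maps is the Pearson correlation of the ranked values. A minimal, self-contained sketch follows; it is illustrative only, the input values are made up, and the function names are hypothetical.

```python
def average_ranks(values):
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rank_x) / n, sum(rank_y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    var_x = sum((a - mean_x) ** 2 for a in rank_x)
    var_y = sum((b - mean_y) ** 2 for b in rank_y)
    return cov / (var_x * var_y) ** 0.5


# Made-up per-country values: self-citation rate and mean number of previous papers.
self_citation_rate = [0.04, 0.06, 0.05, 0.09]
previous_papers = [20, 45, 30, 80]
print(spearman_r(self_citation_rate, previous_papers))
```

      Because only ranks enter the calculation, the statistic captures any monotone relationship between the two maps and is not driven by a few countries with extreme values.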

      Finally, we included a model term for the number of previous papers (Table 2). We analyzed this both for self-citation counts and self-citation rates and found a strong relationship between publication history and self-citations. We also included the following section where we modeled the number of previous papers for each author (page 13, line 384):

      “2.10 Reconciling differences between raw data and models

      The raw and GAM-derived data exhibited some conflicting results, such as for gender and field of research. To further study covariates associated with this discrepancy, we modeled the publication history for each author (at the time of publication) in our dataset (Table 2). The model terms included academic age, article year, journal impact factor, field, LMIC status, gender, and document type. Notably, Neuroscience was associated with the fewest number of papers per author. This explains how authors in Neuroscience could have the lowest raw self-citation rates but the highest self-citation rates after including covariates in a model. In addition, being a man was associated with about 0.25 more papers. Thus, gender differences in self-citation likely emerged from differences in the number of papers, not in any self-citation practices.”

      (4) Because your analysis is limited to high impact journals, it would be beneficial to see the distribution of the impact factors across the different countries. Otherwise, your analysis on geographic differences in self-citation rates is hard to interpret. Are these differences really differences in self-citation rates, or differences in journal impact factor? It would be useful to look at the representation of authors from different countries for different impact factors.

      We made a map of this in Figure S4 (see our response to point #3 above).

      (page 5, line 185): “We also investigated the distribution of the number of previous papers and journal impact factor across countries (Figure S4). Self-citation maps by country were highly correlated with maps of the number of previous papers (Spearman’s r=0.576, P=4.1e-4; 0.654, P=1.8e-5 for First and Last Authors). They were significantly correlated with maps of average impact factor for Last Authors (0.428, P=0.014) but not Last Authors (Spearman’s r=0.157, P=0.424). Thus, further investigation is necessary with these covariates in a comprehensive model.”

      We also included impact factor as a term in our model. The results suggest that there are still geographic differences (Table 2, Table S5).

      (5) The presence of self-citations is not inherently problematic, and I appreciate the fact that authors omit any explicit judgment on this matter. That said, without appropriate context, self-citations are also not the best scholarly practice. In the analysis on gender differences in self-citations, it appears that authors imply an expectation of women's self-citation rates to align with those of men. While this is not explicitly stated, use of the word "disparity", and also presentation of self-citation as an example of self-promotion in discussion suggest such a perspective. Without knowing the context in which the self-citation was made, it is hard to ascertain whether women are less inclined to self-promote or that men are more inclined to engage in strategic self-citation practices.

      We agree that on the level of an individual self-citation, our study is not useful for determining how related the papers are. Yet, understanding overall trends in self-citation may help to identify differences. Context is important, but large datasets allow us to investigate broad trends. We added the following text to the limitations section (page 16, line 524):

      “In addition, these models do not account for whether a specific citation is appropriate, as some situations may necessitate higher self-citation rates.”

      Reviewer #3 (Public Review):

      This paper analyses self-citation rates in the field of Neuroscience, comprising in this case, Neurology, Neuroscience and Psychiatry. Based on data from Scopus, the authors identify self-citations, that is, whether references from a paper by some authors cite work that is written by one of the same authors. They separately analyse this in terms of first-author self-citations and last-author self-citations. The analysis is well-executed and the analysis and results are written down clearly. There are some minor methodological clarifications needed, but more importantly, the interpretation of some of the results might prove more challenging. That is, it is not always clear what is being estimated, and more importantly, the extent to which self-citations are "problematic" remains unclear.

      Thank you for your review. We attempted to improve the interpretation of results, as described in the following responses.

      When are self-citations problematic? As the authors themselves also clarify, "self-citations may often be appropriate". Researchers cite their own previous work for perfectly good reasons, similar to reasons of why they would cite work by others. The "problem", in a sense, is that researchers cite their own work, just to increase the citation count, or to promote their own work and make it more visible. This self-promotional behaviour might be incentivised by certain research evaluation procedures (e.g. hiring, promoting) that overly emphasise citation performance. However, the true problem then might not be (self-)citation practices, but instead, the flawed research evaluation procedures that emphasise citation performance too much. So instead of problematising self-citation behaviour, and trying to address it, we might do better to address flawed research evaluation procedures. Of course, we should expect references to be relevant, and we should avoid self-promotional references, but addressing self-citations may just have minimal effects, and would not solve the more fundamental issue.

      We agree that this dataset is not designed to investigate the downstream effects of self-citations. However, self-citation practices are more likely to be problematic when they differ across specific groups. This work can potentially spark more interest in future longitudinal designs to investigate whether differences in self-citation practices lead to differences in career outcomes, for example. We added the following text to clarify (page 17, line 565):

      “Yet, self-citation practices become problematic when they are different across groups or are used to “game the system.” Future work should investigate the downstream effects of self-citation differences to see whether they impact the career trajectories of certain groups. We hope that this work will help to raise awareness about factors influencing self-citation practices to better inform authors, editors, funding agencies, and institutions in Neurology, Neuroscience, and Psychiatry.”

      Some other challenges arise when taking a statistical perspective. For any given paper, we could browse through the references, and determine whether a particular reference would be warranted or not. For instance, we could note that there might be a reference included that is not at all relevant to the paper. Taking a broader perspective, the irrelevant reference might point to work by others, included just for reasons of prestige, so-called perfunctory citations. But it could of course also include self-citations. When we simply start counting all self-citations, we do not see what fraction of those self-citations would be warranted as references. The question then emerges, what level of self-citations should be counted as "high"? How should we determine that? If we observe differences in self-citation rates, what does it tell us?

      Our focus is on cases in which self-citation practices differ across groups. We agree that, on a case-by-case basis, there is no exact number for a self-citation rate that is “high.” With a dataset of the current size, evaluating whether each individual self-citation is appropriate is not feasible. If we observe differences in self-citation rate, this may tell us about broad (not individual-level) trends and differences in self-citing practice. If one group self-cites much more than another group, even after covarying relevant variables such as prior publication history, then the self-citation differences can likely be attributed to differences in self-citation practices/behaviors.

      For example, the authors find that the (any author) self-citation rate in Neuroscience is 10.7% versus 15.9% in Psychiatry. What does this difference mean? Are psychiatrists citing themselves more often than neuroscientists? First author men showed a self-citation rate of 5.12% versus a self-citation rate of 3.34% of women first authors. Do men engage in more problematic citation behaviour? Junior researchers (10-year career) show a self-citation rate of about 5% compared to a self-citation rate of about 10% for senior researchers (30-year career). Are senior researchers therefore engaging in more problematic citation behaviour? The answer is (most likely) "no", because senior authors have simply published more, and will therefore have more opportunities to refer to their own work. To be clear: the authors are aware of this, and also take this into account. In fact, these "raw" various self-citation rates may, as the authors themselves say, "give the illusion" of self-citation rates, but these are somehow "hidden" by, for instance, career seniority.

      We included numerous covariates in our model. In addition, to address the difference between “raw” and “modeled” self-citation rates, we added the following section (page 13, line 384):

      “2.10 Reconciling differences between raw data and models

      The raw and GAM-derived data exhibited some conflicting results, such as for gender and field of research. To further study covariates associated with this discrepancy, we modeled the publication history for each author (at the time of publication) in our dataset (Table 2). The model terms included academic age, article year, journal impact factor, field, LMIC status, gender, and document type. Notably, Neuroscience was associated with the fewest number of papers per author. This explains how authors in Neuroscience could have the lowest raw self-citation rates but the highest self-citation rates after including covariates in a model. In addition, being a man was associated with about 0.25 more papers. Thus, gender differences in self-citation likely emerged from differences in the number of papers, not in any self-citation practices.”

      Again, the authors do consider this, and "control" for career length and number of publications, et cetera, in their regression model. Some of the previous observations then change in the regression model. Neuroscience doesn't seem to be self-citing more, there just seem to be more junior researchers in that field compared to Psychiatry. Similarly, men and women don't seem to show an overall different self-citation behaviour (although the authors find an early-career difference), the men included in the study simply have longer careers and more publications.

      But here's the key issue: what does it then mean to "control" for some variables? This doesn't make any sense, except in the light of causality. That is, we should control for some variable, such as seniority, because we are interested in some causal effect. The field may not "cause" the observed differences in self-citation behaviour, this is mediated by seniority. Or is it confounded by seniority? Are the overall gender differences also mediated by seniority? How would the selection of high-impact journals "bias" estimates of causal effects on self-citation? Can we interpret the coefficients as causal effects of that variable on self-citations? If so, would we try to interpret this as total causal effects, or direct causal effects? If they do not represent causal effects, how should they be interpreted then? In particular, how should it "inform author, editors, funding agencies and institutions", as the authors say? What should they be informed about?

      We apologize for our misuse of language. We will be more clear, as in most previous self-citation papers, that our analysis is NOT causal. Causal datasets do have some benefits in citation research, but a limitation is that they may not cover as wide a range of authors. Furthermore, non-causal correlational studies can still be useful in informing authors, editors, funding agencies, and institutions. Association studies are widely used across various fields to draw non-causal conclusions. We made numerous changes to reduce our causal language.

      Before: “We then developed a probability model of self-citation that controls for numerous covariates, which allowed us to obtain significance estimates for each variable of interest.”

      After (page 3, line 113): “We then developed a probability model of self-citation that includes numerous covariates, which allowed us to obtain significance estimates for each variable of interest.”

      Before: “As such, controlling for various author- and article-level characteristics can improve the interpretability of self-citation rate trends.”

      After (page 11, line 321): “As such, covarying various author- and article-level characteristics can improve the interpretability of self-citation rate trends.”

      Before: “Initially, it appeared that self-citation rates in Neuroscience are lower than Neurology and Psychiatry, but after controlling for various confounds, the self-citation rates are higher in Neuroscience.”

      After (page 15, line 468): “Initially, it appeared that self-citation rates in Neuroscience are lower than Neurology and Psychiatry, but after considering several covariates, the self-citation rates are higher in Neuroscience.”

      We also added the following text to the limitations section (page 16, line 526):

      “Seventh, the analysis presented in this work is not causal. Association studies are advantageous for increasing sample size, but future work could investigate causality in curated datasets.”

      The authors also "encourage authors to explore their trends in self-citation rates". It is laudable to be self-critical and review one's own practices. But how should authors interpret their self-citation rate? How useful is it to know whether it is 5%, 10% or 15%? What would be the "reasonable" self-citation rate? How should we go about constructing such a benchmark rate? Again, this would necessitate some causal answer. Instead of looking at the self-citation rate, it would presumably be much more informative to simply ask authors to check whether references are appropriate and relevant to the topic at hand.

      We believe that our tool is valuable for authors to contextualize their own self-citation rates. For instance, if an author has published hundreds of articles, it is not practical to count the number of self-citations in each. We have added two portions of text to the limitations section:

      (page 16, line 524): “In addition, these models do not account for whether a specific citation is appropriate, though some situations may necessitate higher self-citation rates.”

      (page 16, line 535): “Despite these limitations, we found significant differences in self-citation rates for various groups, and thus we encourage authors to explore their trends in self-citation rates. Self-citation rates that are higher than average are not necessarily wrong, but suggest that authors should further reflect on their current self-citation practices.”

      In conclusion, the study shows some interesting and relevant differences in self-citation rates. As such, it is a welcome contribution to ongoing discussions of (self) citations. However, without a clear causal framework, it is challenging to interpret the observed differences.

      We agree that causal studies provide many benefits. Yet, association studies also provide many benefits. For example, an association study allowed us to analyze a wider range of articles than a causal study would have.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Statistical suggestions:

      (1) To improve statistical inference, nesting should be accounted for in all of the analyses. For example, the logistic regression model using citing/cited pairs should include random effects for article, author, and perhaps subfield, in order for independence of observations to be plausible. Similarly, bootstrapping and permutation would ideally occur at the author level rather than (or in addition to) the paper level.

      Detailed updates addressing these points are in the public review. In short, we found computational challenges with many levels of the random effects (>100,000) and millions of observations at the citation pairs level. As such, we decided to model citation rates and counts by paper. In this case, we found that results could be unstable when including author-level random effects because in many cases there was only one author per group. Instead, to avoid inappropriately narrow confidence bands, we resampled the dataset such that each author was only represented once. For example, if Author A had five papers in this dataset, then one of their five papers was randomly selected. We repeated the random resampling 100 times (Figure S12). We updated our description of our models in the Methods section (page 21, line 754).
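
      The resampling step described above can be illustrated with a minimal sketch. It assumes a simplified table with one row per paper for a single author role; the keys `paper_id` and `author_id` and the function names are hypothetical and are not taken from the study's code.

```python
import random
from collections import defaultdict


def sample_one_paper_per_author(papers, rng):
    """Keep one randomly chosen paper for each author.

    `papers` is a list of dicts with hypothetical keys "paper_id" and
    "author_id" (one row per paper for a single author role).
    """
    papers_by_author = defaultdict(list)
    for paper in papers:
        papers_by_author[paper["author_id"]].append(paper)
    # Draw one paper uniformly at random from each author's list.
    return [rng.choice(group) for group in papers_by_author.values()]


def repeated_subsamples(papers, n_repeats=100, seed=0):
    """Repeat the one-paper-per-author draw `n_repeats` times."""
    rng = random.Random(seed)
    return [sample_one_paper_per_author(papers, rng) for _ in range(n_repeats)]


# Toy data: Author A has five papers, Author B has one.
papers = [{"paper_id": i, "author_id": "A"} for i in range(1, 6)]
papers.append({"paper_id": 6, "author_id": "B"})

subsamples = repeated_subsamples(papers)
```

      Each subsample could then be used to refit the model, so that the spread of estimates across the 100 fits reflects the choice of paper per author.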

      For permutation tests and bootstrapping, we now define an “exchangeability block” as a co-authorship group of authors. In this dataset, that meant any authors who published together (among the articles in this dataset) as a First Author / Last Author pairing were assigned to the same exchangeability block. It is not realistic to check for overlapping middle authors in all papers because of the collaborative nature of the field. In addition, we believe that self-citations are primarily controlled by first and last authors, so we can assume that middle authors do not control self-citation habits. We then performed bootstrapping and permutation tests within the constraints of the exchangeability blocks.
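
      One way to build such blocks is sketched below. It makes two assumptions: that blocks are the connected components of the First Author / Last Author pairing graph (so pairings chain together transitively), and that bootstrapping resamples whole blocks with replacement. The keys `first_author` and `last_author` and the function names are hypothetical.

```python
import random
from collections import defaultdict


def build_exchangeability_blocks(papers):
    """Group papers into blocks of authors linked by FA/LA pairings."""
    parent = {}

    def find(author):
        parent.setdefault(author, author)
        while parent[author] != author:
            parent[author] = parent[parent[author]]  # path halving
            author = parent[author]
        return author

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Authors who appear together as a First/Last Author pair share a block.
    for paper in papers:
        union(paper["first_author"], paper["last_author"])

    blocks = defaultdict(list)
    for paper in papers:
        blocks[find(paper["first_author"])].append(paper)
    return list(blocks.values())


def block_bootstrap(blocks, rng):
    """Resample whole blocks with replacement and pool their papers."""
    resampled = rng.choices(blocks, k=len(blocks))
    return [paper for block in resampled for paper in block]


# Toy data: papers 1 and 2 share Last Author B, so A, B, and C form one block.
papers = [
    {"paper_id": 1, "first_author": "A", "last_author": "B"},
    {"paper_id": 2, "first_author": "C", "last_author": "B"},
    {"paper_id": 3, "first_author": "D", "last_author": "E"},
]
blocks = build_exchangeability_blocks(papers)
bootstrap_sample = block_bootstrap(blocks, random.Random(0))
```

      Resampling at the block level keeps papers from linked authors together, which avoids treating them as independent observations.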

      (2) In general, I am having trouble understanding the structure of the regression models. My current belief is that rows are composed of individual citations from papers' reference lists, with the outcome representing their status as a self-citation or not, and with various citing article and citing author characteristics as predictors. However, the fact that author type is included in the model as a predictor (rather than having a model for FA self-citations and another for LA self-citations) suggests to me that each citation is entered as two separate rows - once noting whether it was a FA self-citation and once noting whether it was an LA self-citation - and then it is run as a single model.

      (2a) If I am correct, the model is unlikely to be producing valid inference. I would recommend breaking this analysis up into two separate models, and including article-, author-, and subfield-level random effects. You could theoretically include a citation-level random effect and keep it as one model, but each 'group' would only have two observations and the model would be fairly unstable as a result.

      (2b) If I am misunderstanding (and even if not), I would encourage you to provide a more detailed description of the dataset structure and the model - perhaps with a table or diagram.

      We split the data into two models and decided to model on the level of a paper (self-citation rate and self-citation count). In addition, we subsampled the dataset such that each author only appears once to avoid misestimation of confidence intervals (see point (1) above). As described in the public review, we included much more detail in our methods section now to improve the clarity of our models.

      (3) I would suggest removing the inverse hyperbolic sine transform and replacing it with a more flexible approach to estimating the relationships' shape, like generalized additive models or other spline-based methods to ensure that the chosen method is appropriate - or at the very least checking that it is producing a realistic fit that reflects the underlying shape of the relationships.

      More details are available in the public review, but we now use GAMs throughout the manuscript.

      (4) For the "highly self-citing" analysis, it is unclear why papers in the 15-25% range were dropped rather than including them as their own category in an ordinal model. I might suggest doing the latter, or explaining the decision more fully.

      We previously included this analysis as a paper-level model because our main model was at the level of citation pairs. Now, we removed this analysis because we model self-citation rates and counts by paper.

      (5) It would be beneficial for the reader to know what % of the data was dropped for each analysis, and for your team to make sure that there is not differential missing data that could affect the interpretation of the results (e.g., differences in self-citation being due to differences in Scopus ID coverage).

      Thank you for this suggestion. We added more detailed missingness data to 4.3 Data exclusions and missingness. We did find differential missingness and added it to the limitations section. However, certain aspects of this cannot be corrected because the data are just not available (e.g., Scopus coverage issues). Further details are available in the public review.

      Conceptual thoughts:

      (1) I agree with your decision to focus on the second definition of self-citation (self-cites relative to my citations to others' work) rather than the first (self-cites relative to others' citations to my work). But it does seem that the first definition is relevant in the context of gaming citation metrics. For example, someone who writes one paper per year with a reference list of 30% self-citations will have much less of an impact on their H-index than someone who writes 10 papers per year with 10% self-citations. It could be interesting to see how these definitions interact, and whether people who are high on one measure tend to be high on the other.

      We agree this would be interesting to investigate in the future. Unfortunately, our dataset is organized at the level of the paper and thus does not contain information regarding how many times the authors cite a particular work. We hope that we can explore this interaction in the future.

      (2) This is entirely speculative, but I wonder whether the increasing rate of LA self-citation relative to FA self-citation is partly due to PIs over-citing their own lab to build up their trainees' citation records and help them succeed in an increasingly competitive job market. This sounds more innocuous than doing it to benefit their own reputation, but it would provide another mechanism through which students from large and well-funded labs get a leg-up in the job market. Might be interesting to explore, though I'm not exactly sure how :)

      This is a very interesting point. We do not have any means to investigate this with the current dataset, but we added it to the discussion (page 14, line 421):

      “A third, more optimistic explanation is that principal investigators (typically Last Authors) are increasingly self-citing their lab’s papers to build up their trainee’s citation records for an increasingly competitive job market.”

      Reviewer #2 (Recommendations For The Authors):

      (1) In regards to point 1 in the public review: In the spirit of transparency, the authors would benefit from providing a rationale for their choice of top journals, and the methodology used to identify them. It would also be valuable to include the impact factor of each journal in the S1 table alongside their names.

      Given the availability and executability of code, it would be useful to see how and if the self-citation trends vary amongst the "low impact" journals (as measured by the IF). This could go in any of the three directions:

      a. If it is found that self-citations are not as prevalent in low impact journals, this could be a great starting point for a conversation around the evaluation of journals based on impact factor, and the role of self-citations in it.

      b. If it is found that self-citations are as prevalent in low impact journals as high impact journals, that just strengthens your results further.

      c. If it is found that self-citations are more prevalent in low impact journals, this would mean your current statistics are a lower bound to the actual problem. This is also intuitive in the sense that high impact journals get more external citations (and more exposure) than low impact journals, as such authors (and journals) may be less likely to self-cite.

      Expanding the dataset to include many more journals was not feasible. Instead, we included an impact factor term in our models, as detailed in the public review. We found no strong trends in the association between impact factor and self-citation rate/count. Another important note is that these journals were considered “high impact” in 2020, but many had lower impact factors in earlier years. Thus, our modeling allows us to estimate how impact factor is related to self-citations across a wide range of impact factors.

      It is crucial to consider utilizing such a comprehensive database as Scopus, which provides a more thorough list of all journals in Neuroscience, to obtain a more representative sample. Alternatively, other datasets like Microsoft Academic Graph, and OpenAlex offer information on the field of science associated with each paper, enabling a more comprehensive analysis.

      We agree that certain datasets may offer a wider view of the entire field. However, we included a large number of papers and journals relative to previous studies. In addition, Scopus provides a lot of detailed and valuable author-level information. We had to limit our calls to the Scopus API, so we restricted journals by their 2020 impact factor.

      (2) In regards to point 2 in the public review: To enhance the accuracy and specificity of the analysis, it would be beneficial to distinguish neuroscientists among the co-authors. This could be accomplished by examining their publication history leading up to the time of publication of the paper, and identify each author's level of engagement and specialization within the field of neuroscience.

      Since the field of neuroscience is largely based on collaborations, we find that it might be impossible to determine who is a neuroscientist. For example, a researcher with a publication history in physics may now be focusing on computational neuroscience research. As such, we feel that our current work, which ensures that the papers belong to neuroscience, is representative of what one may expect in terms of neuroscience research and collaboration.

      (3) In regards to point 3 in the public review: I highly recommend plotting self-citation rate as the number of papers in the reference list over the number of total publications to date of paper publication.

      As described in the public review, we have now done this (Figure S3).

      (4) In regards to point 5 in the public review: It would be useful to consider the "quality" of citations to further the discussion on self-citations. For instance, differentiating self-citations that are perfunctory and superficial from those that are essential for showing developmental work would be a valuable contribution.

      Other databases may have access to this information, but ours unfortunately does not. We agree that this is an interesting area of work.

      (5) The authors are to be commended for their logistic regression models, as they control for many confounders that were lacking in their earlier descriptive statistics. However, it would be beneficial to rerun the same analysis but on a linear model whereby the outcome variable would be the number of self-citations per author. This would possibly resolve many of the comments mentioned above.

      Thank you for your suggestion. As detailed in the public review, we now model the number of self-citations. This is modeled on the paper level, not the author level, because our dataset was downloaded by paper, not by author.

      Minor suggestions:

      (1) Abstract says one of your findings is: "increasing self-citation rates of First Authors relative to Last Authors". Your results actually show the opposite (see Figure 1b).

      Thank you for catching this error. We corrected it to match the results and discussion in the paper:

      “…increasing self-citation rates of Last Authors relative to First Authors.”

      (2) It might be interesting to compute an average academic age for each paper, and look at self-citation vs average academic age plot.

      We agree that this would be an interesting analysis. However, to limit calls to the API, we collected academic age data only on First and Last Authors.

      (3) It may be interesting to look at the distribution of women in different subfields within neuroscience, and the interaction of those in the context of self-citations.

      Thank you for this interesting suggestion. We added the following analysis (page 9, line 305):

      “Furthermore, we explored topic-by-gender interactions (Figure S10). In short, men and women were relatively equally represented as First Authors, but more men were Last Authors across all topics. Self-citation rates were higher for men across all topics.”

      Reviewer #3 (Recommendations For The Authors):

      - In the abstract, "flaws in citation practices" seems worded rather strongly.

      We respectfully disagree, as previous works have shown significant bias in citation practices. For example, Dworkin et al. (Dworkin et al. 2020) found that neuroscience reference lists tended to under-cite women, even after including various covariates.

      - The links in the references point to (non-accessible) Paperpile references; you would probably want to update this.

      We apologize for the inconvenience and have now removed these links.

      - p 2, l 24: The explanation of ref. (5) seems to be a bit strangely formulated. The point of that article is that citations to work that reinforce a particular belief are more likely to be cited, which *creates* unfounded authority. The unfounded authority itself is hence no part of the citation practices

      Thank you for catching our misinterpretation. We have now removed this part of the sentence.

      - p 3, l 16: "h indices" or "citations" instead of "h-index".

      We now say “h-indices”.

      - p 5, l 5: how was the manual scoring done?

      We added the following to the caption of Figure S1.

      “Figure S1. Comparison between manual scoring of self-citation rates and self-citation rates estimated from Python scripts in 5 Psychiatry journals: American Journal of Psychiatry, Biological Psychiatry, JAMA Psychiatry, Lancet Psychiatry, and Molecular Psychiatry. 906 articles in total were manually evaluated (10 articles per journal per year from 2000-2020, four articles excluded for very large author list lengths and thus high difficulty of manual scoring). For manual scoring, we downloaded information about all references for a given article and searched for matching author names.”
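
      As an illustration of what such a matching step computes, the following sketch derives First Author, Last Author, and Any Author self-citation rates for one article. It assumes authors are represented by unique identifiers and that a reference counts as a self-citation if the citing author appears anywhere on the cited author list; the function name and data layout are hypothetical.

```python
def self_citation_rates(citing_authors, references):
    """Compute self-citation rates for a single citing article.

    citing_authors: ordered list of author identifiers (first to last).
    references: list of author-identifier lists, one per cited work.
    Returns None when the article has no references.
    """
    if not references:
        return None
    first_author, last_author = citing_authors[0], citing_authors[-1]
    citing_set = set(citing_authors)
    n_refs = len(references)
    # A reference is a self-citation if the author appears in any position.
    first_hits = sum(first_author in ref for ref in references)
    last_hits = sum(last_author in ref for ref in references)
    any_hits = sum(bool(citing_set & set(ref)) for ref in references)
    return {
        "first_author": first_hits / n_refs,
        "last_author": last_hits / n_refs,
        "any_author": any_hits / n_refs,
    }


# Toy article with three authors and four references.
rates = self_citation_rates(
    ["a1", "a2", "a3"],
    [["a1", "x"], ["y", "a3"], ["z"], ["a2"]],
)
```

      Matching on names rather than identifiers, as in manual scoring, would follow the same logic but is more prone to ambiguity between authors with similar names.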

      - p 5, l 23: Why this specific p-value upper bound of 4e-3? From later in the article, I understand that this stems from the 10000 bootstrap sample, with then taking a Bonferroni correction? Perhaps good to clarify this briefly somewhere.

      Thank you for this suggestion. We now perform Benjamini/Hochberg false discovery rate (FDR) correction, but we added a description of the minimum P value from permutations (page 21, line 748):

      “All P values described in the main text were corrected with the Benjamini/Hochberg 16 false discovery rate (FDR) correction. With 10,000 permutations, the lowest P value after applying FDR correction is P=2.9e-4, which indicates that the true point would be the most extreme in the simulated null distribution.”
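
      The two ingredients mentioned here, a permutation P value and the Benjamini/Hochberg adjustment, can be sketched as follows. The two-sided test and the "+1" correction in the permutation P value are assumptions about the convention used, and the function names are hypothetical.

```python
def permutation_p_value(observed, null_values):
    """Two-sided permutation P value with the +1 correction (assumed)."""
    n_extreme = sum(abs(value) >= abs(observed) for value in null_values)
    return (n_extreme + 1) / (len(null_values) + 1)


def benjamini_hochberg(p_values):
    """Return Benjamini/Hochberg-adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# With 10,000 permutations and an observed statistic more extreme than
# every permuted value, the raw P value is 1 / 10,001.
smallest_raw_p = permutation_p_value(3.0, [0.0] * 10_000)

adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

      Under this convention the smallest raw P value is fixed by the number of permutations, while the smallest adjusted value also depends on how many tests are corrected together.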

      - Fig. 1, caption: The (a) and (b) labelling here is a bit confusing, because the first sentence suggests both figures portray the same, but do so for different time periods. Perhaps rewrite, so that (a) and (b) are both described in a single sentence, instead of having two different references to (a) and (b).

      Thank you for pointing this out. We fixed the labeling of this caption:

      “Figure 1. Visualizing recent self-citation rates and temporal trends. a) Kernel density estimate of the distribution of First Author, Last Author, and Any Author self-citation rates in the last five years. b) Average self-citation rates over every year since 2000, with 95% confidence intervals calculated by bootstrap resampling.”

      - p7, l 9: Regarding "academic age", note that there might be a difference between "age" effects and "cohort" effects. That is, there might be difference between people with a certain career age who started in 1990 and people with the same career age, but who started in 2000, which would be a "cohort" effect.

      We agree that this is a possible effect and have added it to the limitations (page 16, line 532):

      “Tenth, while we considered academic age, we did not consider cohort effects. Cohort effects would depend on the year in which the individual started their career.”

      - p 7, l 15: "jumps" suggests some sort of sudden or discontinuous transition, I would just say "increases".

      We now say “increases.”

      - Fig. 2: Perhaps it should be made more explicit that this includes only academics with at least 50 papers. Could the authors please clarify whether the same limitation of at least 50 papers also features in other parts of the analysis where academic age is used? This selection could affect the outcomes of the analysis, so its consequences should be carefully considered. One possibility for instance is that it selects people with a short career length who have been exceptionally productive, namely those that have had 50 papers, but only started publishing in 2015 or so. Such exceptionally productive people will feature more highly in the early career part, because they need to be so productive in order to make the cut. For people with a longer career, the 50 papers would be less of a hurdle, and so would select more and less productive people more equally.

      We apologize for the lack of clarity. We did not use this requirement where academic age was used. We mainly applied this requirement when aggregating by country, as we did not want to calculate self-citation rate in a country based on only several papers. We have clarified various data exclusions in our new section 4.3 Data exclusions and missingness.

      - p 8, l 11: The affiliated institution of an author is not static, but rather changes throughout time. Did the authors consider this? If not, please clarify that this refers to only the most recent affiliation (presumably). Authors also often have multiple affiliations. How did the authors deal with this?

      The institution information is at the time of publication for each paper. We added more detail to our description of this on page 19, line 656:

      “For both First and Last Authors, we found the country of their institutional affiliation listed on the publication. In the case of multiple affiliations, the first one listed in Scopus was used.”

      - p 10, l 6: How were these self-citation rates calculated? This is averaged per author (i.e. only considering papers assigned to a particular topic) and then averaged across authors? (Note that in this way, the average of an author with many papers will weigh equally with the average of an author with few papers, which might skew some of the results).

      We calculate it across the entire topic (i.e., do NOT calculate by author first). We updated the description as follows (page 7, line 211):

      “We then computed self-citation rates for each of these topics (Figure 4) as the total number of self-citations in each topic divided by the total number of references in each topic…”
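
      The pooled calculation described in this sentence can be sketched as below. The keys `topic`, `n_self_citations`, and `n_references` are hypothetical names for per-paper counts.

```python
from collections import defaultdict


def pooled_topic_rates(papers):
    """Self-citation rate per topic: total self-citations / total references."""
    totals = defaultdict(lambda: [0, 0])  # topic -> [self-citations, references]
    for paper in papers:
        totals[paper["topic"]][0] += paper["n_self_citations"]
        totals[paper["topic"]][1] += paper["n_references"]
    # Topics with no references at all are skipped to avoid dividing by zero.
    return {
        topic: n_self / n_refs
        for topic, (n_self, n_refs) in totals.items()
        if n_refs > 0
    }


papers = [
    {"topic": "imaging", "n_self_citations": 5, "n_references": 25},
    {"topic": "imaging", "n_self_citations": 1, "n_references": 75},
    {"topic": "genetics", "n_self_citations": 2, "n_references": 40},
]
topic_rates = pooled_topic_rates(papers)
```

      Because counts are summed before dividing, papers with long reference lists weigh more than they would if per-paper or per-author rates were averaged.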

      - p 13, l 18: Is the academic age analysis here again limited to authors having at least 50 papers?

      This is not limited to at least 50 papers. To clarify, the previous analysis was not limited to authors with 50 papers. It was instead limited to ages in our dataset that had at least 50 data points. For example, if an academic age of 70 had only 20 data points in our dataset, it would have been excluded.
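
      A minimal sketch of this kind of filter, assuming one record per data point with a hypothetical `academic_age` key:

```python
from collections import Counter


def filter_ages_by_support(records, min_points=50):
    """Keep only records whose academic age has at least `min_points` rows."""
    counts = Counter(record["academic_age"] for record in records)
    return [r for r in records if counts[r["academic_age"]] >= min_points]


# Toy data: academic age 10 has 60 data points, academic age 70 has 20.
records = [{"academic_age": 10} for _ in range(60)]
records += [{"academic_age": 70} for _ in range(20)]

kept = filter_ages_by_support(records)
```

      The threshold applies to the number of data points at each academic age, not to the number of papers any individual author has published.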

      - Fig. 5: Here, comparing Fig. 5(d) and 5(f) suggests that partly, the self-citation rate differences between men and women, might be the result of the differences in number of papers. That is, the somewhat higher self-citation rate at a given academic age, might be the result of the higher number of papers at that academic age. It seems that this is not directly described in this part of the analysis (although this seems to be the case from the later regression analysis).

      We agree with this idea and have added a new section as follows (page 13, line 384):

      “2.10 Reconciling differences between raw data and models

      The raw and GAM-derived data exhibited some conflicting results, such as for gender and field of research. To further study covariates associated with this discrepancy, we modeled the publication history for each author (at the time of publication) in our dataset (Table 2). The model terms included academic age, article year, journal impact factor, field, LMIC status, gender, and document type. Notably, Neuroscience was associated with the fewest number of papers per author. This explains how authors in Neuroscience could have the lowest raw self-citation rates but the highest self-citation rates after including covariates in a model. In addition, being a man was associated with about 0.25 more papers. Thus, gender differences in self-citation likely emerged from differences in the number of papers, not in any self-citation practices.”

      - Section 2.10. Perhaps the authors could clarify that this analysis takes individual articles as the unit of analysis, not citations.

      We updated all our models to take individual articles and have clarified this with more detailed tables.

      - p 18, l 10: "Articles with between 15-25% self-citation rates were discarded" Why?

      We agree that these should not be discarded. However, we previously included this analysis as a paper-level model because our main model was at the level of citation pairs. Now, we removed this analysis because we model self-citation rates and counts by paper.

      - p 20, l 5: "Thus, early-career researchers may be less incentivized to self-promote (e.g., self-cite) for academic gains compared to 20 years ago." How about the possibility that there was less collaboration, so that first authors would be more likely to cite their own paper, whereas with more collaboration, they will more often not feature as first author?

      This is an interesting point. We feel that more collaboration would generally lead to even more self-citations, if anything. If an author collaborates more, they are more likely to be on some of the references as a middle author (which by our definition counts toward self-citation rates).

      - p 20, l 15: Here the authors call authors to avoid excessive self-citations. Of course, there's nothing wrong with calling for that, but earlier the authors were more careful to not label something directly as excessive self-citations. Here, by stating it like this, the authors suggest that they have looked at excessive self-citations.

      We rephrased this as follows:

      Before: “For example, an author with 30 years of experience cites themselves approximately twice as much as one with 10 years of experience on average. Both authors have plenty of works that they can cite, and likely only a few are necessary. As such, we encourage authors to be cognizant of their citations and to avoid excessive self-citations.”

      After: “For example, an author with 30 years of experience cites themselves approximately twice as much as one with 10 years of experience on average. Both authors have plenty of works that they can cite, and likely only a few are necessary. As such, we encourage authors to be cognizant of their citations and to avoid unnecessary self-citations.”

      - p 22, l 11: Here again, the same critique as p 20, l15 applies.

      We switched “excessively” to “unnecessarily.”

      - p 23, l 12: The authors here critique ref. (21) of ascertainment bias, namely that they are "including only highly-achieving researchers in the life sciences". But do the authors not do exactly the same thing? That is, they also only focus on the top high-impact journals.

      We included 63 high-impact journals with tens of thousands of authors. In addition, some of these journals were not high-impact at the time of publication. For example, Acta Neuropathologica had an impact factor of 17.09 in 2020 but 2.45 in 2000. This still is a limitation of our work, but we do cover a much broader range of works than the listed reference (though their analysis also has many benefits since it included more detailed information).

      - p 26, l 22-26: It seems that the matching is done quite broadly (matching last names + initials at worst) for self-citations, while later (in section 4.9, p 31, l 9), the authors switch to only matching exact Scopus Author IDs. Why not use the same approach throughout? Or compare the two definitions (narrow / broad).

      Thank you for catching this mistake. We now use the approach of matching Scopus Author IDs throughout.

      - S8: it might be nice to explore open alternatives, such as OpenAlex or OpenAIRE, instead of the closed Scopus database, which requires paid access (which not all institutions have, perhaps that could also be corrected in the description in GitHub).

      Thank you for this suggestion. Unfortunately, switching databases would require starting our analysis from the beginning. On our GitHub page, we state: “Please email matthew.rosenblatt@yale.edu if you have trouble running this or do not have institutional access. We can help you run the code and/or run it for you and share your self-citation trends.” We feel that this will allow us to help researchers who may not have institutional access. In addition, we released our aggregated, de-identified (title and paper information removed) data on GitHub for other researchers to use.

    5. eLife assessment

      This study examines how self-citations in selected neurology, neuroscience, and psychiatry journals differ according to geography, gender, seniority, and subfield. The evidence supporting the claims is mostly convincing, but certain aspects of the analysis would benefit from further work. Overall, the article is a valuable addition to the literature on self-citations.

    1. eLife assessment

      This manuscript is an important contribution toward understanding the mechanisms of transcriptional bursting. The evidence is considered solid. Questions regarding the broader advance, details of the analysis, and the models used in the analysis were addressed by the authors.

    2. Reviewer #1 (Public Review):

      In this manuscript, the authors investigate whether enhancers use a common regulatory paradigm to modulate transcriptional bursting in both endogenous and ectopic domains using cis-regulatory mutant reporters of the eve transcriptional locus in early Drosophila embryogenesis.

      The authors create a series of cis-regulatory BAC mutants of the eve stripe 1 and 2 enhancers by mutating the binding sites for the transcriptional repressor Giant in the stripe 2 minimal response element (MRE) independently or in combination with deletion of the stripe 1 enhancer sequence. With these enhancer mutations, they are able to generate conditions in which eve is ectopically expressed. Next, the authors investigate if nuclei in these "ectopic" regions have similar transcriptional kinetics to the "endogenous"-expressing eve+ nuclei. They show that bursting parameters are unchanged when comparing endogenous and ectopic gene expression regions. Under a scheme of a 2-state model, the eveS1Δ-EveS2Gt- reporter modulates transcription by increasing the active state switching rate (kon) and the initiation rate (r) while maintaining a constant inactive state switching rate.

      Based on these results, the authors support a model whereby kinetic regimes are encoded in the cis-regulatory sequences of a gene instead of imposed by an evolving trans-regulatory environment.

      The question asked in this manuscript is important and the eve locus represents an ideal paradigm to address it in a quantitative manner. Most of the results are correctly interpreted and well-presented.

    3. Reviewer #2 (Public Review):

      The manuscript by Berrocal et al. asks if shared bursting kinetics, as observed for various developmental genes in animals, hint towards a shared molecular mechanism or result from natural selection favoring such a strategy. Transcription happens in bursts. While transcriptional output can be modulated by altering various properties of bursting, certain strategies are observed more widely. As the authors noted, recent experimental studies have found that even-skipped enhancers control transcriptional output by changing burst frequency and amplitude while burst duration remains largely constant. The authors compared the kinetics of transcriptional bursting between endogenous and ectopic gene expression patterns. It is argued that since enhancers act under different regulatory inputs in ectopically expressed genes, adaptation would lead to diverse bursting strategies as compared to endogenous gene expression patterns. To achieve this goal, the authors generated ectopic even-skipped transcription patterns in fruit fly embryos. The key finding is that bursting strategies are similar in endogenous and ectopic even-skipped expression. According to the authors, the findings favor the presence of a unified molecular mechanism shaping even-skipped bursting strategies. This is an important piece of work. Everything has been carried out in a systematic fashion.

    4. Reviewer #3 (Public Review):

      In this manuscript by Berrocal and coworkers, the authors do a deep dive into the transcriptional regulation of the eve gene in both an endogenous and ectopic background. The idea is that by looking at eve expression under non-native conditions, one might infer how enhancers control transcriptional bursting. The main conclusion is that eve enhancers have not evolved to have specific behaviors in the eve stripes, but rather the same rates in the telegraph model are utilized as control rates even under ectopic or 'de novo' conditions. For example, they achieve ectopic expression (outside of the canonical eve stripes) through a BAC construct where the binding sites for the TF Giant are disrupted along with one of the eve enhancers. Perhaps the most general conclusion is that burst duration is largely constant throughout at ~ 1 - 2 min. This conclusion is consistent with work in human cell lines that enhancers mostly control frequency and that burst duration is largely conserved across genes, pointing to an underlying mechanistic basis that has yet to be determined.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      [...] Based on these results, the authors support a model whereby kinetic regimes are encoded in the cis-regulatory sequences of a gene instead of imposed by an evolving trans-regulatory environment.

      The question asked in this manuscript is important and the eve locus represents an ideal paradigm to address it in a quantitative manner. Most of the results are correctly interpreted and well-presented. However, the main conclusion pointing towards a potential "unified theory" of burst regulation during Drosophila embryogenesis should be nuanced or cross-validated.

      Our results and those of others suggest that different developmental genes follow unified—yet different—transcriptional control strategies whereby different combinations of bursting parameters are regulated to modulate gene expression: burst frequency and amplitude for eve (Berrocal et al., 2020), and burst frequency and duration for gap genes (Zoller et al., 2018). In light of the aforementioned works, we can only claim that our results suggest a unified strategy for eve, our case of study, as we observe that eve regulatory strategies are robust to disruption of enhancers and binding sites. In the Discussion section of our revised manuscript, we will emphasize that the bursting control strategy we uncovered for eve does not necessarily apply to other genes, and speculate in more detail that genes that employ the same strategy of transcriptional bursting may be grouped in families that share a common molecular mechanism of transcription.

      Manuscript updates:

      We have emphasized in the Discussion section that our claim of unified strategies pertains exclusively to the bursting behavior of the gene even-skipped, and does not necessarily extend to other genes. To clarify this point, we referenced the findings of (Zoller, Little, and Gregor 2018) and (Chen et al. 2023), who observed that the bursting control strategy of Drosophila gap genes relies on the modulation of burst frequency and duration. Additionally, we cited the findings of (Syed, Duan, and Lim 2023), who reported a decrease in bursting amplitude and duration upon disruption of Dorsal binding sites on the snail minimal distal enhancer. Both examples describe bursting control strategies that differ from the modulation of burst frequency and amplitude observed for even-skipped.

      In addition to the lack of novelty (some results concerning the fact that koff does not change along the A/P axis/the idea of a 'unified regime' were already obtained in Berrocal et al 2020),...

      Unfortunately, we believe there is a misunderstanding in terms of what we construe as novelty in our work. In our previous work (Berrocal et al., 2020), we observed that the seven stripes of even-skipped (eve) expression modulate transcriptional bursting through the same strategy—bursting frequency and amplitude are controlled to yield various levels of mRNA synthesis, while burst duration remains constant. We reproduce that result in our paper, and do not claim any novelty. However, what was unclear is whether the observed eve bursting control strategy would only exist in the wild-type stripes, whose expression—we reasoned—is under strong selection due to the dramatic phenotypic consequences of eve transcription, or if eve transcriptional bursting would follow the same strategy under trans-regulatory environments that are not under selection to deliver specific spatiotemporal dynamics of eve expression. Our results—and here lies the novelty of our work—support the second scenario, and point to a model where eve bursting strategies do not result from adaptation of eve activity to specific trans-regulatory environments. Instead, we speculate that a molecular mechanism constrains eve bursting strategy whenever and wherever the gene is active. This is something that we could not have known from our first study in (Berrocal et al., 2020) and constitutes the main novelty of our paper. To put this in other words, the novelty of our work does not rest on the fact that both burst frequency and amplitude are modulated in the endogenous eve pattern, but that this modulation remains quantitatively indistinguishable when we focus on ectopic areas of expression. We will make this point clearer in the Introduction and Discussion section of our revised manuscript.

      Manuscript updates:

      We have clarified this point in both the Introduction and Discussion sections. In the updated Introduction, we state that while our previous work (Berrocal et al. 2020) examined bursting strategies in endogenous expression regions that are, in principle, subject to selection, the present study induced the formation of ectopic expression patterns to probe bursting strategies in regions presumably devoid of evolutionary pressures. In the Discussion section, we highlight that the novelty of our work lies in the insights derived from the comparative analysis between ectopic and endogenous regions of even-skipped expression, an aspect not addressed in our previous work.

      … note i) the limited manipulation of TF environment;...

      We acknowledge that additional genetic manipulations would make it possible to further test the model. However, we hope that the reviewer will agree with us that the manipulations that we did perform are sufficient to provide evidence for common bursting strategies under the diverse trans-regulatory environments present in wild-type and ectopic regions of gene expression. In the Discussion section of our revised manuscript, we will elaborate further on the kind of genetic manipulations (e.g., probing transcriptional strategies that result from swapping promoters in the context of eve-MS2 BAC; or quantifying the impact on eve transcriptional control after performing optogenetic perturbations of transcription factors and/or chromatin remodelers) that could shed further light on the currently undefined molecular mechanism that constrains eve bursting strategies, as a means to motivate future work.

      Manuscript updates:

      In our Discussion section, we elaborated on proposed manipulations of the transcription factor environment to elucidate the molecular mechanisms behind even-skipped bursting control strategies. We began by listing studies linking transcription factor concentration to bursting control strategies: (Hoppe et al. 2020) observed that the natural BMP (Bone Morphogenetic Protein) gradient shapes the bursting frequency of target genes in Drosophila embryos, and (Zhao et al. 2023) used the LEXY optogenetic system to modulate Knirps nuclear concentration and observed that this repressor acts on the eve stripe 4+6 enhancer by gradually decreasing bursting frequency until the locus adopts a reversible quiescent state. Then, we proposed performing systematic LEXY-mediated modulation of critical transcription factors (Bicoid, Hunchback, Giant, Kruppel, Zelda) to understand the extent of their contribution to the unified even-skipped bursting strategies.

      To better frame the hypothesis that the even-skipped promoter defines strategies of bursting control, we added a reference to the work of (Tunnacliffe, Corrigan, and Chubb 2018). This study surveyed 17 actin genes with identical sequences but distinct promoters in the amoeba Dictyostelium discoideum, and found that all genes display different bursting strategies. Their findings, together with the previously cited work by (Pimmett et al. 2021) and (Yokoshi et al. 2022), suggest a critical role of gene promoters in constraining the bursting strategies of eukaryotic genes.

      … ii) the simplicity with which bursting is analyzed (only a two-state model is considered, and not cross-validated with an alternative approach than cpHMM) and…

      Based on our previous work (Lammers et al., 2020), and as described in the SI Section of the current manuscript: Inference of Bursting Parameters, we selected a three-state model (OFF, ON1, ON2) under the following rationale: transcription of even-skipped in pre-gastrulating embryos occurs after DNA replication, and promoters on both sister chromatids remain paired. Most of the time these paired loci cannot be resolved independently using conventional microscopy. As a result, when we image an MS2 spot, we are actually measuring the transcriptional dynamics of two promoters. Thus, each MS2-fluorescent spot may result from none (OFF), one (ON1) or two (ON2) sister promoters being in the active state. Following our previous work, we analyzed our data assuming the three-state model (OFF, ON1, ON2), and then, for ease of presentation, aggregated ON1 and ON2 into an effective single ON state. As for the lack of an alternative model, we chose the simplest model compatible with our data and our current understanding of transcription at the eve locus. With this in mind, we do not rule out the possibility that more complex processes—that are not captured by our model—shape MS2 fluorescence signals. For example, promoters may display more than two states of activity. However, as shown in (Lammers et al., 2020 - SI Section: G. cpHMM inference sensitivities), model selection schemes and cross-validation do not give consistent results on which model is more favorable; and for the time being, there is not a readily available alternative to HMM for inference of promoter states from MS2 signal. For example, orthogonal approaches to quantify transcriptional bursting, such as smFISH, are largely blind to temporal dynamics. As a result, we choose to entertain the simplest two-state model for each sister promoter. 
      We appreciate these observations, as they point out the need to devote a section in the supplemental material of our revised manuscript to clarifying the motivations behind model selection.

      Manuscript updates:

      We have devoted the new Supplemental Material section “Selection of a three-state model of promoter activity and a compound Hidden Markov Model for inference of promoter states from MS2 fluorescent signal” to clarify the rationale behind our selection of a three-state promoter activity model. Since transcription in pre-gastrulating Drosophila embryos occurs after DNA replication, each MS2-active locus contains two unresolvable sister promoters that can either be inactive (OFF), one active (ON1), or both active (ON2).
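      To make the OFF/ON1/ON2 picture concrete, the sketch below computes steady-state occupancies under one simplifying assumption that is not taken from the manuscript: the two sister promoters are treated as independent and identical two-state promoters with rates kon and koff. The authors' inference does not necessarily impose this, and the function name and numbers are hypothetical.

```python
def three_state_occupancy(kon, koff):
    """Steady-state probabilities of (OFF, ON1, ON2) for two sister promoters.

    ASSUMPTION (for illustration only): the sisters are independent and
    identical two-state promoters, which gives the transition rates
        OFF -> ON1: 2*kon    ON1 -> ON2: kon
        ON2 -> ON1: 2*koff   ON1 -> OFF: koff
    """
    # Detailed balance along the chain OFF <-> ON1 <-> ON2.
    w_off = 1.0
    w_on1 = w_off * (2 * kon) / koff
    w_on2 = w_on1 * kon / (2 * koff)
    total = w_off + w_on1 + w_on2
    return w_off / total, w_on1 / total, w_on2 / total


# Hypothetical rates in arbitrary units of 1/time.
p_off, p_on1, p_on2 = three_state_occupancy(kon=1.0, koff=3.0)

# Under independence this matches a binomial with p = kon / (kon + koff).
p = 1.0 / (1.0 + 3.0)
binomial = ((1 - p) ** 2, 2 * p * (1 - p), p ** 2)
```

      Under this assumption, the probability that at least one sister is active is 1 - (1 - p)^2, which is the occupancy of an aggregated ON state.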

      Next, we elaborated on the conversion of a three-state model into an effective two-state model for ease of presentation and described how the effective two-state model parameters, namely kon (burst frequency), 1/koff (burst duration), and r (burst amplitude), were calculated.
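      As a reading aid for these three parameters, the following sketch lists the textbook steady-state quantities of a generic two-state (telegraph) promoter. It is not the authors' conversion from the three-state model, and the parameter values are hypothetical.

```python
def two_state_summary(kon, koff, r):
    """Standard summary statistics of a two-state (telegraph) promoter.

    kon:  OFF -> ON switching rate (burst frequency)
    koff: ON -> OFF switching rate (1/koff is the mean burst duration)
    r:    initiation rate while ON (burst amplitude)
    """
    fraction_on = kon / (kon + koff)
    return {
        "mean_burst_duration": 1.0 / koff,
        "mean_off_duration": 1.0 / kon,
        "fraction_on": fraction_on,
        "mean_initiation_rate": r * fraction_on,
    }


# Hypothetical values with rates per minute; koff = 0.5 gives 2-minute bursts.
summary = two_state_summary(kon=0.5, koff=0.5, r=10.0)
```

      In this picture, raising kon or r increases mean output while leaving the mean burst duration 1/koff unchanged, which is the pattern the response describes for even-skipped.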

      Additionally, we acknowledged that while the three-state model of promoter activity is the simplest model compatible with our current understanding of transcription in the even-skipped locus, we do not rule out the possibility that even-skipped transcription may be described by more complex models that include multiple states beyond ON and OFF. Finally, we referenced (Lammers et al. 2020) who asserted that while all inferences of promoter states computed from confocal microscopy of MS2/PP7 fluorescence data rely on Hidden Markov models, cross-comparisons between one, two, or multiple-state Hidden Markov models do not yield consistent results regarding which is more accurate. We close the new section by proposing that state-of-the-art microscopy and deconvolution algorithms to improve signal-to-noise-ratio may offer alternatives to the inference of promoter states.

      … iii) the lack of comparisons with published work.

      We thank the reviewer for pointing this out. In the current discussion of our manuscript, we compare our findings to recent articles that have addressed the question of the origin of bursting control strategies in Drosophila embryos (Pimmett et al., 2021; Yokoshi et al., 2022; Zoller et al., 2018). Nevertheless, we acknowledge that we failed to include references that are relevant to our study. Thus, our revised Discussion section must include recent results by (Syed et al., 2023), which showed that the disruption of Dorsal binding sites on the snail minimal distal enhancer results in decreased amplitude and duration of transcription bursts in fruit fly embryos. Additionally, we have to incorporate the study by (Hoppe et al., 2020), which reported that the Drosophila bone morphogenetic protein (BMP) gradient modulates the bursting frequency of BMP target genes. References to thorough studies of bursting control in other organisms, like Dictyostelium discoideum (Tunnacliffe et al., 2018), are due as well.

      Manuscript updates:

      As mentioned in the updates above, our revised manuscript now includes long-overdue references to studies by (Syed, Duan, and Lim 2023), (Hoppe et al. 2020), (Tunnacliffe, Corrigan, and Chubb 2018), and (Chen et al. 2023), all of which are relevant to our current work.

      Reviewer #2 (Public Review):

      The manuscript by Berrocal et al. asks if shared bursting kinetics, as observed for various developmental genes in animals, hint towards a shared molecular mechanism or result from natural selection favoring such a strategy. Transcription happens in bursts. While transcriptional output can be modulated by altering various properties of bursting, certain strategies are observed more widely. As the authors noted, recent experimental studies have found that even-skipped enhancers control transcriptional output by changing burst frequency and amplitude while burst duration remains largely constant. The authors compared the kinetics of transcriptional bursting between endogenous and ectopic gene expression patterns. It is argued that since enhancers act under different regulatory inputs in ectopically expressed genes, adaptation would lead to diverse bursting strategies as compared to endogenous gene expression patterns. To achieve this goal, the authors generated ectopic even-skipped transcription patterns in fruit fly embryos. The key finding is that bursting strategies are similar in endogenous and ectopic even-skipped expression. According to the authors, the findings favor the presence of a unified molecular mechanism shaping even-skipped bursting strategies. This is an important piece of work. Everything has been carried out in a systematic fashion. However, the key argument of the paper is not entirely convincing.

      We thank the reviewer, as these comments will enable us to improve the Discussion section and overall logic of our revised manuscript. We agree that the evidence provided in this work, while systematic and carefully analyzed, cannot conclusively rule out either of the two proposed models; it only provides evidence supporting the hypothesis of a specific molecular mechanism constraining eve bursting strategies. Our experimental evidence points to valuable insights about the mechanism of eve bursting control. For instance, had we observed quantitative differences in bursting strategies between ectopic and endogenous eve domains, we would have rejected the hypothesis that a common molecular mechanism constrains eve transcriptional bursting to the observed bursting control strategy of frequency and amplitude modulation. Thus, we consider that our proposition of a common molecular mechanism underlying unified eve bursting strategies despite changing trans-regulatory environments is the better supported of the two. On the other hand, while our model suggests that this undefined bursting control strategy is not subject to selection acting on specific trans-regulatory environments, it is not trivial to completely discard selection for specific bursting control strategies given our current lack of understanding of the molecular mechanisms that shape the aforesaid strategies. Indeed, we cannot rule out the hypothesis that the observed strategies are optimal for the expression of eve endogenous stripes according to natural selection, and that these control strategies persist in ectopic regions as an evolutionarily neutral “passenger phenotype” that does not impact fitness. We recognize the need to acknowledge this last hypothesis in the updated Introduction and Discussion sections of our manuscript. Further studies will be needed to determine the mechanistic and molecular basis of eve bursting strategies.

      Manuscript updates:

      In this work, we compared strategies of bursting control between endogenous and ectopic regions of even-skipped expression. Different strategies between both regions would suggest that selective pressure maintains defined bursting strategies in endogenous regions. Conversely, similar strategies in both ectopic and endogenous regions would imply that a shared molecular mechanism constrains bursting parameters despite changing trans-regulatory environments.

      In our updated Discussion section, we acknowledge that while our work provides evidence supporting the second hypothesis, we cannot conclusively rule out the possibility that the observed strategies were selected as optimal for endogenous even-skipped expression regions and that ectopic regions retain such optimal bursting strategies as an evolutionarily neutral “passenger phenotype”.

      Reviewer #3 (Public Review):

      In this manuscript by Berrocal and coworkers, the authors do a deep dive into the transcriptional regulation of the eve gene in both an endogenous and ectopic background. The idea is that by looking at eve expression under non-native conditions, one might infer how enhancers control transcriptional bursting. The main conclusion is that eve enhancers have not evolved to have specific behaviors in the eve stripes, but rather the same rates in the telegraph model are utilized as control rates even under ectopic or 'de novo' conditions. For example, they achieve ectopic expression (outside of the canonical eve stripes) through a BAC construct where the binding sites for the TF Giant are disrupted along with one of the eve enhancers. Perhaps the most general conclusion is that burst duration is largely constant throughout at ~ 1 - 2 min. This conclusion is consistent with work in human cell lines that enhancers mostly control frequency and that burst duration is largely conserved across genes, pointing to an underlying mechanistic basis that has yet to be determined.

      We thank the reviewer for the assessment of our work. Indeed, evidence from different groups (Berrocal et al., 2020; Fukaya et al., 2016; Hoppe et al., 2020; Pimmett et al., 2021; Senecal et al., 2014; Syed et al., 2023; Tunnacliffe et al., 2018; Yokoshi et al., 2022; Zoller et al., 2018) is coming together to uncover commonalities, discrepancies, and rules that constrain transcriptional bursting in Drosophila and other organisms.

      Additional updates to the manuscript

      (1) In our current study, we observed the appearance of a mutant stripe of even-skipped expression beyond the anterior edge of eve stripe 1, which we refer to as eve stripe 0. This stripe appeared in embryos with a disrupted eve stripe 1 enhancer. In a previous study, (Small, Blair, and Levine 1992) reported a “head patch” of even-skipped expression while assaying the regulation of reporter constructs carrying the minimal regulatory element of eve stripe 2 enhancer alone. In our updated manuscript, we state that it is tempting to identify our eve stripe 0 with the previously reported head patch. (Small, Blair, and Levine 1992) speculated that this head patch of even-skipped expression appeared as a result of regulatory sequences present in the P-transposon system they used for genomic insertions. However, P-transposon sequences are not present in our experimental design. Thus, the appearance of eve stripe 0 indicates a repressive role of the eve stripe 1 enhancer at the anterior end of the embryo and may imply that the minimal regulatory element of the eve stripe 2 enhancer, as probed by (Small, Blair, and Levine 1992), can drive the expression of the head patch/eve stripe 0 when the eve stripe 1 enhancer is not present.

      (2)  In our current analysis, we observed that the disruption of Gt-binding sites on the eve stripe 2 enhancer synergizes with the deletion of the eve stripe 1 enhancer, as double mutant embryos display more ectopic expression in their anterior regions than embryos with only disrupted Gt-binding sites. While this may indicate that the repressive activity of eve stripe 1 enhancer synergizes with the repression exerted by Giant, other unidentified transcription factors may be involved in this repressive synergy. In the updated manuscript we clarified that unidentified transcription factors may bind in the vicinity of Gt-binding sites. The hypothesis that Gt-binding sites recognize other transcription factors was proposed by (Small, Blair, and Levine 1992), as they observed that the anterior expansion of eve stripe 2 resulting from Gt-binding site deletions was “somewhat more severe” than expansion observed in embryos carrying null-Giant alleles.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This reviewed preprint is a bit of a Frankenstein monster, as it crams together three quite different sets of data. It is essentially three papers combined into one: one paper focused on the role of CIB2/CIB3 in VHCs, one on the role of CIB2/CIB3 in zebrafish, and one on structural modeling of a CIB2/3 and TMC1/2 complex. The authors try to combine the three parts with the overarching theme of demonstrating that CIB2/3 play a functionally conserved role across species and hair cell types, but given the previous work on these proteins, especially Liang et al. (2021) and Wang et al. (2023), this argument doesn't work very well. My sense is that the way the manuscript is written now, the sum is less than the individual parts, and the authors should consider whether the work is better split into three separate papers.

      We appreciate the frank evaluation of our work and point out that combining structural with functional data from mouse and zebrafish offers a comprehensive view of the role played by TMC1/TMC2 and CIB2/3 complexes in hair-cell mechanotransduction. We believe that readers will benefit from this comprehensive analysis.

      The most important shortcoming is the novelty of the work presented here. In line 89 of the introduction the authors state "However, whether CIB2/3 can function and interact with TMC1/2 proteins across sensory organs, hair-cell types, and species is still unclear." They make a similar statement in the first sentence of the discussion and generally use this claim throughout the paper as motivation for why they performed the experiments. Given the data presented in the Liang et al. (2021) and Wang et al. (2023) papers, however, this statement is not well supported. Those papers clearly demonstrate a role for CIB2/CIB3 in auditory and vestibular cells in mice. Moreover, there is also data in the Riazuddin et al. (2012) paper that demonstrates the importance of CIB2 in zebrafish and Drosophila. I think the authors are really stretching to describe the data in the manuscript as novel. Conceptually, it reads more as solidifying knowledge that was already sketched out in the field in past studies.

      We note that work on mouse and fish CIB knockouts in our laboratories started over a decade ago and that our discoveries are contemporary to those recently presented by Liang et al., 2021 and Wang et al., 2023, which we acknowledge, cite, and give credit as appropriate. We also note that work on fish knockouts and on fish Cib3 is completely novel. Nevertheless, the abstract text “Whether these interactions are functionally relevant across mechanosensory organs and vertebrate species is unclear” has been replaced by “These interactions have been proposed to be functionally relevant across mechanosensory organs and vertebrate species.”; and the introduction text “However, whether CIB2/3 can function and interact with TMC1/2 proteins across sensory organs, hair-cell types, and species is still unclear” has been replaced by “However, additional evidence showing that CIB2/3 can function and interact with TMC1/2 proteins across sensory organs, hair-cell types, and species is still needed.”. The work by Wang et al., 2023 is immediately discussed after the first sentence in the discussion section and the work by Liang et al., 2021 is also cited in the same paragraph. We believe that changes in abstract and introduction along with other changes outlined below put our work in proper context.

      There is one exception, however, and that is the last part of the manuscript. Here structural studies (AlphaFold 2 modeling, NMR structure determination, and molecular dynamics simulations) bring us closer to the structure of the mammalian TMCs, alone and in complex with the CIB proteins. Moreover, the structural work supports the assignment of the TMC pore to alpha helices 4-7.

      Thanks for the positive evaluation of this work.

      Reviewer #2 (Public Review):

      The paper 'Complexes of vertebrate TMC1/2 and CIB2/3 proteins form hair-cell mechanotransduction cation channels' by Giese and coworkers is quite an intense reading. The manuscript is packed with data pertaining to very different aspects of MET apparatus function, scales, and events. I have to praise the team that combined molecular genetics, biochemistry, NMR, microscopy, functional physiology, in-vivo tests for vestibulo-ocular reflexes, and other tests for vestibular dysfunction with molecular modeling and simulations. The authors nicely show the way CIBs are associated with TMCs to form functional MET channels. The authors clarify the specificity of associations and elucidate the functional effects of the absence of specific CIBs and their partial redundancy.

      We appreciate the positive evaluation of our work and agree with the reviewer that the combination of data obtained using various techniques in vivo and in silico provides a unique view of the role played by CIB2 and CIB3 in hair-cell mechanotransduction.

      Reviewer #3 (Public Review):

      This study demonstrates that from fish to mammals CIB2/3 is required for hearing, revealing the high degree of conservation of CIB2/3 function in vertebrate sensory hair cells. The modeling data reveal how CIB2/3 may affect the conductance of the TMC1/2 channels that mediate mechanotransduction, which is the process of converting mechanical energy into an electrical signal in sensory receptors. This work will likely impact future studies of how mechanotransduction varies in different hair cell types. 

      One caveat is that the experiments with the mouse mutants are confirmatory in nature with regard to a previous study by Wang et al., and the authors use lower resolution tools in terms of function and morphological changes. Another is that the modeling data are not supported by electrophysiological experiments; however, as mentioned above, future experiments may address this weakness.

      We thank the reviewer for providing positive feedback and for highlighting caveats that can and will be addressed by future experiments.

      Reviewer #1 (Recommendations For The Authors): 

      Lines 100-101. Please temper this statement, as FM1-43 is only a partial proxy for MET. 

      The original text has been modified to: “In contrast to auditory hair cells, we found that the vestibular hair cells in Cib2KO/KO mice apparently have MET. We assessed MET via uptake of FM 1-43 (Figure 1A), a styryl dye that mostly permeates into hair cells through functional MET channels (Meyers et al., 2003), indicating that there may be another CIB protein playing a functionally redundant role.”

      Lines 111-113. These data do not fully match up with the Kawashima et al. (2011) data. Please discuss. 

      We have modified the text to better report the data: “Tmc2 expression increases during development but remains below Tmc1 levels in both type 1 and type 2 hair cells upon maturation (Figure 1C).”

      Lines 125-126. The comparison in 2A-B is not described correctly for the control. The strain displayed is Cib2^+/+;Cib3^KO/KO (not wild-type). Show the Cib2^+/+;Cib3^+/+ if you are going to refer to it (and is this truly Cib2^+/+;Cib3^+/+ from a cross or just the background strain?). 

      Thanks for pointing this out. To avoid confusion, we have revised the sentence as follows: “We first characterized hearing function in Cib3KO/KO and control littermate mice at P16 by measuring auditory-evoked brainstem responses (ABRs). Normal ABR waveforms and thresholds were observed in Cib3KO/KO indicating normal hearing.”

      Lines 137-140. Did you expect anything different? This is a trivial result, given the profound loss of hearing in the Cib2^KO/KO mice. 

      We did not expect anything different and have deleted the sentence: “Furthermore, endogenous CIB3 is unable to compensate for CIB2 loss in the auditory hair cells, perhaps due to extremely low expression level of CIB3 in these cells and the lack of compensatory overexpression of CIB3 in the cochlea of Cib2KO/KO mice (Giese et al., 2017).”

      Lines 194-196. But what about Cib2^KO/KO? Isn't the conclusion that the vestibular system needs either CIB2 or CIB3?

      Yes, either CIB2 or CIB3 can maintain normal vestibular function. A prior study by Michel et al., 2017, has evaluated and reported intact vestibular function in Cib2KO/KO mice.

      Lines 212-214. Yes. This is a stronger conclusion than the one earlier. 

      We have revised the sentence as follows: “Taken together, these results support compulsory but functionally redundant roles for CIB2 and CIB3 in the vestibular hair cell MET complex.”

      Lines 265-267. I'm not sure that I would state this conclusion here given that you then argue against it in the next paragraph. 

      We have modified this statement to make the conclusions clearer and more consistent between the two paragraphs. The modified text reads: “Thus, taken together the results of our FM 1-43 labeling analysis are consistent with a requirement for both Cib2 and Cib3 to ensure normal MET in all lateral-line hair cells.”

      Line 277. I would be more precise and say something like "and sufficiently fewer hair cells responded to mechanical stimuli and admitted Ca2+..." 

      We have modified the text as requested: “We quantified the number of hair bundles per neuromast with mechanosensitive Ca2+ responses, and found that compared to controls, significantly fewer cells were mechanosensitive in cib2 and cib2;cib3 mutants (Figure 5-figure supplement 2A, control: 92.2 ± 2.5; cib2: 49.9 ± 5.8, cib2;cib3: 19.0 ± 6.6, p < 0.0001).”

      Line 278 and elsewhere. It doesn't make sense to have three significant digits in the error. I would say either "92.2 {plus minus} 2.5" or "92 {plus minus} 2." 

      Edited as requested.

      Lines 357-358. Move the reference to the figure to the previous sentence, leaving the "(Liang et al., 2021)" juxtaposed to its reference (crystal structure). Otherwise, the reader will look for crystal structures in Figure 7-figure supplements 1-5.

      Text has been edited as requested: “The intracellular domain linking helices α2 and α3, denoted here as IL1, adopts a helix-loop-helix with the two helices running parallel to each other and differing in length (Figure 7-figure supplements 1-5). This is the same fold observed in its crystal structure in complex with CIB3 (Liang et al., 2021), which validated the modeling approach.”

      Line 450. What other ions were present besides K+? I assume Cl- or some other anion.

      What about Na+ or Ca2+? It's hard to evaluate this sentence without that information.

      Systems have 150 mM KCl and CIB-bound Ca2+ when indicated (no Na+ or free Ca2+). This is now pointed out when the models are described first: “These models were embedded in either pure POPC or stereocilia-like mixed composition bilayers and solvated (150 mM KCl) to …”. The sentence mentioned by the reviewer has also been modified: “In systems with pure POPC bilayers we observed permeation of K+ in either one or both pores of the TMC1 dimer, with or without CIB2 or CIB3 and with or without bound Ca2+, despite the presence of Cl- (150 mM KCl).”  

      Lines 470-472. These results suggest that the maximum conductance of TMC1 > TMC2. How do these results compare with the Holt and Fettiplace data? 

      Thanks for pointing this out. A comparison would be appropriate and has been added: “We also speculate that this is due to TMC2 having an intrinsic lower single-channel conductance than TMC1, as has been suggested by some experiments (Kim et al., 2013), but not others (Pan et al., 2013). It is also possible that our TMC2 model is not in a fully open conformation, which can only be reached upon mechanical stimulation.”

      Line 563. Yes, the simulations only allow you to say that the interaction is stable for at least microseconds. However, the gel filtration experiments suggest that the interaction is stable for much longer. Please comment. 

      Thank you for pointing this out. We agree with this statement and modified the text accordingly: “Simulations of these models indicate that there is some potential preferential binding of TMC1 and TMC2 to CIB3 over CIB2 (predicted from BSA) and that TMC + CIB interactions are stable and last for microseconds, with biochemical and NMR experiments showing that these interactions are stable at even longer timescales.”  

      Figure 3. Please use consistent (and sufficiently large to be readable) font size. 

      Figure has been updated.

      Figure 4. Magnification is too low to say much about bundle structure.

      The reviewer is right – we cannot evaluate bundle structure with the images shown in Figure 4. Our goal was to determine whether the vestibular hair cells had degenerated in the absence of CIB2/3, and the data in Figure 4, panel A reveal intact hair cells. We changed the text “High-resolution confocal imaging did not reveal any obvious vestibular hair cell loss and hair bundles looked indistinguishable from control in Cib2KO/KO;Cib3KO/KO mice (Figure 4A).” to “High-resolution confocal imaging did not reveal any obvious vestibular hair cell loss in Cib2KO/KO;Cib3KO/KO mice (Figure 4A).” to avoid any confusion.

      Reviewer #2 (Recommendations For The Authors):

      Some datasets presented here can be published separately. Although I understand that the field is developing fast and there is no time to sort and fit the data by category or scale, everything needs to be published together and quickly.

      I have no real questions about the data on the functional association of CIB2 and 3 with TMC 1 and 2 in mouse hair cells as well as association preferences between their homologs in zebrafish. The authors have shown a clear differentiation of association preferences for CIB2 and CIB3 and the ability to substitute for each other in cochlear and vestibular hair cells. The importance of CIB2 for hearing and CIB3 for vestibular function is well documented. The absence of the startle response in cib2/3 negative zebrafish is a slight variation from what was observed in mice where CIB2 is sufficient for hearing. The data look very solid and show an overall structural and functional conservation of these complexes throughout vertebrates. The presented models look plausible, but of course, there is a chance that they will be corrected/improved in the future. 

      Thanks for appreciating the significance of our study.

      Regarding NMR, there is indeed a large number of TROSY peaks of uniformly labeled CIB2 undergoing shifts with sequential additions of the loop and the N-terminal TMC peptides. Something is going on. The authors may consider a special publication on this topic when at least partial peak assignments are established. 

      We are continuing our NMR studies of CIB and TMC interactions and plan to have follow-up studies.

      After reading the manuscript, I may suggest four topics for additional discussion. 

      (1) Maybe it is obvious for people working in the field, but for the general reader, the simulations performed with and without Ca2+ come out of the blue, with no explanation. The authors did not mention clearly that CIB proteins have at least two functional EF-hand (EF-hand-like) motifs that likely bind Ca2+ and thereby modulate the MET channel. 

      This is a good point. We have modified the introductory text to include: “CIB2 belongs to a family of four closely related proteins (CIB1-4) that have partial functional redundancy and similar structural domains, with at least two Ca2+/Mg2+-binding EF-hand motifs that are highly conserved for CIB2/3 (Huang et al., 2012).”

      If the data on affinities for Ca2+, as well as Ca2+-dependent propensity for dimerization and association with TMC exist, they should be mentioned for CIB2 and CIB3 and discussed.

      To address this, we have added the following text to the discussion: “How TMC + CIB interactions depend on Ca2+ concentration may have important functional implications for adaptation and hair cell mechanotransduction. Structures of CIB3 and worm CALM-1, a CIB2 homologue, both bind divalent ions via EF-hand motifs proximal to their C-termini (Jeong et al., 2022; Liang et al., 2021). Reports on CIB2 affinities for Ca2+ are inconsistent, with K_D values that range from 14 µM to 0.5 mM (Blazejczyk et al., 2009; Vallone et al., 2018). Although qualitative pull-down assays done in the presence or the absence of 5 mM CaCl2 suggest that the TMC1 and CIB2 interactions are Ca2+-independent (Liang et al., 2021), strength and details of the CIB-TMC-IL1 and CIB-TMC-NT contacts might be Ca2+-dependent, especially considering that Ca2+ induces changes that lead to exposure of hydrophobic residues involved in binding (Blazejczyk et al., 2009).”

      Also, it is not clearly mentioned in the figure legends whether the size-exclusion experiments or TROSY NMR were performed in the presence of (saturating) Ca2+ or not. If the presence of Ca2+ is not important, it must be explained.  

      Size exclusion chromatography and NMR experiments were performed in the presence of 3 mM CaCl2. We have indicated this in appropriate figure captions as requested, and also mentioned it in the discussion text: “Interestingly, the behavior of CIB2 and CIB3 in solution (SEC experiments using 3 mM CaCl2) is different in the absence of TMC1-IL1.” and “Moreover, our NMR data (obtained using 3 mM CaCl2) indicates that TMC1-IL1 + CIB2 is unlikely to directly interact with CIB3.”

      (2) Speaking about the conservation of TMC-CIB structure and function, it would be important to compare it to the C. elegans TMC-CALM-1 structures. Is CALM-1, which binds Ca2+ near its C-terminus, homologous or similar to CIBs? 

      This is an important point. To address it, we have added the following text in the discussion: “Remarkably, the AF2 models are also consistent with the architecture of the nematode TMC-1 and CALM-1 complex (Jeong et al., 2022), despite low sequence identity (36% between human TMC1 and worm TMC-1 and 51% between human CIB2 and worm CALM-1). This suggests that the TMC + CIB functional relationship may extend beyond vertebrates.” We also added: “How TMC + CIB interactions depend on Ca2+ concentration may have important functional implications for adaptation and hair cell mechanotransduction. Structures of CIB3 and worm CALM-1, a CIB2 homologue, both bind divalent ions via EF-hand motifs proximal to their C-termini (Jeong et al., 2022; Liang et al., 2021).” 

      Additionally, superposition of CALM-1 (in blue) from the TMC-1 complex structure (PDB code: 7usx; Jeong et al., 2022) with one of our initial human CIB2 AF2 models (in red) shows similar folds, notably in the EF-hand motifs of CALM-1 and CIB2 (Author response image 1).

      Author response image 1.

      Superposition of CALM-1 structure (blue; Jeong et al., 2022) and AlphaFold 2 model of CIB2 (red). Calcium ions are shown as green spheres.

      (3) Based on simulations, CIBs stabilize the cytoplasmic surfaces of the dimerized TMCs.

      The double CIB2/3 knock-out, on the other hand, clearly destabilizes the morphology of stereocilia and leads to partial degeneration. One question is whether the tip link in the double null forms normally and whether there is a vestige of MET current in the beginning. The second question is whether the stabilization of the TMC's intracellular surface has a functional meaning. I understand that not complete knock-outs, but rather partial loss-of-function mutants may help answer this question. The reader would be impatient to learn what process most critically depends on the presence of CIBs: channel assembly, activation, conduction, or adaptation. Any thoughts about it? 

      These are all interesting questions, although further investigations would be needed to understand CIB’s role in channel assembly, activation, conduction, and adaptation. We have added to the discussion text: “Further studies should help provide a comprehensive view into CIB function in channel assembly, activation, and potentially hair-cell adaptation.”

      (4) The authors rely on the permeation of FM dyes as a criterion for normal MET channel formation. What do they know about the permeation path a 600-800 Da hydrophobic dye may travel through? Is it the open (conductive) or non-conductive channel? Do ions and FM dyes permeate simultaneously or can this be a different mode of action for TMCs that relates them to TMEM lipid scramblases? Any insight from simulations?

      We are working on follow-up papers focused on elucidating the permeation mechanisms of aminoglycosides and small molecules (such as FM dyes) through TMCs as well as its potential scramblase activity.

      Reviewer #3 (Recommendations For The Authors):

      Introduction: 

      The rationale and context for determining whether Cib2 and Cib3 proteins are essential for mechanotransduction in zebrafish hair cells is completely lacking in the introduction. All background information about what is known about the MET complex in sensory hair cells focuses on work done with mouse cochlear hair cells without regard to other species. This is especially surprising as the third author uses zebrafish as an animal model and makes major contributions to this study, addressing the primary question posed in the introduction. Instead, the authors relegate this important information to the results section. Moreover, not mentioning the Jeong 2022 study when discussing the Liang 2021 findings is odd considering that the primary question is centered on CIB2 and TMC1/2 in other species. 

      Thank you for pointing this out. We now discuss and reference relevant background on the MET complex in zebrafish hair cells in the introduction. We added: “In zebrafish, Tmcs, Lhfpl5, Tmie, and Pcdh15 are also essential for sensory transduction, suggesting that these molecules form the core MET complex in all vertebrate hair cells (Chen et al., 2020; Erickson et al., 2019, 2017; Ernest et al., 2000; Gleason et al., 2009; Gopal et al., 2015; Maeda et al., 2017, 2014; Pacentine and Nicolson, 2019; Phillips et al., 2011; Seiler et al., 2004; Söllner et al., 2004).”. We also added: “In zebrafish, knockdown of Cib2 diminishes both the acoustic startle response and mechanosensitive responses of lateral-line hair cells (Riazuddin et al., 2012).”

      Discussion: 

      The claim that mouse vestibular hair cells in the double KO are structurally normal is not well supported by the images in Fig. 4A and is at odds with the findings by Wang et al., 2023. More discussion about the discrepancy of these results (instead of glossing over it) is warranted. The hair bundles in the image of the zebrafish cib2/3 double knockout also appear abnormal, i.e. somewhat thinner. These results are consistent with Wang et al., 2023. Is it the case that neither image (mouse or fish) is representative? Unfortunately, the neuromast hair bundles in the double mutant are not shown, so it is difficult to draw a conclusion.

      The reviewer is right – we cannot evaluate mouse hair-cell bundle structure with the images shown in Figure 4. Our goal was to determine whether the vestibular hair cells had degenerated in the absence of CIB2/3, and the data in Figure 4, panel A reveal intact hair cells. We changed the text “High-resolution confocal imaging did not reveal any obvious vestibular hair cell loss and hair bundles looked indistinguishable from control in Cib2KO/KO;Cib3KO/KO mice (Figure 4A).” to “High-resolution confocal imaging did not reveal any obvious vestibular hair cell loss in Cib2KO/KO;Cib3KO/KO mice (Figure 4A).” to avoid any confusion. In addition, we have changed the discussion as follows: “We demonstrate that vestibular hair cells in mice and zebrafish lacking CIB2 and CIB3 are not degenerated but have no detectable MET, assessed via FM 1-43 dye uptake, at time points when MET function is well developed in wild-type hair cells.”

      In the discussion, the authors mention that Shi et al showed differential expression with cib2/3 in tall versus short hair cells of zebrafish cristae. However, there is no in situ data in the Shi study for cib2 and cib3. Instead, Shi et al show in situs for zpld1a and cabp5b that mark these cell types in the lateral crista. The text is slightly misleading and should be changed to reflect that UMAP data support this conclusion.

      We have removed reference to cib2/3 zebrafish differential expression from our discussion. It is true that this differential expression has only been inferred by UMAP and not in situ data.

      It should be noted that the acoustic startle reflex is mediated by the saccule in zebrafish, which does not possess layers of short and tall hair cells, but rather only has one layer of hair cells. Whether saccular hair cells can be regarded as strictly 'short' hair cell types remains to be determined. In this paragraph of the discussion, the authors are confounding their interpretation by not being careful about which endorgan they are discussing (line 521). In fact, there is a general error in the manuscript in referring to vestibular organs without specifying what is shown. The cristae in zebrafish do not participate in behavioral reflexes until 25 dpf and they are not known to synapse onto the Mauthner cell, which mediates startle reflexes.

      Thank you for pointing out these issues. We now state in the results that the startle reflex in zebrafish relies primarily on the saccule. In the discussion we now focus mainly on short and tall hair cells of the crista. We also outline again in the discussion that the saccule is required for acoustic startle and the cristae are required for sensing angular acceleration.

      Minor points: 

      Lines 298-302: The Zhu reference is not correct (wrong Zhu author). The statement on the functional reliance on Tmc2a versus Tmc1/2b should be referenced with Smith et al., 2020 and the correct Zhu 2021 study from the McDermott lab. Otherwise, the basis for the roles of the Tmcs in the cartoon in panel 6E is not clear.

      Thanks for pointing out this oversight. We have updated the reference.

      Line 548 should use numbers to make the multiple points, otherwise, this sentence is long and awkward. 

      The sentence has been re-arranged to make it shorter and to address another point raised by referees: “Structural predictions using AF2 show conserved folds for human and zebrafish proteins, as well as conserved architecture for their protein complexes. Predictions are consistent with previous experimentally validated models for the TMC1 pore (Ballesteros et al., 2018; Pan et al., 2018), with the structure of human CIB3 coupled to mouse TMC1-IL1 (Liang et al., 2021), and with our NMR data validating the interaction between human TMC1 and CIB2/3 proteins. Remarkably, the AF2 models are also consistent with the architecture of the nematode TMC-1 and CALM-1 complex (Jeong et al., 2022), despite low sequence identity (36% between human TMC1 and worm TMC-1 and 51% between human CIB2 and worm CALM-1). This suggests that the TMC + CIB functional relationship may extend beyond vertebrates.”

      Suggested improvements to the figures: 

      In general, some of the panels are so close together that keys or text for one panel look like they might belong to another. Increasing the white space would improve this issue. 

      Figure 3 has been adjusted as requested, and Figure 7 has been split into two (Figure 7 and Figure 8) to make them more readable and to move data from the supplement to the main text as requested below.

      Fig1A. The control versus the KO images look so different that this figure fails to make the point that FM labeling is unaffected. The authors should consider substituting a better image for the control. It is not ideal to start off on a weak point in the first panel of the paper. 

      We agree and have updated Figure 1 accordingly.

      Fig1C. It is critical to state the stage here. Also P12? 

      scRNA-seq data are extracted from Matthew Kelley’s work and are a combination of P1, P12 and P100 utricular hair cells, as follows: utricular hair cells were isolated by flow cytometry from 12- and 100-day-old mice. Gene expression was then measured with scRNA-seq using the 10x platform. The data were then combined with a previously published single-cell data set (samples from GSE71982) containing utricular hair cells isolated at P1. This dataset shows gene expression in immature vs mature utricular hair cells. The immature hair cells consist of a mixture of type I and type II cells.

      Fig1D. This schematic is confusing. The WT and KO labels are misplaced and the difference between gene and protein diagrams is not apparent. Maybe using a different bar diagram for the protein or at least adding 'aa' to the protein diagrams would be helpful. 

      Sorry for the confusion. We have revised panel 1D to address these concerns.

      Fig1E. Would be good to add 'mRNA' below the graph. 

      Done. We have added an “mRNA fold change” label on the Y-axis.

      Fig2C and D. Why use such a late-stage P18 for the immunohistochemistry? 

      Data presented in panel 2C are from P5 explants kept 2 days in vitro. For panel 2D, P18 is relevant since ABRs were performed at P16 and hair cell degeneration in CIB2 mutants, as previously described, occurs around P18-P21.

      Fig3A. Why isn't the cib2-/- genotype shown? 

      Data on cib2-/- mutant mice have already been published and no vestibular deficits have been found. See Giese et al., 2017 and Michel et al., 2017.

      Fig3F. Does this pertain to the open field testing? It would make sense for this panel to be associated with those first panels. 

      Figure 3 has been updated as requested. 

      Fig4A. Which vestibular end organ? Are these ampullary cells? (Same question for 4B.) The statement in the text about 'indistinguishable' hair bundles is not supported by these panels. There appears to be an obvious difference here--the hair bundles look splayed in the double KO. Either the magnification of the images is not the same or the base of the bundles is wider in the double KO as well. This morphology appears to be at odds with results reported by Wang et al., 2023. 

      The vestibular end organs shown in Figure 4A are ampullae. Magnifications are consistent across all the panels. While the reviewer might be right regarding the hair bundle morphology, SEM data would be the best approach to address this point. Unfortunately, we currently do not have such data and we believe that only vestibular hair-cell loss can be addressed using IF images. Thus, we are only commenting on the absence of obvious vestibular hair-cell loss in the double KO mutants.

      Fig4C. To support the claim that extrastriolar hair cells in the Cib3-/- mice are less labeled with FM dye it would be necessary to at least indicate the two zones but also to quantify the fluorescence. One can imagine that labeling is quite variable due to differences in IP injection.

      The two zones have been outlined in Figure 4C as requested.

      Fig5. Strangely the authors dedicate a third of Figure 1 to describing the mouse KO of Cib3, yet no information is given about the zebrafish CRISPR alleles generated for this study. There is nothing in the results text or in this figure. At least one schematic could be added to introduce the fish alleles and another panel of gEAR information about cib2 and cib3 expression to help explain the neuromast data as was done in Fig1C.

      We have added a supplemental figure (Figure 5-figure supplement 1) that outlines where the zebrafish cib2 and cib3 mutations are located. We also provide additional information regarding these lesions in the results. In addition, we provide context for examining cib2/3 in zebrafish hair cells by referencing published inner ear and lateral line scRNA-seq data in the results section.

      Absolutely nitpicky here, but the arrow in 5H may be confused for a mechanical stimulus.

      The arrow in 5H has been changed to a dashed line.

      Why not include the data from the supplemental figure at the end of this figure? 

      The calcium imaging data in the supplement could be included in the main figure, but it would make for a massive figure. In eLife, supplements can be viewed quite easily online, next to the main figures.

      Fig6. The ampullary hair bundles look thinner in 6I. Is this also the case for double KO neuromast bundles? Such data support the findings of Wang et al., 2023.

      We did not quantify the width of the hair bundles in the crista or neuromast. It is possible that the bundles are indeed thinner, similar to what was reported by Wang et al., 2023.

      Fig7A. IL1 should be indicated in this panel. 

      IL1 has been indicated, as suggested.

      Fig7 supp 12. Color coding of the subunits would be appreciated here. 

      Done as requested.

      Fig7. Overall the supplemental data for Figure 7 is quite extensive and the significance of this data is underappreciated. The authors could consider pushing panel C to supplemental as it is a second method to confirm the modeling interactions and instead highlight the dimer models which are more relevant than the monomer structures. Also, I find the additional alpha 0 helix quite interesting because it is not seen in the C. elegans cryoEM structure. Panel G should be given more importance instead of positioned deep into the figure next to the salt bridges in F. Overall, the novelty and significance of the modeling data deserves more importance in the paper. 

      We thank the reviewer for these helpful suggestions. The amphipathic alpha 0 helix is present in the C. elegans cryo-EM structure, although it is named differently in their paper (Jeong et al., 2022). We have now clarified this in the text: “Our new models feature an additional amphipathic helix, which we denote α0, extending almost parallel to the expected plane of the membrane bilayer without crossing towards the extracellular side (as observed for a mostly hydrophobic α0 in OSCA channels and labeled as H3 in the worm TMC-1 structure) …”. In addition, we have modified Figure 7 and highlighted panel G in a separate Figure 8 as requested.

    2. Reviewer #1 (Public Review):

      Revised Public Review

      This reviewed preprint is essentially three papers combined into one: one paper focused on the role of CIB2/CIB3 in vestibular hair cells, one on the role of CIB2/CIB3 in zebrafish, and one on structural modeling of a CIB2/3 and TMC1/2 complex. The authors try to combine the three parts with the overarching theme of demonstrating that CIB2/3 play a functionally conserved role across species and hair cell types. It is important to note that many of the basic results from the mouse have already been reported by other groups in Liang et al. (2021) and Wang et al. (2023).

      That said, their demonstration of the importance of CIB2 and CIB3 in zebrafish hair cell function is novel. The results largely coincide with what is seen in the mouse: both are important, with stimulus-dependent Ca2+ entry reduced more in cib2 KOs than in cib3 KOs, and the cib2;cib3 double KO showing the greatest impact. Interestingly, cib2 is uniquely localized in and important for specific hair cell types in the neuromast and crista.

      The last part of the manuscript also offers significant new findings. Here, structural studies (AlphaFold 2 modeling, NMR structure determination, and molecular dynamics simulations) bring us closer to the structure of the mammalian TMCs, alone and in complex with the CIB proteins. Moreover, the structural work supports the assignment of the TMC pore to alpha helices 4-7.

      In summary, while this reviewed preprint has some data that replicate data from publications from other labs, it provides a comprehensive look at the CIB family in hair cells, especially in vestibular hair cells.

    3. Reviewer #2 (Public Review):

      The paper by Giese and coworkers is quite an intense reading. The manuscript is packed with data pertaining to very different aspects of MET apparatus function, scales, and events. I have to praise the team that combined molecular genetics, biochemistry, NMR, microscopy, functional physiology, in-vivo tests for vestibulo-ocular reflexes, and other tests for vestibular dysfunction with molecular modeling and simulations. The authors nicely show the way CIBs are associated with TMCs to form functional MET channels. The authors clarify the specificity of associations and elucidate the functional effects of the absence of specific CIBs and their partial redundancy.

      Comments on revised version:

      I appreciate the authors' effort to address my comments. The revised paper 'Complexes of vertebrate TMC1/2 and CIB2/3 proteins form hair-cell mechanotransduction cation channels' by Giese and coworkers is definitely cleaner but remains a compendium of related but very uneven parts. By saying 'uneven,' I mean that the grounding of the experimental and computational parts is different, and the firmness of the respective conclusions is not matched.

      My conclusion is that this is a great collaborative project. However, in its present form, different components pull the emphasis in several directions with little cross-talk. It is worth splitting into two papers.

    1. eLife assessment

      This work presents a valuable mouse model for a liver-specific depletion of the Survival Motor Neuron (SMN) protein, where the liver retains 30% of functional full-length SMN protein. The authors provide a profile of phenotypic changes in liver-specific SMN-depleted mice: while the evidence supporting their claims is generally solid, the phenotype is mild and mechanistic understanding remains to be determined.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript presents a comprehensive exploration of the role of liver-specific Survival Motor Neuron (SMN) depletion in peripheral and central nervous system tissue pathology through a well-constructed mouse model. This study is pioneering in its approach, focusing on the broader physiological implications of SMN, which has traditionally been associated predominantly with spinal muscular atrophy (SMA).

      Strengths:

      (1) Novelty and Relevance: The study addresses a significant gap in understanding the role of liver-specific SMN depletion in the context of SMA. This is a novel approach that adds valuable insights into the multi-organ impact of SMN deficiency.

      (2) Comprehensive Methodology: The use of a well-characterized mouse model with liver-specific SMN depletion is a strength. The study employs a robust set of techniques, including genetic engineering, histological analysis, and various biochemical assays.

      (3) Detailed Analysis: The manuscript provides a thorough analysis of liver pathology and its potential systemic effects, particularly on the pancreas and glucose metabolism.

      (4) Clear Presentation: The manuscript is well written. The results are presented clearly with well-designed figures and detailed legends.

      Weaknesses:

      (1) Limited Time Points: The study primarily focuses on a single time point (P19). This limits the understanding of the temporal progression of liver and pancreatic pathology in the context of SMN depletion. Longitudinal studies would provide a better understanding of disease progression.

      (2) Incomplete Recombination: The mosaic pattern of Cre-mediated excision leads to variability in SMN depletion, which complicates the interpretation of some results. Ensuring more consistent recombination across samples would strengthen the conclusions.

    3. Reviewer #2 (Public review):

      Summary:

      Marylin Alves de Almeida et al. developed a novel mouse cross via conditionally depleting functional SMN protein in the liver (AlbCre/+;Smn2B/F7). This mouse model retains a proportion of SMN in the liver, which better recapitulates SMN deficiency observed in SMA patients and allows further investigation into liver-specific SMN deficiency and its systemic impact. They show that AlbCre/+;Smn2B/F7 mice do not develop an apparent SMA phenotype as mice did not develop motor neuron death, neuromuscular pathology or muscle atrophy, which is observed in the Smn2B/- controls. Nonetheless, at P19, these mice develop mild liver steatosis, and interestingly, this conditional depletion of SMN in the liver impacts cells in the pancreas.

      Strengths:

      The current model has clearly delineated the apparent metabolic perturbations which involve a significantly increased lipid accumulation in the liver and pancreatic cell defects in AlbCre/+;Smn2B/F7 mice at P19. Standard methods like H&E and Oil Red-O staining show that in AlbCre/+;Smn2B/F7 mice, their livers closely mimic the livers of Smn2B/- mice, which have the full body knockout of SMN protein. Unlike previous work, this liver-specific conditional depletion of SMN is superior in that it is not lethal to the mouse, which allows an opportunity to investigate the long-term effects of liver-specific SMN on the pathology of SMA.

      Weaknesses:

      Given that SMA often involves fatty liver, dyslipidemia and insulin resistance, using the current mouse model, the authors could have explored the long-term effects of liver-specific depletion of SMN on metabolic phenotypes beyond P19, as well as systemic effects like glucose homeostasis. Given that the authors also report pancreatic cell defects, the long-term effect on insulin secretion and resistance could be further explored. The mechanistic link between a liver-specific SMN depletion and apparent pancreatic cell defects is also unclear.

      Discussion:

      This current work explores a novel mouse cross in order to specifically deplete liver SMN using an Albumin-Cre driver line. This provides insight into the contribution of liver-specific SMN protein to the pathology of SMA, which is relevant for understanding metabolic perturbations in SMA patients. Nonetheless, given that SMA in patients involves a systemic deletion or mutation of the SMN gene, the authors could emphasize the utility of this liver-specific mouse model, as opposed to using in vitro models, which have been recently reported (Leow et al, 2024, JCI). The authors should also discuss why a mild metabolic phenotype is observed in this current mouse model, as opposed to other SMA mouse models described in the literature.

    4. Author response:

      We will address all the textual suggestions, including rectifying any typos and incorporating the most recent literature.

      We will conduct longitudinal studies to determine whether the phenotype worsens or improves over time in liver-specific SMN-depleted mice. In this regard, we will present data from P60 animals, such as histological analyses for the characterization of the liver and pancreas.

    1. eLife assessment

      This important work has substantially advanced our understanding of the molecular basis of symmetry breaking and lineage specification in preimplantation mammalian embryos. The results generated using live imaging are compelling. Quantification of the functional assays is convincing and would be improved by increasing the number of embryos in the evaluations and clearly stating how many embryos are evaluated per experiment.

    2. Reviewer #1 (Public review):

      Summary:

      This work starts with the observation that embryo polarization is asynchronous starting at the early 8-cell stage, with early polarizing cells being biased towards producing the trophectoderm (TE) lineage. The authors further found that reduced CARM1 activity and upregulation of its substrate BAF155 promote early polarization and TE specification. This piece of evidence connects to the previous finding that Carm1 heterogeneity at the 4-cell stage guides later cell lineages: the higher Carm1-expressing blastomeres are biased towards the ICM lineage. Thus, this work provides a link between asymmetries at the 4-cell stage and polarization at the 8-cell stage, providing a cohesive explanation regarding the first lineage allocation in mouse embryos.

      Strengths:

      In addition to what has been put in the summary, the advanced 3D image-based analysis has found that early polarization is associated with a change in cell geometry in blastomeres, specifically in the ratio of the long axis to the short axis. This is a new observation that has not been reported previously.

      Weaknesses:

      Although the microinjection-based method for overexpression/deletion of proteins has been shown to be effective in early embryo settings and has been widely used, it may not fully represent the in vivo situation in some cases, compared to other strategies such as the use of knock-in mice. This is a minor weakness; it would be good to include some sentences in the discussion on the potential caveats.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Lamba and colleagues suggest a molecular mechanism to explain cell heterogeneity in cell specification during pre-implantation development. They show that embryo polarization is asynchronous. They propose that reduced CARM1 activity and upregulation of its substrate BAF155 promote early polarization and trophectoderm specification.

      Strengths:

      The authors use appropriate and validated methodology to address their scientific questions. They also report excellent live imaging. Most of the data are accompanied by careful quantifications.

      Weaknesses:

      I think this manuscript requires some more quantification, an increased number of embryos in the evaluations, and a clear statement of the number of embryos evaluated per experiment.

      Here are some points:

      (1) It should be clearly stated in all figure legends and in the text how many cells from how many embryos were analyzed.

      (2) I think that the number of embryos is sometimes too low. These are mouse embryos, which are easily accessible, and the methods used are well established in this lab, so the authors should make an effort to have at least 10-15 embryos per experiment. For example: "In agreement with this, hybridization chain reaction (HCR) RNA fluorescence in situ hybridization of early 8-cell stage embryos revealed that the number of CDX2 mRNA puncta was higher in polarized blastomeres with a PARD6-positive apical domain than in unpolarized blastomeres, for 5 out of 6 embryos with EP cells (Figure 3A, B)". Likewise, for the data in Figure 4, we know how many cells but not how many embryos.

      (3) It would be useful to see in Figure 4 an example of asymmetric cell division as done for symmetric cell division in panel 4B. This could really help the reader to understand how the authors assessed this.

      (4) In Figure 5C, there is a big disproportion in the number of EP and LP cells identified. Could the authors increase the number of embryos quantified and see if they can increase EP numbers?

      (5) Could the authors give more details about how they mount the embryos for live imaging? With agarose or another technique? In which dishes? Overlaid with how much medium and oil? This could help other labs that want to replicate the live imaging in their labs. Also, was it a z-stack analysis? If yes, how many µm per stack? Ideally, if they also know the laser power used (at least a range), it would be extremely useful.

    1. eLife assessment

      This important study explores the impact of pH changes and cancer mutations on nucleosome interactions and higher-order chromatin structures. The evidence supporting the main conclusions is solid, based on rigorous computational methods, including pKa prediction, electrostatic force calculation, and molecular dynamics simulations. The findings provide insights into how protonation states and cancer-associated mutations affect nucleosome electrostatics and chromatin organization, making this work of broad interest to chromatin biologists, cancer researchers, and computational biophysicists.

    2. Reviewer #1 (Public review):

      Summary:

      This is a valuable study probing the impact of pH and cancer mutations on nucleosome interactions and higher-order chromatin structures.

      Strengths:

      The study is comprehensive, covering all the titratable residues of nucleosomes and all known cancer mutations. The analysis was rigorously carried out within the feasibility of current computational capabilities. The methods used in this study are also solid. The results of this study can enhance our understanding of higher-order chromatin organizations and their modulation by various genetic and epigenetic changes.

      Weaknesses:

      The interpretation and illustration of the data need improvement, such as the change of protonation states of titratable residues on the nucleosome-protein interactions and higher-order chromatin structures.

    3. Reviewer #2 (Public review):

      Summary:

      The paper by Zhang et al. has two parts.

      The first one presents a comprehensive study of the nucleosome pKs, including their shifts from reference values in solution. They also explore changes in the protonation states of the histone residue in response to the formation of various nucleosome complexes, including higher-order nucleosome structures. The overall conclusion is that pH-induced changes in histone residue protonation states modulate nucleosome surface electrostatic potentials, and influence nucleosome-partner protein interactions. Proton uptake or release often accompanied by nucleosome-partner protein interactions affects their binding processes.

      In the second part, the authors study the effect of 1266 recurrent histone cancer mutations on the nucleosome surface electrostatics: they show a significant subset of these has a major effect on the nucleosome-partner interactions, with the potential to regulate nucleosome self-association, thereby affecting higher-order chromatin structures.

      Strengths:

      The main strengths of this work are its technical rigor, comprehensive nature, and the novelty of several of its aspects. For example, I am not aware of another work that analyzed pK shifts in the nucleosome in such a level of detail, and for so many different structures. The same holds for pK shifts upon nucleosome-partner binding. The analysis of pK shifts in nucleosome-nucleosome binding is likely completely new. The authors use an established methodology, check it against experiment at least in some instances, and, very importantly, base their conclusions on many different structures. The specific pK-related numbers they report are believable.

      Regarding the second part of the work: the specific connection made between a subset of cancer-associated mutations and the major electrostatic changes in the nucleosome is novel and should be of interest to a broad community. The authors conclude that cancer mutations can also regulate nucleosome self-association, modulating the organization and dynamics of higher-order chromatin structures.

      The detailed and comprehensive analysis of the cancer-associated mutations, including their partitioning into multiple relevant categories, is of value in its own right.

      Weaknesses:

      The main weakness of the first (pK-related) part of this work is the lack of relevance to specific conditions in most living cells of higher eukaryotes. The problem is that the nucleosome resides in the nucleus, where the pH is very tightly controlled, and for good reasons. See, e.g., Casey, J., Grinstein, S., and Orlowski, J. "Sensors and regulators of intracellular pH." Nature Rev. Mol. Cell. Biology (2009); Parker, M. D., and Boron, W. F. "The divergence, actions, roles, and relatives of sodium-coupled bicarbonate transporters." Physiol. Rev. (2013). While intracellular pH does deviate from about 7.2, the naturally occurring deviations are only of the order of 0.3 pH units. In that respect, what the authors call the "physiological" range of 6.5 to 7.5 is still too broad, let alone the "slightly acidic" (pH 5 to 6.5) or "slightly basic" (pH 7.5 to 9) conditions, as defined by the authors. It is hard to imagine a situation where intra-nuclear pH changes from e.g. "slightly acidic" to neutral in a live cell nucleus.

      This said, there is nothing wrong with studying the response of the nucleosome structures to these large variations of pH, which can be reproduced in vitro. It is the relevance of the findings to in vivo conditions that is highly questionable.

      The second part of the work - the effect of cancer mutations - is free from this major defect. In the opinion of this reviewer, it can (and should) stand on its feet, as a separate work.

      However, the lack of specific, testable (preferably quantitative) biologically relevant predictions is a weakness of both parts. For example, in "Discussion" the authors state that "Histone ionizable residues are highly sensitive to cellular pH fluctuations, leading to changes in their protonation states and consequent alterations in nucleosome surface electrostatic potentials and interactions." This statement is certainly true, based on what is already known about the effect of pH on protein-DNA (or protein-protein) association, from previous works. But what are the specific predictions here?

    1. eLife assessment

      This valuable study investigates how inter-organ communication between the tracheal stem cells and the fat body plays a key role in the directed migration of tracheal stem cells in Drosophila pupae. While the experimental data are extensive and complementary, the evidence presented to substantiate some of the conclusions appears incomplete and requires further clarification and additional experiments. The work would be of interest to researchers in the fields of developmental biology and cancer biology.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Dong et al. study the directed cell migration of tracheal stem cells in Drosophila pupae. The migration of these cells which are found in two nearby groups of cells normally happens unidirectionally along the dorsal trunk towards the posterior. Here, the authors study how this directionality is regulated. They show that inter-organ communication between the tracheal stem cells and the nearby fat body plays a role. They provide compelling evidence that Upd2 production in the fat body and JAK/STAT activation in the tracheal stem cells play a role. Moreover, they show that JAK/STAT signalling might induce the expression of apicobasal and planar cell polarity genes in the tracheal stem cells which appear to be needed to ensure unidirectional migration. Finally, the authors suggest that trafficking and vesicular transport of Upd2 from the fat body towards the tracheal cells might be important.

      Strengths:

      The manuscript is well written. This novel work demonstrates a likely link between Upd2-JAK/STAT signalling in the fat body and tracheal stem cells and the control of unidirectional cell migration of tracheal stem cells. The authors show that hid+rpr or Upd2 RNAi expression in the fat body, or Dome RNAi, Hop RNAi, or STAT92E RNAi expression in tracheal stem cells, results in aberrant migration of some of the tracheal stem cells towards the anterior. Using ChIP-seq as well as analysis of GFP-protein trap lines of planar cell polarity genes in combination with RNAi experiments, the authors show that STAT92E likely regulates the transcription of planar cell polarity genes and some apicobasal cell polarity genes in tracheal stem cells which appear to be needed for unidirectional migration. Moreover, the authors hypothesise that extracellular vesicle transport of Upd2 might be involved in this Upd2-JAK/STAT signalling in the fat body and tracheal stem cells, which, if true, would be quite interesting and novel.

      Overall, the work presented here provides some novel insights into the mechanism that ensures unidirectional migration of tracheal stem cells and prevents bidirectional migration. This might have important implications for other types of directed cell migration in invertebrates or vertebrates, including cancer cell migration.

      Weaknesses:

      It remains unclear to what extent Upd2-JAK/STAT signalling regulates unidirectional migration. While there seems to be a consistent phenotype upon genetic manipulation of Upd2-JAK/STAT signalling and planar cell polarity genes, as in the aberrant anterior migration of a fraction of the cells, the phenotype seems to be rather mild, with the majority of cells migrating towards the posterior.

      While I am not an expert on extracellular vesicle transport, the data presented here regarding Upd2 being transported in extracellular vesicles do not appear to be very convincing.

      Major comments:

      (1) The graphs showing the quantification of anterior (and in some cases also posterior) migration are quite confusing. E.g. Figure 1F (and 5E and all others): these graphs are difficult to read because the quantification for the different conditions is not shown separately. E.g. what is the migration distance for Fj RNAi anterior at 3h in Figure 5E? Around -205 µm (green plus all the other colors) or around -70 µm (just green, even though the green bar goes to -205 µm)? If it's -205 µm, then the images in C' or D' do not seem to show this strong phenotype. If it's around -70 µm, then the way the graph shows it is misleading, because some readers will interpret the result as -205 µm.

      Moreover, it's also not clear what exactly was quantified and how it was quantified. The details are also not described in the methods. It would be useful to mark with two arrowheads in the image (e.g. 5 A'-D') where the migration distance is measured (anterior margin and point zero).

      Overall, it would be better if the graph showed the different conditions separately. Also, n numbers should be shown in the figure legend for all graphs.

      (2) Figure 2-figure supplement 1: C-L and M: From these images and graph it appears that Upd2 RNAi results in no aberrant anterior migration. Why is this result different from Figures 2D-F where it does?

      (3) Figure 5F: The data on the localisation of planar cell polarity proteins in the tracheal stem cell group is rather weak. Figure 5G and J should at least be quantified for several animals of the same age for each genotype. Is there overall more Ft-GFP in the cells on the posterior end of the cell group than on the opposite side? Or is there a more classic planar cell polarity in each cell with Ft-GFP facing to the posterior side of the cell in each cell? Maybe it would be more convincing if the authors assessed what the subcellular localisation of Ft is through the expression of Ft-GFP in clones to figure out whether it localises posteriorly or anteriorly in individual cells.

      (4) Regarding the trafficking of Upd2 in the fat body, is it known, whether Grasp65, Lbm, Rab5, and 7 are specifically needed for extracellular vesicle trafficking rather than general intracellular trafficking? What is the evidence for this?

      (5) Figure 8A-B: The data on the proximity of Rab5 and 7 to the Upd2 blobs are not very convincing.

      (6) The authors should clarify whether or not their work has shown that "vesicle-mediated transport of ligands is essential for JAK/STAT signaling". In its current form, this manuscript does not appear to provide enough evidence for extracellular vesicle transport of Upd2.

      (7) What is the long-term effect of the various genetic manipulations on migration? The authors don't show what the phenotype at later time points would be, regarding the longer-term migration behaviour (e.g. at 10h APF when the cells should normally reach the posterior end of the pupa). And what is the overall effect of the aberrant bidirectional migration phenotype on tracheal remodelling?

      (8) The RNAi experiments in this manuscript are generally done using a single RNAi line. To rule out off-target effects, it would be important to use two non-overlapping RNAi lines for each gene.

    3. Reviewer #2 (Public review):

      Summary:

      This work by Dong and colleagues investigates the directed migration of tracheal stem cells in Drosophila pupae, essential for tissue homeostasis. These cells, found in two nearby groups, migrate unidirectionally along the dorsal trunk towards the posterior to replenish degenerating branches that disperse the FGF mitogen. The authors show that inter-organ communication between tracheal stem cells and the neighboring fat body controls this directionality. They propose that the fat body-derived cytokine Upd2 induces JAK/STAT signaling in tracheal progenitors, maintaining their directional migration. Disruption of Upd2 production or JAK/STAT signaling results in erratic, bidirectional migration. Additionally, JAK/STAT signaling promotes the expression of planar cell polarity genes, leading to asymmetric localization of Fat in progenitor cells. The study also indicates that Upd2 transport depends on Rab5- and Rab7-mediated endocytic sorting and Lbm-dependent vesicle trafficking. This research addresses inter-organ communication and vesicular transport in the disciplined migration of tracheal progenitors.

      Strengths:

      This manuscript presents extensive and varied experimental data to show a link between Upd2-JAK/STAT signaling and tracheal progenitor cell migration. The authors provide convincing evidence that the fat body, located near the trachea, secretes vesicles containing the Upd2 cytokine. These vesicles reach tracheal progenitors and activate the JAK-STAT pathway, which is necessary for their polarized migration. Using ChIP-seq, GFP-protein trap lines of planar cell polarity genes, and RNAi experiments, the authors demonstrate that STAT92E likely regulates the transcription of planar cell polarity genes and some apicobasal cell polarity genes in tracheal stem cells, which seem to be necessary for unidirectional migration.

      Weaknesses:

      Directional migration of tracheal progenitors is only partially compromised, with some cells migrating anteriorly and others maintaining their posterior migration. Additionally, the authors do not examine the potential phenotypic consequences of this defective migration.

      It is not clear whether the number of tracheal progenitors remains unchanged in the different genetic conditions. If there are more cells, this could affect their localization rather than migration and may change the proposed interpretation of the data.

      Upd2 transport by vesicles is not convincingly shown.

      Data presentation is confusing and incomplete.

    4. Reviewer #3 (Public review):

      Summary:

      Dong et al tackle the mechanism leading to polarized migration of tracheal progenitors during Drosophila metamorphosis. This work fits in the stem cell research field and its crucial role in growth and regeneration. While it has been previously reported by others that tracheal progenitors migrate in response to FGF and Insulin signals emanating from the fat body in order to regenerate tracheal branches, the authors identified an additional mechanism involved in the communication of the fat body and tracheal progenitors.

      Strengths:

      The data presented were obtained using a wide range of complementary techniques combining genetics, molecular biology, quantitative, and live imaging techniques. The authors provide convincing evidence that the fat body, found in close proximity to the trachea, secretes vesicles containing the Upd2 cytokine that reach tracheal progenitors, leading to JAK-STAT pathway activation, which is required for their polarized migration. In addition, the authors show that genes regulating planar cell polarity are also involved in this inter-organ communication.

      Weaknesses:

      (1) Affecting this inter-organ communication leads to a quite discrete phenotype where polarized migration of tracheal progenitors is partially compromised. The study lacks data showing the consequences of this phenotype on the final trachea morphology, function, and/or regeneration capacities at later pupal and adult stages. This could potentially increase the significance of the findings.

      (2) The conclusions of this paper are mostly well supported by data, but some aspects of data acquisition and analysis need to be clarified and corrected, such as recurrent errors in plotting of tracheal progenitor migration distance that mislead the reader regarding the severity of the phenotype.

      (3) The number of tracheal progenitors should be assessed since they seem to be found in excess in some genetic conditions that affect their behavior. A change in progenitor number could lead to crowding, thus affecting their localization rather than migration capacities, thereby changing the proposed interpretation. In addition, the authors show data suggesting a reduced progenitor migration speed when the fat body is affected, which would also be consistent with a crowding of progenitors.

      (4) The authors claim that tracheal progenitors display a polarized distribution of PCP proteins that is controlled by JAK-STAT signaling. However, this conclusion is made from a single experiment that is not quantified and for which there is no explanation of how the plot profile measurements were performed. It also seems that this experiment was done only once. Altogether, this is insufficient to support the claim. Finally, a quantification of the number of posterior edges presenting filopodia rather than the number of filopodia at the anterior and posterior leading edges would be more appropriate.

      (5) The authors demonstrate that Upd2 is transported through vesicles from the fat body to the tracheal progenitors where they propose they are internalized. Since the Upd2 receptor Dome ligand binding sites are exposed to the extracellular environment, it is difficult to envision in the proposed model how Upd2 would be released from vesicles to bind Dome extracellularly and activate the JAK-STAT pathway. Moreover, data regarding the mechanism of the vesicular transport of Upd2 are not fully convincing since the PLA experiments between Upd2 and Rab5, Rab7, and Lbm are not supported by proper positive and negative controls and co-immunoprecipitation data in the main figure do not always correlate to the raw data.

    1. eLife assessment

      This study presents valuable findings on the relative cerebral blood volume of non-human primates that move us closer to uncovering the functional and architectonic principles that govern the interplay between neuronal and vascular networks. The evidence of areal variations is solid, but that of vessel counting and laminar analysis is incomplete. The lack of a direct comparison of their approach against better-established MRI-based methods for measuring hemodynamics and vascular structure weakens the evidence provided in the current paper version. The work will be of interest to NHP imaging scientists.

    2. Reviewer #1 (Public review):

      Summary:

      Audio et al. measured cerebral blood volume (CBV) across cortical areas and layers using high-resolution MRI with contrast agents in non-human primates. While the non-invasive CBV MRI methodology is often used to enhance fMRI sensitivity in NHPs, its application for baseline CBV measurement is rare due to the complexities of susceptibility contrast mechanisms. The authors determined the number of large vessels and the areal and laminar variations of CBV in NHP and compared those with various other metrics.

      Strengths:

      Non-invasive mapping of relative cerebral blood volume is novel for non-human primates. A key finding was the observation of variations in CBV across regions; primary sensory cortices had high CBV, whereas other higher areas had low CBV. The measured CBV values correlated with previously reported neuronal and receptor densities.

      Weaknesses:

      A weakness of this manuscript is that the quantification of CBV with postprocessing approaches to remove susceptibility effects from pial and penetrating vessels, as well as orientation dependency, is not fully validated, especially on a laminar scale. Further specific comments follow.

      (1) Baseline CBV indices were determined using contrast agent-enhanced MRI (deltaR2*). Although this approach is suitable for areal comparisons, its application on a laminar scale has not been validated in the literature or in this study. By comparing with histological vascular information of V1, the authors attempted to validate their approach. However, the generalization of their method is questionable. The main issue is whether the large vessel contribution is minimized by processing approaches properly in various cortical areas (such as clusters 1-3 in Figure 5). It would be beneficial to compare deltaR2* with deltaR2 induced by contrast agents in a few selected slices, as deltaR2 is supposed to be sensitive to microvessels, not macrovessels. Please discuss this issue.

      (2) High-resolution MRI with a critical sampling frequency estimated from previous studies (Weber 2008, Zheng 1991) was performed to separate penetrating vessels, which is considered one of the major advancements in this study. However, this approach is still insufficient to accurately identify the number of vessels due to the blooming effects of susceptibility and insufficient spatial resolution. There was no detailed description of the detection criteria. More importantly, the number of observable penetrating vessels is dependent on imaging parameters and the dose of the contrast agent. If imaging slices were obtained in parallel to the cortex with higher in-plane resolution, it would likely improve the detection of penetrating vessels. Using higher-field MRI would further enhance the detection of penetrating vessels. Therefore, the reported value is only applicable to the experimental and processing conditions used in this study. Detailed selection criteria should be mentioned, and all potential pitfalls should be discussed.

      (3) Attempts to obtain pial vascular structures were made (Figure 2). As mentioned in this manuscript, the blooming effect of susceptibility contrasts is problematic. In the MRI community, T1-based Gd contrast agents have been used for mapping large vasculature, which is a better approach for obtaining pial vascular structures. Alternatively, computed tomography with a blood contrast agent can be used for mapping blood vasculature noninvasively. This issue should be discussed.

      (4) Since baseline R2* is related to baseline R2, vascular volume, iron content, and susceptibility gradients, it is difficult to correlate it with physiological parameters. Baseline R2* is also sensitive to imaging parameters; higher spatial resolution tends to result in lower R2* values (closer to the R2 value). Therefore, baseline R2* findings need to be emphasized.

      (5) CBV-weighted deltaR2* is correlated with various other metrics (cytoarchitectural parcellation, myelin/receptor density, cortical thickness, CO, cell-type specificity, etc.). While testing the correlation between deltaR2* and these other metrics may be acceptable as an exploratory analysis, it is challenging for readers to discern a causal relationship between them. A critical question is whether CBV-weighted deltaR2* can provide insights into other metrics in diseased or abnormal brain states. If this is the case, then high-resolution deltaR2* will be useful. Please comment on this possibility.

      (6) There is no discussion about the deltaR2* difference across subcortical areas (Figure 1). This finding is intriguing and warrants a thorough discussion in the context of the cortical findings.

      (7) Figure 3 is missing. Several statements in the manuscript require statistics (e.g., bimodality in Figure 2D, Figure 3F).

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript presents a new approach for non-invasive, MRI-based measurements of cerebral blood volume (CBV). Here, the authors use ferumoxytol, a high-contrast agent, and apply specific sequences to infer CBV. The authors then move to statistically compare measured regional CBV with the known distribution of different types of neurons, markers of metabolic load, and others. While the presented methodology captures an estimated 30% of the vasculature, the authors corroborated previous findings regarding the lack of vascular compartmentalization around functional neuronal units in the primary visual cortex.

      Strengths:

      Non-invasive methodology geared to map vascular properties in vivo.

      Implementation of a highly sensitive approach for measuring blood volume.

      Ability to map structural and functional vascular metrics to other types of published data.

      Weaknesses:

      The key issue here is the underlying assumption about the appropriate spatial sampling frequency needed to capture the architecture of the brain vasculature. Namely, ~7 penetrating vessels / mm2 as derived from Weber et al 2008 (Cer Cor). The cited work begins by characterizing the spacing of penetrating arteries and ascending veins using a vascular cast of 7 monkeys (Macaca mulatta, same as in the current paper). The ~7 penetrating vessels / mm2 are computed by dividing the total number of identified vessels by the area imaged. The problem here is that all measurements were made in a "non-volumetric" manner and only in V1. Extrapolating from here to the entire brain seems like an over-assumption, particularly given the region-dependent heterogeneity that the current paper reports.
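      The dependence of the sampling requirement on the assumed vessel density can be made concrete with a short sketch. It is only an illustration, not the authors' calculation: it assumes vessels sit on a regular square lattice (so mean spacing is 1/sqrt(density)) and that two samples per spacing are needed, a Nyquist-style rule of thumb. The function names and the alternative density of 10 vessels/mm2 are hypothetical.

```python
import math


def mean_spacing_mm(density_per_mm2):
    """Mean inter-vessel spacing, assuming a regular square lattice (assumption)."""
    return 1.0 / math.sqrt(density_per_mm2)


def max_voxel_size_mm(density_per_mm2, samples_per_spacing=2.0):
    """Largest voxel edge that still gives `samples_per_spacing` samples
    per inter-vessel spacing (Nyquist-style rule of thumb, an assumption)."""
    return mean_spacing_mm(density_per_mm2) / samples_per_spacing


if __name__ == "__main__":
    # 7 vessels/mm2 is the V1 figure quoted in the review; 10 is a made-up
    # value standing in for a hypothetical denser region.
    for density in (7.0, 10.0):
        print(density, round(mean_spacing_mm(density), 3),
              round(max_voxel_size_mm(density), 3))
```

      Under these assumptions, a region with a higher vessel density than V1 would need a finer voxel size than one derived from the V1 figure alone, which is the reviewer's concern about extrapolating to the whole brain.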

    1. eLife assessment

      In this elegant and thorough study, Sánchez-León et al. investigate the effects of tDCS on the firing of single cerebellar neurons in awake and anesthetized mice. They find heterogeneous responses depending on the orientation of the recorded Purkinje cell. The paper is important in that it may well explain part of the controversial and ambiguous outcomes of various clinical trials. It is a well-written paper on a deeply analyzed dataset and the methods in the paper are generally convincing, with the current version having some weaknesses in statistical reporting and power.

    2. Reviewer #1 (Public review):

      Summary:

      In this elegant and thorough study, Sánchez-León et al. investigate the effects of tDCS on the firing of single cerebellar neurons in awake and anesthetized mice. They find heterogeneous responses depending on the orientation of the recorded Purkinje cell.

      Strengths:

      The paper is important in that it may well explain part of the controversial and ambiguous outcomes of various clinical trials. It is a well-written paper on a deeply analyzed dataset.

      Weaknesses:

      The sample size could be increased for some of the experiments.

    3. Reviewer #2 (Public review):

      Summary:

      In this study by Sánchez-León and colleagues, the authors attempted to determine the influence of neuronal orientation on the efficacy of cerebellar tDCS in modulating neural activity. To do this, the authors made recordings from Purkinje cells, the primary output neurons of the cerebellar cortex, and determined the inter-dependency between the orientation of these cells and the changes in their firing rate during cerebellar tDCS application.

      Strengths:

      (1) A major strength is the in vivo nature of this study. Being able to simultaneously record neural activity and apply exogenous electrical current to the brain during both an anesthetized state and during wakefulness in these animals provides important insight into the physiological underpinnings of tDCS.

      (2) The authors provide evidence that tDCS can modulate neural activity in multiple cell types. For example, there is a similar pattern of modulation in Purkinje cells and non-Purkinje cells (excitatory and inhibitory interneurons). Together, these data provide holistic insight into how tDCS can affect activity across different populations of cells, which has important implications for basic neuroscience, but also for clinical populations where there may be non-uniform or staged effects of neurological disease on these various cell types.

      (3) There is a systematic investigation into the effects of tDCS on neural activity across multiple regions of the cerebellum. The authors demonstrate that the pattern of modulation is dependent on the target region. These findings have important implications for determining the expected neuromodulatory effects of tDCS when applying this technique over different target regions non-invasively in animals and humans.

      Weaknesses:

      (1) In the introduction, there is a lack of context regarding why neuronal orientation might be a critical factor influencing the responsiveness to tDCS. The authors allude to in vitro studies that have shown neuronal orientation to be relevant for the effects of tDCS on neural activity but do not expand on why this might be the case. These points could be better understood by informing the reader about the uniformity/non-uniformity of the electric field induced by tDCS. In addition, there is a lack of an a priori hypothesis. For example, would the authors have expected a neuronal orientation parallel or perpendicular to the electric field to be related to the effects of tDCS on neural activity?

      (2) It is unclear how specific stimulation parameters were determined. First, how were the tDCS intensities used in the present experiments determined/selected, and how does the relative strength of this induced electric field equate to the intensities used non-invasively during tDCS experiments in humans? Second, there is also a fundamental difference in the pattern of application used here (e.g., 15 s pulses separated by 10 s of no stimulation) compared to human studies (e.g., 10-20 min of constant stimulation).

      (3) In their first experiment, the authors measure the electric field strength at increasing depths during increasing stimulation intensities. However, it appears that an alternating current was used rather than the direct current that is usually employed in tDCS protocols. There is a lack of rationale regarding why the alternating current was used for this component. Alternating current is more commonly used to entrain or boost neural oscillations, whereas tDCS studies aim to increase or decrease neural activity in general.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, Sánchez-León et al. combined extracellular recordings of Purkinje cell activity in awake and anesthetized mice with juxtacellular recordings and Purkinje cell staining to link Purkinje cell orientation to their stimulation response. The authors find a relationship between neuron orientation and firing rate, dependent on stimulation type (anodal/cathodal). They also show the effects of stimulation intensity and rebound effects.

      Strengths:

      Overall, the work is methodologically sound and the manuscript is well written. The authors have taken great care to explain their rationale and methodological choices.

      Weaknesses:

      My only reservation is the lack of reporting of the precise test statistics, p-values, and multiple comparison corrections. The work would benefit from adding this and other information.

      Major Comments:

      (1) The authors should report the exact test statistics. These are missing for all comparisons and hinder the reader from understanding what exactly was tested for each of the experiments. For example, having the exact test statistics would help better understand the non-significant differences in Figure 1h where there is at least a numeric difference in CS firing rate during tDCS.

      (2) Did the authors apply any corrections for multiple comparisons? Generally, it would be helpful if they could clarify the statistical analysis (which values were subjected to the tests, how many tests were performed for each question, etc.).

      (3) The relationship shown in Figure 2g seems to be influenced by the two outliers. Have the authors confirmed the results using a robust linear regression method?

      (4) The authors conclude that tDCS modulates vermal PCs more than Crus I/II PCs, but they don't seem to test this statistically. It would be helpful to submit the firing rate change values to an actual statistical test to conclude this directly from the data.
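      The checks requested in comments (2) and (3) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the analysis used in the manuscript: Holm's step-down procedure stands in for "a correction for multiple comparisons", the Theil-Sen estimator stands in for "a robust linear regression method", and all numbers are made up for the example. The function names are hypothetical.

```python
from itertools import combinations
from statistics import median


def holm_adjust(p_values):
    """Holm-Bonferroni adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


def theil_sen(x, y):
    """Robust line fit: median of pairwise slopes, then median residual offset."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    slope = median(slopes)
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


def ols_slope(x, y):
    """Ordinary least-squares slope, for comparison with the robust fit."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    return sxy / sxx


if __name__ == "__main__":
    # Made-up data: a clean line y = 2x + 1 with two outliers at the end.
    x = list(range(10))
    y = [2 * xi + 1 for xi in x]
    y[8], y[9] = 60, 70
    print("Theil-Sen:", theil_sen(x, y))
    print("OLS slope:", ols_slope(x, y))
    print("Holm:", holm_adjust([0.01, 0.04, 0.03]))
```

      If a relationship such as the one in Figure 2g keeps its sign and approximate size under a robust fit of this kind, the two outliers are unlikely to be driving it; a large gap between the robust and least-squares slopes would point the other way.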

    1. eLife assessment

      This work presents valuable findings of a modulatory effect of yohimbine, an alpha2-adrenergic antagonist that raises noradrenaline levels, on the reconsolidation of emotionally neutral word-picture pairs, depending on the hippocampal and cortical reactivation during retrieval. The evidence supporting the conclusion is incomplete so far, particularly considering concerns about the median-splitting approach for reaction times and hippocampal activity. The work will be of broad interest to researchers working on memory.

    2. Reviewer #1 (Public review):

      Summary:

      How reconsolidation works - particularly in humans - remains largely unknown. With an elegant, 3-day design, combining fMRI and psychopharmacology, the authors provide evidence for a certain role for noradrenaline in the reconsolidation of memory for neutral stimuli. All memory tasks were performed in the context of fMRI scanning, with additional resting-state acquisitions performed before and after recall testing on Day 2. On Day 1, 3 groups of healthy participants encoded word-picture associates (with pictures being either scenes or objects) and then performed an immediate cued recall task upon presentation of the word (indicating whether the word was old or new, and whether it had been paired with a scene or an object). On Day 2, the cued recall task was repeated using half of the stimulus set words encoded on Day 1 (only old words were presented, with subjects required to indicate prior scene vs object pairing). This test was immediately preceded by the oral administration of placebo, cortisol, or yohimbine (to raise noradrenaline levels) depending on group assignment. On Day 3, all words presented on Day 1 were presented. As expected, on Day 3, memory was significantly enhanced for associations that were cued and successfully retrieved on Day 2 compared to uncued associations. However, for associative d', there was no Cued × Group interaction nor a main effect of Group, i.e., on the standard measure of memory performance, post-retrieval drug presence on Day 2 did not affect memory reconsolidation. As further evidence for a null result, fMRI univariate analyses showed no Cued × Group interactions in whole-brain or ROI activity.

      Strengths:

      There are some aspects of this study that I find impressive. The study is well-designed and the fMRI analysis methodology is innovative and sound. The authors have made meticulous and thorough physiological measurements, and assays of mood, throughout the experiment. By doing so, they have overcome, to a considerable extent, the difficulties inherent in the timing of human oral drug delivery in reconsolidation tasks, where it is difficult to have the drug present in the immediate recall period without affecting recall itself. This is beautifully shown in Figure 3. I also think that having some neurobiological assay of memory reactivation when studying reconsolidation in humans is critical, and the authors provide this. While multi-voxel patterns of hemodynamic responses are, in my view, very difficult to equate with an "engram", these patterns do have something to do with memory.

      Weaknesses:

      I have major issues regarding the behavioral results and the framing of the manuscript.

      (1) To arrive at group differences in memory performance, the authors performed median splitting of Day 3 trials by short and long reaction times during memory cueing on Day 2, as they took this as a putative measure of high/low levels of memory reactivation. Associative category hits on Day 3 showed a Group by Day 2 Reaction time (short, long) interaction, with post-hocs showing (according to the text) worse memory for short Day 2 RTs in the Yohimbine group. These post-hocs should be corrected for multiple comparisons, as the result is not what would be predicted (see point 2). My primary issue here is that we are not given RT data for each group, nor is the median splitting procedure described in the methods. Was this across all groups, or within groups? Are short RTs in the yohimbine group any different from short RTs in the other two groups? Unfortunately, we are not given Day 2 picture category memory levels or reaction times for each group. This is relevant because (as given in Supplemental Table S1) memory performance (d´) for the Yohimbine group on Day 1 immediate testing is (roughly speaking) 20% lower than the other 2 groups (independently of whether the pairs will be presented again the following day). I appreciate that this is not significant in a group x performance ANOVA but how does this relate to later memory performance? What were the group-specific RTs on Day 1? So, before the reader goes into the fMRI results, there are questions regarding the supposed drug-induced changes in behavior. Indeed, in the discussion, there is repeated mention of subsequent memory impairment produced by yohimbine but the nature of the impairment is not clear.

      (2) The authors should be clearer as to what their original hypotheses were, and why they did the experiment. Although the literature is complex, I would have thought the hypotheses would be reconsolidation impairment by cortisol and enhancement by yohimbine. Here it is relevant to point out that - only when the reader gets to the Methods section - there is mention of a paper published by this group in 2024. In this publication, the authors used the same study design but administered a stress manipulation after Day 2 cued recall, instead of a pharmacological one. They did not find a difference in associative hit rate between stress and control groups, but - similar to the current manuscript - reported that post-retrieval stress disrupts subsequent remembering (Day 3 performance) depending on neural memory reinstatement during reactivation (specifically driven by the hippocampus and its correlation with neocortical areas).

      Instead of using these results, and other human studies, to motivate the current work, reference is made to a recent animal study: Line 169 "Building on recent findings in rodents (Khalaf et al. 2018), we hypothesized that the effects of post-retrieval noradrenergic and glucocorticoid activation would critically depend on the reinstatement of the neural event representation during retrieval". It is difficult to follow that a rodent study using contextual fear conditioning and examining single neuron activity to remote fear recall and extinction would be relevant enough to motivate a hypothesis for a human psychopharmacological study on emotionally neutral paired associates.
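      The question raised in point (1), whether the median split was computed across all groups or within each group, matters because the two choices can label the same trials differently. The sketch below uses made-up reaction times (not data from the study) and hypothetical function names to show the difference; it assumes trials at or below the median are labeled "short".

```python
from statistics import median


def label_trials(rts, cut):
    """Label each reaction time as 'short' (<= cut) or 'long' (> cut)."""
    return ["short" if rt <= cut else "long" for rt in rts]


def pooled_split(rts_by_group):
    """Median split using one cut-off computed across all groups."""
    cut = median(rt for rts in rts_by_group.values() for rt in rts)
    return {group: label_trials(rts, cut) for group, rts in rts_by_group.items()}


def within_group_split(rts_by_group):
    """Median split using a separate cut-off for each group."""
    return {
        group: label_trials(rts, median(rts))
        for group, rts in rts_by_group.items()
    }


if __name__ == "__main__":
    # Made-up Day 2 reaction times in seconds; the second group is slower overall.
    rts = {
        "placebo": [0.8, 0.9, 1.0, 1.1],
        "yohimbine": [1.2, 1.3, 1.4, 1.5],
    }
    print("pooled:", pooled_split(rts))
    print("within:", within_group_split(rts))
```

      With a pooled cut-off, a group that is slower overall contributes few or no "short" trials, so a Group × Reaction time interaction could partly reflect baseline speed differences. With within-group cut-offs, "short" means something different in each group. Reporting the per-group reaction times, as the reviewer asks, would let readers judge which situation applies.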

    3. Reviewer #2 (Public review):

      Summary:

      The authors aimed to investigate how noradrenergic and glucocorticoid activity after retrieval influence subsequent memory recall with a 24-hour interval, by using a controlled three-day fMRI study involving pharmacological manipulation. They found that noradrenergic activity after retrieval selectively impairs subsequent memory recall, depending on hippocampal and cortical reactivation during retrieval.

      Overall, there are several significant strengths of this well-written manuscript.

      Strengths:

      (1) The study is methodologically rigorous, employing a well-structured three-day experimental design that includes fMRI imaging, pharmacological interventions, and controlled memory tests.

      (2) The use of pharmacological agents (i.e., hydrocortisone and yohimbine) to manipulate glucocorticoid and noradrenergic activity is a significant strength.

      (3) The clear distinction between online and offline neural reactivation using MVPA and RSA approaches provides valuable insights into how memory dynamics are influenced by noradrenergic and glucocorticoid activity distinctly.

      Weaknesses:

      (1) One potential limitation is the reliance on distinct pharmacodynamics of hydrocortisone and yohimbine, which may complicate the interpretation of the results.

      (2) On a related point, individual differences in pharmacological responses and in physiological and cortisol measures may contribute to memory recall on Day 3.

      (3) The median-splitting approach for reaction times and hippocampal activity should be better justified.

    1. eLife assessment

      This study provides valuable evidence on the relationship between morphine-induced social deficits, corticotropin-releasing factor receptors, and alterations in neuronal activity in the paraventricular nucleus of the hypothalamus (PVN) of mice. Convincing approaches and methods were used to show that the CRF1 receptor plays a role in sociability deficits occurring after acute morphine administration. Conclusions regarding mechanistic connections between the effect of modulation of the CRF1 receptor on sociability and PVN neuronal firing are, however, incompletely supported.

    1. eLife assessment

      This study combines experimental and theoretical approaches to examine metabolites at the single-cell level in tea plants. The authors skilfully integrated various tools available for this type of research, and meticulously presented and illustrated every step of the survey. The overall quality of the work is convincing, and it represents an important contribution to our understanding of the compartmentalization of biosynthesis pathways.

    1. eLife assessment

      This useful study provides data suggesting that subcellular localization of the spatial regulator of cell division, MinD, is an intrinsic feature of the protein's ability to associate with the membrane as both a dimer and a monomer. These findings distinguish the behavior of MinD in B. subtilis from its counterpart in E. coli and suggest that there is not a need to invoke additional localization factors. However, all three reviewers agreed that the study is incomplete: experimentally, quantitation and assessment of MinD behavior in the presence of proteins previously implicated in its localization are missing, among other assays, and the molecular modeling necessary to support the authors' conclusion that their data support a reaction-diffusion model is completely absent. Finally, the manuscript itself is difficult to read with an overly long discussion and disorganized introduction and results sections, and it will require significant revision.

    2. Reviewer #1 (Public review):

      Summary:

      In this work, the authors investigate the molecular dynamics of MinD, a component of the Bacillus subtilis Min system, in vitro and in vivo. In Escherichia coli the Min system is highly dynamic and displays rapid pole-to-pole oscillation whereby a time-averaged minimum of the Min proteins at mid-cell is established. However, in B. subtilis, this is not the case, and there is no MinE present. MinD in B. subtilis dynamically relocalizes from the poles to division sites and binds to MinC and MinJ, which mediates its interaction with DivIVA. This paper reports the biochemical characterization of B. subtilis MinD in vitro and dynamics of MinD variants in vivo, providing insight into the mechanism of dynamic localization.

      Strengths:

      In the current study, the authors perform a detailed biochemical characterization of the in vitro ATPase activity of MinD and demonstrate that rapid hydrolysis is elicited by adding phospholipids. They further show using a collection of substitution mutants of MinD that both monomers and dimers bind to the membrane, and ATP occupancy changes the on and off rates. Identification, quantification, and tracking of discrete Halo-MinD populations were nicely done and showed that mutations in MinD alter dynamic localization, correlating with PL binding on and off rates in vitro.

      Weaknesses:

      While the study shows that MinD in B. subtilis utilizes a different (MinE-independent) activation mechanism, the extent to which MinJ and/or MinC play a role remains to be determined.

    3. Reviewer #2 (Public review):

      Summary:

      Feddersen & Bramkamp determined important characteristics of how MinD protein binds to and dissociates from the membrane, and how it dimerizes in relation to its ATPase activity. The presented data clearly show the differences in function of MinD homologs from B. subtilis and E. coli.

      Strengths:

      The work presents well-executed experiments that lead to interesting conclusions and a new model of how the Min system works during B. subtilis mid-cell division. Importantly, this model is supported by in vitro characterization of well-chosen mutants in the functional domains of MinD. Notably, most of the in vitro data are confirmed by single-molecule localization microscopy.

      Weaknesses:

      The authors immobilized liposomes, for which they used E. coli total lipids, to measure ATPase activity and liposome association and dissociation of B. subtilis MinD. For these experiments, it would be more suitable to use B. subtilis total lipids, as more biologically relevant data could be gained.

      Although the work is detailed and nicely compares the function of the B. subtilis Min system with that of the E. coli Min system, it lacks a comparison with Min system function in other rod-shaped Gram-positive bacteria. I would suggest including in the Discussion the complexity of other Min systems. This complexity is especially evident in other rod-shaped spore formers such as Clostridial species, in which one or both of these Min systems are present: an oscillating E. coli-type Min system and a more static one as in B. subtilis.

    4. Reviewer #3 (Public review):

      Experimentally, this study provides sufficient data to support the authors' conclusion that MinD dimerization but not ATPase activity is both necessary and sufficient for concentrating it and its binding partner, the division inhibitor MinC, at cell poles. Biochemical data appear to be rigorously acquired and include proper controls. Although cytological data are consistent with the authors' model, quantitative information on MinD localization in a statistically relevant set of cells is missing (e.g. Figure 2B).

      The study's other major conclusion, as outlined in their discussion, that a reaction-diffusion model explains MinD localization in wild-type cells, is unsubstantiated. If they would like to make this a major conclusion of the final manuscript, they will need to include modeling that takes into account biochemical and cytological data. 

      From a presentation perspective, the manuscript is challenging to read and will require substantial rewriting and revision prior to publication.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review):

      Summary:

      This work combines molecular dynamics (MD) simulations with experimental elucidation of the efficacy of ATP as a biological hydrotrope. While ATP is broadly known as the energy currency, it has also been suggested to modulate the stability of biomolecules and their aggregation propensity. In the computational part of the work, the authors demonstrate that ATP increases the population of the more expanded conformations (higher radius of gyration) in both a soluble folded mini-protein Trp-cage and an intrinsically disordered protein (IDP) Aβ40. Furthermore, ATP is shown to destabilise the pre-formed fibrillar structures using both simulation and experimental data (ThT assay and TEM images). They have also suggested that the biological hydrotrope ATP has significantly higher efficacy as compared to the commonly used chemical hydrotrope sodium xylene sulfonate (NaXS).

      Strengths:

      This work presents a comprehensive and compelling investigation of the effect of ATP on the conformational population of two types of proteins: globular/folded and IDP. The role of ATP as an "aggregate solubilizer" of pre-formed fibrils has been demonstrated using both simulation and experiments. They also elucidate the mechanism of action of ATP as a multi-purpose solubilizer in a protein-specific manner. Depending on the protein, it can interact through electrostatic interactions (for predominantly charged IDPs like Aβ40), or primarily through van der Waals interactions (for Trp-cage).

      Weaknesses:

      The weaknesses and suggestions mentioned in my first review have been adequately addressed by the authors in the revised version of the manuscript.

      Thank you very much for your positive feedback and for taking the time to thoroughly review our manuscript. Your thoughtful comments and suggestions have significantly contributed to enhancing the quality of our work.

      We sincerely appreciate your time and efforts in helping us refine our research.

      Reviewer #3 (Public review):

      Since its first experimental report in 2017 (Patel et al. Science 2017), there have been several studies on the phenomenon in which ATP functions as a biological hydrotrope of protein aggregates. In this manuscript, by conducting molecular dynamics simulations of three different proteins, Trp-cage, Abeta40 monomer, and Abeta40 dimer at concentrations of ATP (0.1, 0.5 M), which are higher than those under cellular conditions (a few mM), Sarkar et al. find that the amphiphilic nature of ATP, arising from its molecular structure consisting of a phosphate group (PG), sugar ring, and aromatic base, enables it to interact with proteins in a protein-specific manner, prevent their aggregation, and solubilize them if they aggregate. The authors also point out that in comparison with NaXS, which is the traditional chemical hydrotrope, ATP is more efficient in solubilizing protein aggregates because of its amphiphilic nature.

      Trp-cage, which features a hydrophobic core in its native state, is denatured at high ATP concentration. The authors show that the aromatic base group (purine group) of ATP is responsible for inducing the denaturation of the helical motif in the native state.

      For Abeta40, which can be classified as an IDP with charged residues, it is shown that ATP disrupts the salt bridge (D23-K28) required for the stability of beta-turn formation.

      By showing that ATP can disassemble preformed protein oligomers (Abeta40 dimer), the authors suggest that ATP is "potent enough to disassemble existing protein droplets, maintaining proper cellular homeostasis," and enhancing solubility.

      Overall, the message of the paper is clear and straightforward to follow. In addition to the previous studies in the literature on this subject (J. Am. Chem. Soc. 2021, 143, 31, 11982-11993; J. Phys. Chem. B 2022, 126, 42, 8486-8494; J. Phys. Chem. B 2021, 125, 28, 7717-7731; J. Phys. Chem. B 2020, 124, 1, 210-223), the study, which used MD simulations to test whether ATP is a solubilizer of protein aggregates, deserves some attention from the community and is worth publishing.

      Weakness

      My only major concern is that the simulations were performed at unusually high ATP concentrations (100 and 500 mM of ATP), whereas the real cellular concentration of ATP is 1-5 mM.

      I was wondering if there is any report on a titration curve of protein aggregates against ATP, and what is the transition mid-point of ATP-induced solubility of protein aggregates. For instance, urea or GdmCl have long been known as the non-specific denaturants of proteins, and it has been well experimented that their transition mid-points of protein unfolding are in the range of ~(1 - 6) M depending on the proteins.

      The authors responded to my comment on ATP concentration that because of the computational issue in all-atom simulations, they had no option but to employ mM protein concentrations instead of micromolar concentrations, thus requiring 1000-fold higher ATP concentration, which is at least in accordance with the protein/ATP stoichiometry. However, I believe this is an issue common to all researchers conducting MD simulations. Even if the system is in the same stoichiometric ratio, it is never clear to me (is it still dilute enough?) whether the mechanism of aggregate solubilization at a 1000-fold higher concentration of ATP remains identical to the actual process.

      Thank you for your thoughtful feedback and for recognizing the value of our study. We appreciate your detailed review and the constructive comments you have provided.

      We appreciate your understanding of the inherent limitations in MD simulations. The use of higher ATP concentrations in our simulations stems from the computational challenges of all-atom MD simulations. Due to the practical constraints of simulating micromolar protein concentrations in atomistic detail, we employed millimolar protein concentrations, which necessitated the use of ATP concentrations that are proportionally higher to maintain appropriate stoichiometry between ATP and proteins.

      We fully agree with your point that this is a common issue faced by researchers in the MD simulation community. While it is challenging to directly replicate physiological ATP concentrations in atomistic simulations, we believe that our approach still captures the fundamental interactions between ATP and proteins. In particular, our focus was on the relative behaviors and mechanistic insights, rather than absolute concentration effects. We based our choice of ATP concentration on maintaining stoichiometric ratios with the protein concentration to ensure that the molecular mechanisms observed remain relevant. We hope our clarification addresses your concerns.

      We would like to share that in an ongoing study focused on the role of ATP in influencing the liquid-liquid phase separation behavior of several intrinsically disordered proteins, we are employing a coarse-grained model. This approach allows us to maintain ATP concentrations within physiologically relevant ranges, as simulating micromolar protein concentrations becomes computationally feasible with this method. We believe that this complementary work will provide additional insights into the behavior of ATP at concentrations more reflective of cellular conditions and further validate the findings from our current study.

      We would also like to emphasize that the complementary experiments presented in this study were conducted at physiologically relevant concentrations for both protein and ATP. The experimental results are in strong agreement with our computational findings, supporting the hypothesis that the mechanisms observed in the simulations closely reflect the actual biological process.

      -----

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This work combines molecular dynamics (MD) simulations along with experimental elucidation of the efficacy of ATP as a biological hydrotrope. While ATP is broadly known as the energy currency, it has also been suggested to modulate the stability of biomolecules and their aggregation propensity. In the computational part of the work, the authors demonstrate that ATP increases the population of the more expanded conformations (higher radius of gyration) in both a soluble folded mini-protein Trp-cage and an intrinsically disordered protein (IDP) Aβ40. Furthermore, ATP is shown to destabilise the pre-formed fibrillar structures using both simulation and experimental data (ThT assay and TEM images). They have also suggested that the biological hydrotrope ATP has significantly higher efficacy as compared to the commonly used chemical hydrotrope sodium xylene sulfonate (NaXS).

      Strengths:

      This work presents a comprehensive and compelling investigation of the effect of ATP on the conformational population of two types of proteins: globular/folded and IDP. The role of ATP as an "aggregate solubilizer" of pre-formed fibrils has been demonstrated using both simulation and experiments. They also elucidate the mechanism of action of ATP as a multi-purpose solubilizer in a protein-specific manner. Depending on the protein, it can interact through electrostatic interactions (for predominantly charged IDPs like Aβ40) or primarily through van der Waals interactions (for Trp-cage).

      Weaknesses:

      The data presented by the authors are sound and adequately support the conclusions drawn by the authors. However, there are a few points that could be discussed or elucidated further to broaden the scope of the conclusions drawn in this work as discussed below:

      (i) The concentration of ATP used in the simulations is significantly higher (500 mM) as compared to those used in the experiments (6-20 mM) or cellular cytoplasm (~5 mM as mentioned by the authors). Since the authors mention already known concentration dependence of the effect of ATP, it is worth clarifying the possible limitations and implications of the high ATP concentrations in the simulations.

      We thank the reviewer for their concern regarding the ATP concentration used in our simulation. The reviewer correctly noted our statement about cellular ATP concentrations being in the range of a few millimolar. We would like to highlight that, in a cellular environment, millimolar ATP concentrations coexist with micromolar protein concentrations in the aqueous phase [1].

      In our study, we focused on the impact of ATP on protein conformational dynamics, primarily simulating a protein monomer within the simulation box. Maintaining a micromolar protein concentration (e.g., 20 μM [1]) for a monomeric protein would require an MD simulation box of significant dimensions (~44x44x44 nm³), which is computationally challenging to simulate at atomistic resolution because of the excessive computational cost and time. We observed a more than 150-fold reduction in simulation performance (with Gromacs version 2018.6) for 20 μM Aβ40 in a 20 mM ATP solution containing 50 mM NaCl, contained in a simulation box of ~44x44x44 nm³, in comparison to the simulation setup employed in our study.

      To ensure computational efficiency, we employed a simulation protocol that would maintain the cellular protein/ATP stoichiometry. Similar to the stoichiometry in the cellular environment (i.e., micromolar protein : millimolar ATP ~ 10³), our simulations maintained a consistent ratio (i.e., millimolar protein : molar ATP ~ 10³). This approach allowed us to use a smaller simulation box while preserving the relevant stoichiometry, enabling us to leverage data within a realistic timeframe.
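      The box-size and stoichiometry argument above can be checked with a short back-of-the-envelope calculation. The sketch below is only illustrative: the function names are hypothetical, and the only inputs taken from the response are the 20 μM protein and 20 mM ATP concentrations. It computes the edge of a cubic box that holds a single molecule at a given molar concentration, and the ATP:protein ratio.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
NM3_PER_LITRE = 1e24       # 1 L = 1e-3 m^3 = 1e24 nm^3


def cubic_box_edge_nm(concentration_molar, n_molecules=1):
    """Edge length (nm) of a cubic box holding n_molecules at the given molarity."""
    volume_litre = n_molecules / (concentration_molar * AVOGADRO)
    return (volume_litre * NM3_PER_LITRE) ** (1.0 / 3.0)


def atp_to_protein_ratio(atp_molar, protein_molar):
    """Stoichiometric ATP:protein ratio from the two molar concentrations."""
    return atp_molar / protein_molar


if __name__ == "__main__":
    # Values quoted in the response: one monomer at 20 uM, with 20 mM ATP.
    edge = cubic_box_edge_nm(20e-6)
    ratio = atp_to_protein_ratio(20e-3, 20e-6)
    print(f"box edge at 20 uM: {edge:.1f} nm, ATP:protein ratio: {ratio:.0f}")

    # Illustrative millimolar protein concentration (an assumption, not a
    # value from the manuscript) to show how much smaller the box becomes.
    print(f"box edge at 1 mM: {cubic_box_edge_nm(1e-3):.1f} nm")
```

      For a single monomer at 20 μM this gives an edge of about 43.6 nm, in line with the ~44 nm box quoted above.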

      Based on the reviewer comment we have included the explanation in the revised manuscript as “In this study, we opted to maintain the ATP stoichiometry consistent with biological conditions and previous in vitro experiments. Instead of keeping the protein concentration within the micromolar range and ATP concentration at the millimolar level, we chose this approach to avoid the need for an extremely large simulation box, which would greatly reduce computational efficiency by more than 150-fold.” (page 4).

      However, in our experimental measurements we maintained the protein concentration in the micromolar range and the ATP concentration in the millimolar range, consistent with previous in vitro experimental studies [1].

      It seems ATP can stabilise the proteins at low concentrations, but the current work does not address this possible effect. It would be interesting to see whether the effect of ATP on globular proteins and IDPs remains similar even at lower ATP concentrations.

      We thank the reviewer for raising this point. We would like to refer you to the Discussion and Conclusion sections of our manuscript (on page 18), where we have noted ATP’s concentration-dependent actions on protein homeostasis, incorporating insights from previous literature as well: “In our literature survey of ATP's concentration-dependent actions, as detailed in the Introduction section, we observed a dual role where ATP induces protein liquid-liquid phase separation at lower concentrations and promotes protein disaggregation at higher concentrations [2–4]. These versatile functions emphasize ATP's pivotal role in maintaining a delicate balance between protein stability (at low ATP concentrations) and solubility (at high ATP concentrations) for effective proteostasis within cells. Notably, ATP-mediated stabilization primarily targets soluble proteins, particularly those with ATP-binding motifs, while ATP-driven biomolecular solubilization is observed for insoluble proteins, typically lacking ATP-binding motifs.” We explain that ATP stabilizes proteins at lower concentrations, primarily targeting those with ATP-binding motifs, as illustrated by a sequence-dependent analysis. Since the proteins we studied (Trp-cage and Aβ40) do not contain any ATP-binding motifs, ATP-guided protein stabilization is not expected for these proteins. Additionally, we presented a set of simulations for Trp-cage with a comparatively lower concentration of ATP (see Figure 2), which also suggests ATP-driven protein chain elongation. Thus, we believe that ATP’s effect on globular proteins and intrinsically disordered proteins (IDPs) lacking ATP-binding motifs would remain similar at lower ATP concentrations.

      (ii) The authors make a somewhat ambitious statement that the role of ATP as a solubilizer of pre-formed fibrils could be used as a therapeutic strategy in protein aggregation-related diseases. However, it is not clear how it would be so since ATP is a promiscuous substrate in several biochemical processes and any additional administration of ATP beyond normal cellular concentration (~5 mM) could be detrimental.

      The authors thank the reviewer for this comment. In conjunction with earlier studies on the non-energetic effects of ATP, our study underscores ATP’s anti-aggregation properties and its ability to dissolve preformed aggregates, thereby maintaining regular protein homeostasis within cells and inhibiting protein aggregation-related diseases. Consequently, ATP has been proposed as a probable therapeutic agent in multiple previous reports [5–8]. Patel et al. also noted that as ATP levels decrease with age, this can lead to increased protein aggregation and neurodegenerative decline [1]. Therefore, the problem of excessive protein aggregation in cells may be linked to the reduction of ATP levels with aging [1,8–12]. In such circumstances, the authors hypothesize that introducing ATP as part of a therapeutic treatment might address the issue of excessive protein aggregation and neurodegenerative diseases.

      (iii) A natural question arises about what is so special about ATP as a solubilizer. The authors have also asked this question but in a limited scope of comparing to a commonly used chemical hydrotrope NaXS. However, a bigger question would be what kind of chemical/physical features make ATP special? For example, (i) if the amphiphilic property is important, what about some standard surfactants? (ii) how would ATP compare to other nucleotides like ADP or GTP? It might be useful to explore such questions in the future to further establish the special role of ATP in this regard.

      We thank the reviewer for recognizing the significance and value of our exploration into the unique properties of ATP as a solubilizer. In response to the reviewer’s comment regarding the specific features that make ATP special, we would like to emphasize our analysis of ATP's region-specific interactions with biomolecules. ATP's unique structure, comprising three distinct moieties- a larger hydrophobic aromatic base, a hydrophilic sugar moiety, and a highly negatively charged phosphate group, enables it to perform multiple modes of interactions, including hydrophobic, hydrogen bonding, and electrostatic interactions with proteins. This combination of interactions leads to its pronounced effect in a protein-specific manner. We believe that, together with its amphiphilic property, the specific chemical structure of ATP makes it an efficient solubilizer. A previous study by Patel et al. demonstrated the efficiency of ATP as a biological hydrotrope compared to other classical chemical hydrotropes (NaXS and NaTO). Our current study further rationalizes ATP’s efficiency through its effective interactions with biomolecules, driven by the chemically distinct parts of the ATP molecule.

      Regarding the reviewer’s point about comparing ATP as a hydrotrope with standard surfactants, we would like to add that hydrotropes are typically amphiphilic molecules that differ from classical surfactants in their low cooperativity of aggregation and their effectiveness at molar concentrations. Hydrotropes tend to accumulate preferentially and non-stoichiometrically around the solute, and their aggregation depends on the presence of solute molecules. Unlike surfactants, hydrotropes do not form any well-defined superstructure on their own.

      In response to the reviewer’s comment on comparing ATP’s effect with other nucleotides like ADP and GTP, we would like to highlight that previous studies have shown GTP to dissolve protein droplets (FUS) with similar efficiency to ATP. However, in cells, the concentration of GTP is much lower than that of ATP, resulting in negligible effects on the solubilization of liquid compartments in vivo. Conversely, ADP and AMP exhibited comparatively lower efficiency in dissolving protein condensates, suggesting the triphosphate moiety plays a considerable role in protein condensate dissolution. Additionally, TP-Mg alone had a negligible effect on protein droplet dissolution, indicating that the charge density in the ionic ATP side chain alone is insufficient for dissolving protein droplets. Together, these findings highlight the efficiency of ATP as a protein aggregate solubilizer, which stems from its specific chemical structure and not merely its amphiphilicity.

      According to the suggestion of the reviewer we have included the discussion in the revised manuscript as “Comparing the effects of ATP with other nucleotides such as ADP and GTP, we emphasize that previous studies have demonstrated GTP can dissolve protein droplets (such as FUS) with efficiency comparable to ATP. However, in vivo, the concentration of GTP is significantly lower than that of ATP, resulting in negligible impact on the solubilization of liquid compartments. In contrast, ADP and AMP show much lower efficiency in dissolving protein condensates, indicating the critical role of the triphosphate moiety in protein condensate dissolution. Furthermore, only TP-Mg exhibited a negligible effect on protein droplet dissolution, suggesting that the charge density in the ionic ATP side chain alone is insufficient for this process. These findings underscore ATP's superior efficacy as a protein aggregate solubilizer, attributed to its specific chemical structure rather than merely its amphiphilicity.” (page 15).

      (iv) In Figure 2F, it seems that in the presence of 0.5 M ATP, the Rg increases (as expected), but the number of native contacts remains almost the same. The reduction in the number of native contacts at higher ATP concentrations is not as dramatic as the increase in Rg. This is somewhat counterintuitive and should be looked into. Normally one would expect a monotonic reduction in the number of native contacts as the protein unfolds (increase in Rg).

      We appreciate the reviewer’s insightful comment. As noted, the presence of 0.5 M ATP results in an increase in the protein’s radius of gyration (Rg) and a decrease in native contacts, indicating that ATP promotes protein chain extension. However, the extent of the change in Rg and in native contacts is not identical. It is important to recognize that even the disruption of a few native contacts can significantly impact protein folding, leading to considerable protein chain extension. Therefore, it is not necessary for the extent of variation in Rg and native contacts to be similar. The appropriate measure is whether the alterations in these two variables are consistent with each other, such that an increase in Rg is accompanied by a decrease in native contacts, and vice versa.
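      As a reminder of what Rg measures in this exchange, the following minimal sketch computes the mass-weighted radius of gyration for a toy bead chain. The coordinates are hypothetical and are not taken from the simulations; they only illustrate that opening a folded-back chain raises Rg noticeably even though it changes few long-range contacts.

```python
import numpy as np


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of a set of points (same units as coords)."""
    coords = np.asarray(coords, dtype=float)
    if masses is None:
        masses = np.ones(len(coords))  # equal masses if none are given
    masses = np.asarray(masses, dtype=float)
    center = np.average(coords, axis=0, weights=masses)
    sq_dev = np.sum((coords - center) ** 2, axis=1)
    return float(np.sqrt(np.average(sq_dev, weights=masses)))


if __name__ == "__main__":
    # Hypothetical 8-bead chain (nm): folded back on itself vs fully extended.
    hairpin = [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (0.6, 0.0, 0.0), (0.9, 0.0, 0.0),
               (0.9, 0.5, 0.0), (0.6, 0.5, 0.0), (0.3, 0.5, 0.0), (0.0, 0.5, 0.0)]
    extended = [(0.3 * i, 0.0, 0.0) for i in range(8)]
    print(f"Rg hairpin:  {radius_of_gyration(hairpin):.3f} nm")
    print(f"Rg extended: {radius_of_gyration(extended):.3f} nm")
```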

      Reviewer #1 (Recommendations For The Authors):

      (i) There are several references repeated multiple times, e.g. (a) 1, 9, 14, (b) 25, 29, 31, 33. There are more such examples and the authors should fix these.

      We thank the reviewer for pointing this out. We have addressed the issue in the updated manuscript.

      (ii) Specific Gromacs version should be mentioned rather than 20xx.

      In the updated manuscript we have mentioned the particular version of Gromacs software (2018.6) we have employed for our simulation.

      Reviewer #2 (Public Review):

      In this work, Sarkar et al. investigated the potential ability of adenosine triphosphate (ATP) as a solubilizer of protein aggregates by combining MD simulations and ThT/TEM experiments. They explored how ATP influences the conformational behaviors of Trp-cage and β-amyloid Aβ40 proteins. Currently, there are no experiments in the literature supporting their simulation results of ATP on Trp-cage. The simulation protocol employed for the Aβ40 monomer system is conventional MD simulation, while REMD simulation (an enhanced sampling method) is used for the Aβ monomer + ATP system. It is not clear whether the conformational difference is caused by ATP or by the different simulation methods used.

      We thank the reviewer for raising this point. First we note that for Trp-cage, the simulation methods employed in presence and absence of ATP were identical (REMD simulation) and the difference in the free energy surfaces due to introduction of ATP in the solution were evident.

      Nonetheless, to address the referee’s question of whether the difference in the simulation methods used to generate the 2D free energy landscapes in the absence and presence of ATP could have introduced the observed difference, we carried out a fresh set of REMD simulations of Aβ40 in neat water, followed by adaptive sampling simulation. As shown below in Author response image 1, the free energy profiles obtained from conventional MD simulation (using the DESRES trajectory) and those obtained via REMD simulations for the same system (in neat water) are qualitatively similar. The free energy profiles obtained in the presence of ATP are significantly different from those in neat water, irrespective of the simulation method. This confirms the simulations’ observation of ATP-driven alteration of protein conformation.

      Author response image 1.

      Image represents the 2D free energy profile for Aβ40 monomer in absence of ATP, obtained through A. conventional MD and B. REMD simulation followed by adaptive sampling simulation.

      In the revised manuscript we have included the discussion as “To verify that the effect of ATP on conformational landscape is not an artifact of difference in sampling method (long conventional MD in absence of ATP versus REMD in presence of ATP), we repeated the conformational sampling in absence of ATP via employing REMD, augmented by adaptive sampling (figure S4). We find that the free energy map remains qualitatively similar (figure 4A and S4) irrespective the sampling technique. Comparison of 2D free energy map obtained from REMD simulation in absence of ATP (figure S4) with the one obtained in presence of ATP (figure 4B) also indicates ATP driven protein chain elongation.” on page 7 and updated the method section as “To test the robustness we have also estimated the 2D free energy profile of Aβ40 in absence of ATP by performing a similar REMD simulation followed by adaptive sampling simulation following the similar protocol described above.” on page 20.
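      The response does not spell out how the 2D free energy maps were estimated. One common approach, shown here only as a sketch and not necessarily the authors' procedure, is Boltzmann inversion of a two-dimensional histogram, F(x, y) = -k_B T ln P(x, y). The collective variables `cv1` and `cv2`, the 300 K temperature, and the bin count are all assumptions made for illustration.

```python
import numpy as np

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol K)


def free_energy_2d(cv1, cv2, temperature=300.0, bins=30):
    """Estimate F = -kT ln P on a 2D grid from samples of two collective variables.

    Returns the free energy grid (kJ/mol, minimum shifted to zero; unsampled
    bins are +inf) together with the bin edges along each axis.
    """
    hist, edges1, edges2 = np.histogram2d(cv1, cv2, bins=bins)
    prob = hist / hist.sum()
    with np.errstate(divide="ignore"):  # log(0) -> -inf for empty bins
        fes = -K_B * temperature * np.log(prob)
    fes -= fes[np.isfinite(fes)].min()
    return fes, edges1, edges2


if __name__ == "__main__":
    # Synthetic samples standing in for, e.g., Rg and a contact count.
    rng = np.random.default_rng(0)
    cv1 = rng.normal(1.0, 0.1, size=5000)
    cv2 = rng.normal(20.0, 3.0, size=5000)
    fes, _, _ = free_energy_2d(cv1, cv2)
    print("grid shape:", fes.shape, "sampled bins:", int(np.isfinite(fes).sum()))
```

      Maps built this way from differently sampled trajectories (conventional MD versus REMD) can be compared bin by bin, although REMD data would normally be restricted to the replica at the temperature of interest.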

      ThT/TEM experiments should be performed on Aβ40 fibrils rather than on Aβ(16-22) aggregates. Moreover, to elucidate their experimental results that ATP can dissolve preformed Aβ fibrils, the authors need to study the influence of ATP on Aβ fibrils instead of on Aβ dimer in their MD simulations. The novelty of this study is limited. The role of ATP in inhibiting Aβ fibril formation and dissolving preformed Aβ fibrils has been reported in previous experimental and computational studies (Journal of Alzheimer's Disease, 2014, 41: 561; Science 2017, 2017, 356, 753-756 J. Phys. Chem. B 2019, 123, 9922−9933; Scientific Reports, 2024, 14: 8134). However, most of those papers are not discussed in this manuscript. Additionally, some details of MD simulations and data analysis are missing in the manuscript, including the initial structures of all the simulations, the method for free energy calculation, the dielectric constant used, etc.

      We thank the reviewer for pointing out additional papers on ATP that were not discussed in the original manuscript. While some of the suggested papers were already cited (Science 2017, 356, 753-756), we had initially excluded the others as we did not find them directly relevant to our focus. However, in this revised version, we have included those references (on page 17 and 18).

      Through a thorough literature review, including the papers suggested by the reviewer, we maintain that our article is novel in its investigation of ATP's role in the protein conformational landscape and its correlation with anti-aggregation effects. While previous reports emphasize ATP's role in inhibiting protein aggregation, our work connects these findings by highlighting ATP's influence starting at the monomeric level, thereby preventing proteins from becoming aggregation-prone.

      In the revised manuscript, we have included this justification as “While previous reports emphasize ATP's role in inhibiting protein aggregation, our work connects these findings by highlighting ATP's influence starting at the monomeric level, thereby preventing proteins from becoming aggregation-prone.” on page 18.

      Regarding the reviewer's concern about the details of the MD simulations, we would like to mention that the Methods section of the current article provides an elaborate explanation of the simulation setup and characterization (on pages 19-21). Regarding the reviewer's comment on the dielectric constant, we would like to emphasize that we have performed the simulations with explicit solvent (water molecules), which by default takes the dielectric constant into account (unlike many approximate continuum modelling approaches).

      Reviewer #2 (Recommendations For The Authors):

      (1) The convergence of simulations needs to be verified prior to data analysis.

      We thank the reviewer for this suggestion. We have assessed the convergence of the simulations and represented the respective plots in Author response image 2.

      Author response image 2.

      The time profiles of temperature (a, c, e and g) and energies, i.e. kinetic energy, potential energy and total energy (b, d, f and h), are shown for Trp-cage in the absence (a-b) and presence of 0.5 M ATP (c-d) and for Aβ40 protein in the absence (e-f) and presence of 0.5 M ATP (g-h).

      (2) "The precedent experiments investigating protein aggregation in the presence of ATP, had been performed by maintaining the ATP:protein stoichiometric ratio in the range of 0.1 × 10³ to 1.6 × 10³. Likewise, in our simulation with Trp-cage, the ATP:protein ratio of 0.02 × 10³ was maintained.". Clearly, there is a big difference between the ATP:protein ratio in the MD simulations and that in the precedent experiments.

      We thank the reviewer for raising this point. We would like to clarify that for unstructured proteins, including Aβ40, the ATP stoichiometry [1] ranged from 0.1 × 10³ to 1.6 × 10³. In our study, we have maintained the ATP stoichiometry at 0.1 × 10³ for the disordered protein Aβ40. For structured globular mini-protein like Trp-cage, a lower concentration of 0.02 × 10³ was used, consistent with other studies investigating the effects of ATP on globular proteins such as ubiquitin, lysozyme, and malate dehydrogenase, where the ATP stoichiometry ranged [13] from 0.01 × 10³ to 0.03 × 10³.

      In the revised manuscript we have clearly mentioned the point as “The precedent studies reporting the effect of ATP on structured proteins, had been performed by maintaining ATP:protein stoichiometric ratio in the range of 0.01 × 10³ to 0.03 × 10³. Likewise, in our simulation with Trp-cage, the ATP:protein ratio of 0.02 × 10³ was maintained.” on page 4 and “The former experiments investigating protein (unstructured) aggregation in presence of ATP, had been performed by maintaining ATP:protein stoichiometric ratio in the range of 0.1 × 10³ to 1.6 × 10³, similarly we have also maintained ATP/protein stoichiometry 0.1 × 10³ in our investigation ATP’s effect on disordered protein Aβ40.” on page 7.

      However, in our experimental measurements we maintained the protein concentration in the micromolar range and the ATP concentration in the millimolar range, consistent with previous in vitro experimental studies [1].

      (3) The snapshots in Figure 2G show that in the absence of ATP, the Trp-cage monomer exhibits only minor conformational changes compared to the NMR structure (PDB: 1L2Y). However, the native contact number of the Trp-cage monomer (~18, Figure 2C) is much smaller than the total contact number (~160, Figure 2B). The authors are suggested to explain this unexpectedly large difference.

      The authors thank the reviewer for the concern regarding the values of native contacts and the total number of contacts of the Trp-cage protein. The authors would like to highlight that the total number of contacts is the cumulative number of intra-protein contacts, counted whenever two atoms of the protein come within the cut-off distance (0.8 nm). In contrast, the native contact count considers only the key contacts of the protein, namely those between the side chains of two amino acids that are not adjacent in the amino acid sequence.
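      The difference between the two counts described above can be illustrated with a toy model. The sketch below is a simplified stand-in, not the authors' analysis: it uses hypothetical bead coordinates, treats every bead as an atom, and approximates the residue-level count by asking whether any atom pair of two sequence-separated residues lies within the cut-off (it does not select side-chain atoms or use a native reference structure). The 0.8 nm cut-off is taken from the response; `min_separation` is an assumption.

```python
import numpy as np


def pairwise_distances(coords):
    """All-against-all Euclidean distance matrix for an (N, 3) array."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def count_atom_contacts(coords, cutoff=0.8):
    """Number of unique atom pairs closer than the cutoff (nm)."""
    dist = pairwise_distances(coords)
    i, j = np.triu_indices(len(coords), k=1)
    return int(np.count_nonzero(dist[i, j] < cutoff))


def count_nonadjacent_residue_contacts(coords, residue_ids, cutoff=0.8,
                                       min_separation=2):
    """Number of residue pairs, at least min_separation apart in sequence,
    that have any atom pair within the cutoff."""
    dist = pairwise_distances(coords)
    residues = np.unique(residue_ids)
    count = 0
    for idx, res_a in enumerate(residues):
        for res_b in residues[idx + 1:]:
            if res_b - res_a < min_separation:
                continue  # skip neighbours along the sequence
            block = dist[np.ix_(residue_ids == res_a, residue_ids == res_b)]
            if block.min() < cutoff:
                count += 1
    return count


if __name__ == "__main__":
    # Hypothetical hairpin: 4 residues with 2 beads each (coordinates in nm).
    coords = np.array([(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (0.6, 0.0, 0.0),
                       (0.9, 0.0, 0.0), (0.9, 0.5, 0.0), (0.6, 0.5, 0.0),
                       (0.3, 0.5, 0.0), (0.0, 0.5, 0.0)])
    residue_ids = np.array([0, 0, 1, 1, 2, 2, 3, 3])
    print("atom-level contacts:   ", count_atom_contacts(coords))
    print("residue-level contacts:",
          count_nonadjacent_residue_contacts(coords, residue_ids))
```

      Because every atom pair within the cut-off is counted in the first case, the atom-level total grows much faster with system size than the residue-level count, which is the kind of gap the reviewer asked about.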

      (4) The authors are suggested to calculate the contact numbers of each residue with different parts of ATP (phosphate group, base, and sugar moiety), which will help to reveal the key interactions between ATP and proteins.

      The authors thank the reviewer for this comment. According to the suggestion we have calculated the contact probability of each residue of protein with ATP as depicted in Author response image 3 and 4 for Trp-cage and Aβ40 respectively.

      Author response image 3.

      The figure shows the residue wise contact probability of protein Trp-cage with ATP.

      Author response image 4.

      The image shows the residue wise contact probability of Aβ40 protein with ATP.
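      A residue-wise contact probability of the kind shown in these images can be estimated as the fraction of trajectory frames in which any atom of a residue lies within a cut-off of any ATP atom. The sketch below assumes coordinates are already available as NumPy arrays; the array layout, the function name, and the cut-off value are hypothetical, and periodic boundary conditions are ignored for brevity. It is not the authors' analysis script.

```python
import numpy as np


def residue_contact_probability(protein_xyz, atp_xyz, residue_ids, cutoff):
    """Fraction of frames in which each residue contacts ATP.

    protein_xyz: (n_frames, n_protein_atoms, 3) coordinates in nm
    atp_xyz:     (n_frames, n_atp_atoms, 3) coordinates in nm
    residue_ids: (n_protein_atoms,) residue index of each protein atom
    cutoff:      contact distance in nm
    """
    diff = protein_xyz[:, :, None, :] - atp_xyz[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)            # (frames, protein, ATP)
    atom_in_contact = (dist < cutoff).any(axis=2)   # (frames, protein)
    residues = np.unique(residue_ids)
    prob = np.array([atom_in_contact[:, residue_ids == r].any(axis=1).mean()
                     for r in residues])
    return residues, prob


if __name__ == "__main__":
    # Two frames, two single-atom residues, one ATP atom (toy data).
    protein = np.array([[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]])
    atp = np.array([[(0.1, 0.0, 0.0)],
                    [(5.0, 5.0, 5.0)]])
    residues, prob = residue_contact_probability(
        protein, atp, np.array([0, 1]), cutoff=0.4)
    for r, p in zip(residues, prob):
        print(f"residue {r}: contact probability {p:.2f}")
```

      Restricting `atp_xyz` to the atoms of the phosphate group, the sugar, or the base would give the moiety-resolved contacts the reviewer suggested.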

      For details of ATP’s region-specific interactions with proteins, the authors would like to refer to the calculation of the preferential binding coefficient and interaction energies depicted in Figure 3 for Trp-cage (on page 6) and in Figures 5 and 8 for Aβ40 protein. These figures illustrate the mode of protein interaction with the chemically distinct regions of ATP and also illuminate ATP’s interaction with different parts of the proteins.

      (5) The authors claimed that "coulombic interaction of ATP with protein predominates in Aβ40 (Figure 5 H)" (Page 10). However, the preferential interaction coefficient in Figure 5G shows that the curve of the phosphate group lies below the other two curves when distance < 1 nm, indicating the relatively weak interactions between the phosphate group and Aβ40. This seems to be in conflict with the results of energy calculation (Figure 5H).

      We thank the reviewer for raising this point. The authors would like to emphasize that ATP, with its large and highly charged phosphate group, is highly likely to interact with intrinsically disordered proteins (IDPs) primarily through electrostatic interactions due to their significant charge content. In Figure 5G, it is evident that the preferential binding coefficient reaches a notably high value, indicating strong interaction between the protein and the charged phosphate group of ATP. To address the reviewer's concern regarding the curve showing the highest interaction value only after 1 nm, we would like to highlight the nature of long-range electrostatic potential, which is active in the range of approximately 1-1.2 nm [14–16]. Furthermore, Figure 5H confirms that the electrostatic interaction between the protein and ATP is favorable and predominates over the Lennard-Jones (LJ) interaction.

      (6) There are several issues with citations. For example, references 2, 5, 24, 28, 32, 45, 49 and 53 are the same paper, references 1, 7, and 14 are the same paper, references 12, 15, and 46 are the same paper, and many more. In addition, the title of reference 12/15 is "ATP Controls the Aggregation of Aβ16-22 Peptides" instead of "ATP Controls the Aggregation of Aβ Peptides".

      We thank the reviewer for pointing this out. We have addressed the issue in the updated manuscript.

      (7) References 19 and 20 are cited in the context of "As a potential function of the excess ATP concentration within the cell, a substantial influence on cellular protein homeostasis is observed, particularly in preventing protein aggregation (14-21)" (Page 2). However, there is no mention of "ATP" in ref. 19 and 20.

      Thank you to the reviewer for identifying this mistake. We have corrected the issue in the revised manuscript.

      (8) On page 22: "To perform all the molecular dynamics (MD) simulations GROMACS software of version 20xx software was utilized". Please provide the version of GROMACS software used in this study.

      In the updated manuscript, we have specified the particular version of Gromacs software (2018.6) used for our simulations. (see revised manuscript page 19)

      (9) In Figure 8J, the time-dependent distance of Aβ40 dimer without ATP needs to be provided as a comparison.

      We thank the reviewer for this comment. In the revised manuscript we now report the distance between the Aβ40 protein chains both in the absence and in the presence of ATP, and we have added the statement “The probability distribution (Figure 8J) illustrates that, in the presence of ATP, the two protein chains, initially part of the dimer, become prone to be moved away from each other.” (page 15).

      (10) The authors should compare ATP-Aβ interactions with NaXS-Aβ interactions to understand why ATP is more efficient than NaXS in inhibiting interprotein interactions.

      We thank the reviewer for raising the comparison between the ATP-Aβ40 and NaXS-Aβ40 interactions. We would like to highlight our results (Figure 5G and H), which demonstrate the dominance of Coulombic interactions (over LJ interactions) of ATP with the protein. Based on this, we compared the Coulombic interaction energy of ATP and NaXS with the protein Aβ40, as depicted in Figure 9I. We observed that ATP-protein electrostatic interactions are more favorable than those of NaXS, which explains why ATP performs better than NaXS. This difference arises because ATP possesses a large and highly charged triphosphate group that can interact strongly with the protein, whereas NaXS contains a very small sulfonate group with much less charge. Therefore, owing to its more favorable Coulombic interaction with the protein, ATP acts more efficiently as a hydrotrope than NaXS. In the revised manuscript we have highlighted the term “Coulombic interaction” in the main text and in the caption of Figure 9 (on pages 15 and 16 of the revised manuscript, respectively).
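      For reference, the two nonbonded terms being compared above take the following standard pairwise forms in classical force fields. The symbols are generic textbook notation and are not taken from the manuscript.

```latex
V_{\mathrm{Coul}}(r_{ij}) = \frac{q_i q_j}{4\pi\varepsilon_0 \varepsilon_r\, r_{ij}},
\qquad
V_{\mathrm{LJ}}(r_{ij}) = 4\epsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right]
```

      The Coulomb term decays as 1/r, whereas the attractive part of the LJ term decays as 1/r^6, so charge-charge contributions can remain appreciable at separations near 1 nm where dispersion contributions are already small.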

      (11) The word "sollubilizer" in the Abstract is a typo.

      We thank the reviewer for pointing this out. We have made the necessary corrections in the revised manuscript.

      (12) What does "ATP-Mg2+" mean in the manuscript?

      ATP, being polyanionic and possessing a potentially chelating polyphosphate group, binds metal cations with high affinity; biologically, it therefore occurs complexed with an equivalent number of Mg2+ ions in the form of ATP-Mg [17–19]. Accordingly, multiple previous studies have used ATP-Mg in their investigations [1,20–22].

      Reviewer #3 (Public Review):

      Summary:

      Since its first experimental report in 2017 (Patel et al. Science 2017), there have been several studies on the phenomenon in which ATP functions as a biological hydrotrope of protein aggregates. In this manuscript, by conducting molecular dynamics simulations of three different proteins, Trp-cage, Abeta40 monomer, and Abeta40 dimer at a high concentration of ATP (0.1, 0.5 M), Sarkar et al. find that the amphiphilic nature of ATP, arising from its molecular structure consisting of phosphate group (PG), sugar ring, and aromatic base, enables it to interact with proteins in a protein-specific manner and prevents their aggregation and solubilize if they aggregate. The authors also point out that in comparison with NaXS, which is the traditional chemical hydrotrope, ATP is more efficient in solubilizing protein aggregates because of its amphiphilic nature.

      Trp-cage, featured with a hydrophobic core in its native state, is denatured at high ATP concentration. The authors show that the aromatic base group (purine group) of ATP is responsible for inducing the denaturation of helical motifs in the native state.

      For Abeta40, which can be classified as an IDP with charged residues, it is shown that ATP disrupts the salt bridge (D23-K28) required for the stability of beta-turn formation.

      By showing that ATP can disassemble preformed protein oligomers (Abeta40 dimer), the authors argue that ATP is "potent enough to disassemble existing protein droplets, maintaining proper cellular homeostasis," and enhancing solubility.

      Overall, the message of the paper is clear and straightforward to follow. I did not follow all the literature, but I see in the literature search, that there are several studies on this subject. (J. Am. Chem. Soc. 2021, 143, 31, 11982-11993; J. Phys. Chem. B 2022, 126, 42, 8486-8494; J. Phys. Chem. B 2021, 125, 28, 7717-7731; J. Phys. Chem. B 2020, 124, 1, 210-223).

      If this study is indeed the first one to test using MD simulations whether ATP is a solubilizer of protein aggregates, it may deserve some attention from the community. But, the authors should definitely discuss the content of existing studies, and make it explicit what is new in this study.

      Strengths:

      The authors showed that due to its amphiphilic nature, ATP can interact with different proteins in a protein-specific manner, a finding more general and specific than merely calling ATP a biological hydrotrope.

      Weaknesses:

      (1) My only major concern is that the simulations were performed at unusually high ATP concentrations (100 and 500 mM of ATP), whereas the real cellular concentration of ATP is 1-5 mM. Even if ATP is a good solubilizer of protein aggregates, the actual concentration should matter. I was wondering if there is a previous report on a titration curve of protein aggregates against ATP, and what is the transition mid-point of ATP-induced solubility of protein aggregates.

      For instance, urea or GdmCl have long been known as the non-specific denaturants of proteins, and it has been well experimented that their transition mid-point of protein unfolding is ~(1 - 6) M depending on the proteins.

      We thank the reviewer for their concern regarding the ATP concentration used in our simulation. The reviewer correctly noted our statement about cellular ATP concentrations being in the range of a few millimolar. We would like to highlight that, in a cellular environment, millimolar ATP concentrations coexist with micromolar protein concentrations in the aqueous phase.

      In our study, we focused on the impact of ATP on protein conformational dynamics, primarily simulating a protein monomer within the simulation box. To maintain a micromolar protein concentration (e.g., 20 μM [1]) for a monomeric protein, a simulation box of significant dimensions (~44x44x44 nm³) would be required. This size would be computationally challenging to simulate at an atomistic resolution due to the excessive computational cost and time.

      To ensure computational efficiency, we employed millimolar protein concentrations instead of micromolar, thus requiring a higher ATP concentration to maintain the cellular protein-to-ATP stoichiometry. Similar to the stoichiometry in the cellular environment (i.e., micromolar protein : millimolar ATP ~ 10³), our simulations maintained a consistent ratio (i.e., millimolar protein : molar ATP ~ 10³). This approach allowed us to use a smaller simulation box while preserving the relevant stoichiometry, enabling us to obtain data within a realistic timeframe.
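      The box-size argument can be checked with a short calculation. The sketch below is only an illustration: the function names are hypothetical, the 20 μM value is the one quoted in the response, and the 1 mM protein concentration used in the second step is an assumed example of "millimolar", not a value stated by the authors.

```python
# Avogadro constant (exact SI value), in molecules per mole.
AVOGADRO = 6.02214076e23
# One litre expressed in cubic nanometres (1 L = 1e-3 m^3 = 1e24 nm^3).
NM3_PER_LITRE = 1e24


def cubic_box_side_nm(n_molecules, molar_conc):
    """Side length (nm) of a cubic box holding n_molecules at molar_conc (mol/L)."""
    volume_litres = n_molecules / (AVOGADRO * molar_conc)
    return (volume_litres * NM3_PER_LITRE) ** (1.0 / 3.0)


def molecules_in_box(side_nm, molar_conc):
    """Number of solute molecules at molar_conc (mol/L) in a cubic box of side_nm."""
    volume_litres = side_nm ** 3 / NM3_PER_LITRE
    return molar_conc * AVOGADRO * volume_litres


if __name__ == "__main__":
    # One monomer at the micromolar concentration quoted in the response.
    side_dilute = cubic_box_side_nm(1, 20e-6)
    print(f"one monomer at 20 uM -> box side of about {side_dilute:.1f} nm")

    # Assumed example: one monomer at 1 mM, with ATP at 0.5 M in the same box.
    side_dense = cubic_box_side_nm(1, 1e-3)
    n_atp = molecules_in_box(side_dense, 0.5)
    print(f"one monomer at 1 mM -> box side of about {side_dense:.1f} nm")
    print(f"ATP molecules at 0.5 M in that box: about {n_atp:.0f}")
```

      Under these assumptions, the dilute case reproduces the roughly 44 nm box edge quoted above, while the concentrated case needs a box only about 12 nm on a side.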

      Based on the reviewer comment we have included the explanation in the revised manuscript as “In this study, we opted to maintain the ATP stoichiometry consistent with biological conditions and previous in vitro experiments. Instead of keeping the protein concentration within the micromolar range and ATP concentration at the millimolar level, we chose this approach to avoid the need for an extremely large simulation box, which would greatly reduce computational efficiency by more than 150-fold.” (page 4).

      However, in our experimental measurements we maintained the protein concentration in the micromolar range and the ATP concentration in the millimolar range, consistent with previous in vitro experimental studies [1].

      (2) The sentence "... a clear shift of relative population of Abeta40 conformational subensemble towards a basin with higher Rg and lower number of contacts in the presence of ATP" is not a precise description of Figures 4A and 4B. It is not clear from the figures whether the Rg of Abeta40 is increased when Abeta40 is subject to ATP. The authors should give a more precise description of what is observed in the result from their simulations or consider a better-order parameter to describe the change in molecular structure.

      We thank the reviewer for this comment. Figures 4A and 4B, which depict the 2D free energy profile of the Aβ40 protein with respect to Rg and the total number of contacts, are presented to pinpoint the alteration of the protein conformational landscape under the influence of ATP. To further elucidate the ATP-driven conformational alteration, overlaid snapshots corresponding to the absence and presence of ATP were also provided. Taken together, we believe that the descriptions of Figures 4A and 4B in the article are appropriate and effectively convey the analysis provided in the article.

      In addition, the disruption of beta-sheet from Figure 4E to 4F is not very clear. The authors may want to use an arrow to indicate the region of the contact map associated with this change.

      In the revised manuscript the authors have highlighted the region of the contact map associated with the changes in the beta-sheet propensity with an arrow for each of the plots.

      Although the full atomistic simulations were carried out, the analyses demonstrated in this study are a bit rudimentary and coarse-grained (e.g, Rg is a rather poor order parameter to discuss dynamics involved in proteins). The authors could go beyond and say more about how ATP interacts with proteins and disrupts the stable configurations.

      We thank the reviewer for this comment. We understand the reviewer's concern regarding the choice of the order parameter (Rg), which has been a topic of long-standing debate. However, we would like to note that in the current study we employed Rg based on recent investigations by the D. E. Shaw Research group [23] (specifically concerning the protein Aβ40 and the CHARMM36m force field), which reported an almost negligible Rg penalty compared to experimental values. Experiments characterizing IDPs also commonly use Rg as a metric. We would also like to highlight that previous investigations by our group carefully benchmarked several features of proteins as well as IDPs, using both linear and artificial neural network based dimension reduction techniques, and demonstrated that Rg, in combination with the fraction of native contacts, serves as an optimal feature set [24,25]. Therefore, we believed that Rg would be a suitable order parameter for analyzing the structural behavior of this protein. Additionally, we have analyzed other relevant characteristics, including the total number of contacts, residue-wise protein contact map, percentage of secondary structure, solvent-accessible surface area, and distances between key interacting residues, to provide a comprehensive understanding.

      The justification of our choice of collective variable has been discussed in the revised manuscript as “Since multiple previous studies has reported benchmarking of several features of proteins as well as IDPs using both linear and artificial neural network based dimension reduction techniques and have demonstrated that Rg, in combination with fraction of native contact serves as optimum features, we have chosen these two metrics for developing the 2D free energy profile.” on page 4.
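      To make the two quantities in this exchange concrete, the following is a minimal sketch of a mass-weighted Rg and of a 2D free energy estimate obtained by Boltzmann inversion of a histogram, F = -kT ln P. It is not the authors' analysis code: the function names, the bin widths, and the 300 K default temperature are assumptions made for illustration.

```python
import math
from collections import Counter

# Molar gas constant in kJ/(mol K), so free energies come out in kJ/mol.
GAS_CONSTANT = 0.0083144626


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration for a list of (x, y, z) coordinates."""
    total_mass = sum(masses)
    # Centre of mass, one Cartesian component at a time.
    com = [sum(m * r[k] for m, r in zip(masses, coords)) / total_mass
           for k in range(3)]
    # Mass-weighted mean squared distance from the centre of mass.
    weighted_sq = sum(m * sum((r[k] - com[k]) ** 2 for k in range(3))
                      for m, r in zip(masses, coords))
    return math.sqrt(weighted_sq / total_mass)


def free_energy_2d(x_values, y_values, x_bin, y_bin, temperature=300.0):
    """Free energy (kJ/mol) per occupied 2D bin, shifted so the minimum is 0.

    Bins are labelled by integer indices (floor(x / x_bin), floor(y / y_bin));
    unvisited bins are simply absent from the returned dictionary.
    """
    counts = Counter((math.floor(x / x_bin), math.floor(y / y_bin))
                     for x, y in zip(x_values, y_values))
    total = sum(counts.values())
    kt = GAS_CONSTANT * temperature
    free_energy = {cell: -kt * math.log(n / total) for cell, n in counts.items()}
    lowest = min(free_energy.values())
    return {cell: f - lowest for cell, f in free_energy.items()}


if __name__ == "__main__":
    # Toy input: per-frame Rg (nm) and contact counts from a hypothetical trajectory.
    rg = [1.0, 1.1, 1.2, 2.5]
    contacts = [50, 52, 55, 20]
    print(free_energy_2d(rg, contacts, x_bin=1.0, y_bin=10))
```

      A shift of population toward a basin with higher Rg and fewer contacts would appear in such a map as the zero-energy bin moving to larger first index and smaller second index.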

      (3) Although the amphiphilic character of ATP is highlighted, a similar comment can be made as to GTP. Is GTP, whose cellular concentration is ~0.5 mM, also a good solubilizer of protein aggregates? If not, why? Please comment.

      In response to the reviewer’s comment on comparing ATP’s effect with that of other nucleotides such as GTP, we would like to highlight that previous studies have shown GTP’s ability to dissolve protein droplets (FUS) with similar efficiency to ATP [1,26]. However, in cells, the concentration of GTP is much lower than that of ATP, resulting in negligible effects on the solubilization of liquid compartments in vivo [1].

      According to the suggestion of the reviewer we have included the discussion in the revised manuscript as “Comparing the effects of ATP with other nucleotides such as ADP and GTP, we emphasize that previous studies have demonstrated GTP can dissolve protein droplets (such as FUS) with efficiency comparable to ATP. However, in vivo, the concentration of GTP is significantly lower than that of ATP, resulting in negligible impact on the solubilization of liquid compartments. In contrast, ADP and AMP show much lower efficiency in dissolving protein condensates, indicating the critical role of the triphosphate moiety in protein condensate dissolution. Furthermore, only TP-Mg exhibited a negligible effect on protein droplet dissolution, suggesting that the charge density in the ionic ATP side chain alone is insufficient for this process. These findings underscore ATP's superior efficacy as a protein aggregate solubilizer, attributed to its specific chemical structure rather than merely its amphiphilicity.” (page 15).

      Reviewer #3 (Recommendations For The Authors):

      Spell-check should be carried out throughout the manuscript. e.g., sollubilizer, sollubilizing, ...

      We thank the reviewer for pointing this out. We have made the necessary corrections in the revised manuscript.

      The reference section should be properly organized. There are multiple repetitions of references (e.g., references 28, 30, 32 are the same reference). I see many instances of this.

      We thank the reviewer for pointing this out. We have addressed the issue in the updated manuscript.

      References:

      (1) Patel, A.; Malinovska, L.; Saha, S.; Wang, J.; Alberti, S.; Krishnan, Y.; Hyman, A. A. ATP as a Biological Hydrotrope. Science 2017, 356 (6339), 753–756.

      (2) Ren, C.-L.; Shan, Y.; Zhang, P.; Ding, H.-M.; Ma, Y.-Q. Uncovering the Molecular Mechanism for Dual Effect of ATP on Phase Separation in FUS Solution. Sci Adv 2022, 8 (37), eabo7885.

      (3) Song, J. Adenosine Triphosphate Energy-Independently Controls Protein Homeostasis with Unique Structure and Diverse Mechanisms. Protein Sci. 2021, 30 (7), 1277–1293.

      (4) Liu, F.; Wang, J. ATP Acts as a Hydrotrope to Regulate the Phase Separation of NBDY Clusters. JACS Au 2023, 3 (9), 2578–2585.

      (5) Chu, X.-Y.; Xu, Y.-Y.; Tong, X.-Y.; Wang, G.; Zhang, H.-Y. The Legend of ATP: From Origin of Life to Precision Medicine. Metabolites 2022, 12 (5). https://doi.org/10.3390/metabo12050461.

      (6) Tian, Z.; Qian, F. Adenosine Triphosphate-Induced Rapid Liquid-Liquid Phase Separation of a Model IgG1 mAb. Mol. Pharm. 2021, 18 (1), 267–274.

      (7) Wang, B.; Zhang, L.; Dai, T.; Qin, Z.; Lu, H.; Zhang, L.; Zhou, F. Liquid-Liquid Phase Separation in Human Health and Diseases. Signal Transduct Target Ther 2021, 6 (1), 290.

      (8) Alberti, S.; Dormann, D. Liquid-Liquid Phase Separation in Disease. Annu. Rev. Genet. 2019, 53, 171–194.

      (9) Nair, K. S. Aging Muscle. Am. J. Clin. Nutr. 2005, 81 (5), 953–963.

      (10) Recharging Mitochondrial Batteries in Old Eyes. Near Infra-Red Increases ATP. Exp. Eye Res. 2014, 122, 50–53.

      (11) Goldberg, J.; Currais, A.; Prior, M.; Fischer, W.; Chiruta, C.; Ratliff, E.; Daugherty, D.; Dargusch, R.; Finley, K.; Esparza-Moltó, P. B.; Cuezva, J. M.; Maher, P.; Petrascheck, M.; Schubert, D. The Mitochondrial ATP Synthase Is a Shared Drug Target for Aging and Dementia. Aging Cell 2018, 17 (2). https://doi.org/10.1111/acel.12715.

      (12) Kagawa, Y.; Hamamoto, T.; Endo, H.; Ichida, M.; Shibui, H.; Hayakawa, M. Genes of Human ATP Synthase: Their Roles in Physiology and Aging. Biosci. Rep. 1997, 17 (2), 115–146.

      (13) Ou, X.; Lao, Y.; Xu, J.; Wutthinitikornkit, Y.; Shi, R.; Chen, X.; Li, J. ATP Can Efficiently Stabilize Protein through a Unique Mechanism. JACS Au 2021, 1 (10), 1766–1777.

      (14) Norberg, J.; Nilsson, L. On the Truncation of Long-Range Electrostatic Interactions in DNA. Biophys. J. 2000, 79 (3), 1537–1553.

      (15) Pabbathi, A.; Coleman, L.; Godar, S.; Paul, A.; Garlapati, A.; Spencer, M.; Eller, J.; Alper, J. D. Long-Range Electrostatic Interactions Significantly Modulate the Affinity of Dynein for Microtubules. Biophys. J. 2022, 121 (9), 1715–1726.

      (16) Sastry, M. Nanoparticle Thin Films: An Approach Based on Self-Assembly. In Handbook of Surfaces and Interfaces of Materials; Elsevier, 2001; pp 87–123.

      (17) Wilson, J. E.; Chin, A. Chelation of Divalent Cations by ATP, Studied by Titration Calorimetry. Anal. Biochem. 1991, 193 (1), 16–19.

      (18) Storer, A. C.; Cornish-Bowden, A. Concentration of MgATP2- and Other Ions in Solution. Calculation of the True Concentrations of Species Present in Mixtures of Associating Ions. Biochem. J 1976, 159 (1), 1–5.

      (19) Garfinkel, L.; Altschuld, R. A.; Garfinkel, D. Magnesium in Cardiac Energy Metabolism. J. Mol. Cell. Cardiol. 1986, 18 (10), 1003–1013.

      (20) Hautke, A.; Ebbinghaus, S. The Emerging Role of ATP as a Cosolute for Biomolecular Processes. Biol. Chem. 2023, 404 (10), 897–908.

      (21) Pal, S.; Roy, R.; Paul, S. Deciphering the Role of ATP on PHF6 Aggregation. J. Phys. Chem. B 2022, 126 (26), 4761–4775.

      (22) Pal, S.; Paul, S. ATP Controls the Aggregation of Aβ Peptides. J. Phys. Chem. B 2020, 124 (1), 210–223.

      (23) Robustelli, P.; Piana, S.; Shaw, D. E. Developing a Molecular Dynamics Force Field for Both Folded and Disordered Protein States. Proc. Natl. Acad. Sci. U. S. A. 2018, 115 (21), E4758–E4766.

      (24) Ahalawat, N.; Mondal, J. Assessment and Optimization of Collective Variables for Protein Conformational Landscape: GB1 β-Hairpin as a Case Study. J. Chem. Phys. 2018, 149 (9), 094101.

      (25) Menon, S.; Adhikari, S.; Mondal, J. An Integrated Machine Learning Approach Delineates Entropy-Mediated Conformational Modulation of α-Synuclein by Small Molecule, 2024. https://doi.org/10.7554/elife.97709.1.

      (26) Pandey, M. P.; Sasidharan, S.; Raghunathan, V. A.; Khandelia, H. Molecular Mechanism of Hydrotropic Properties of GTP and ATP. J. Phys. Chem. B 2022, 126 (42), 8486–8494.

    1. Author response:

      We thank the reviewers for their productive comments on our work. While we have chosen to not revise the manuscript further, we reply to the public reviewer comments here so as to provide clarification on certain points.

      Reviewer #1 (Public Review):

      Summary:

      The aim of the study described in this paper was to test whether visual stimuli that pulse synchronously with the systole phase of the cardiac cycle are suppressed compared with stimuli that pulse in the diastole phase. To this end, the authors employed a binocular rivalry task and used the duration of the perceived image as the metric of interest. The authors predicted that if there was global suppression of the visual stimulus during systole then the durations of the stimulus that were pulsing synchronously with systole should be of shorter duration than those pulsing in diastole. However, the results observed were the opposite of those predicted. The authors speculate on what this facilitation effect might mean for the baroreceptor suppression hypothesis.

      Strengths:

      This is an interesting and timely study that uses a clever paradigm to test the baroreceptor suppression hypothesis in vision. This is a refreshingly focussed paper with interesting and seemingly counterintuitive results.

      Weaknesses:

      The paper could benefit from a clearer explanation of the predicted results. For those not experts in binocular rivalry, it would be useful to explain the predicted results. Does pulsing stimuli in this way change durations in such a task? If there is global suppression of visual stimuli why would this lead to shorter/longer durations in the systole compared to the diastole conditions? In addition, the duration lengths in both conditions seem to be longer than one cardiac cycle. If the cardiac cycle modulates duration it would be interesting to discuss why this occurs on some cycles but not on others. If there is a facilitation effect why does it only occur on some cycles?

      In general, pulsing stimuli (i.e. moving gratings) show longer dominance durations when in competition with non-pulsing stimuli; in other words, pulses increase the “stimulus strength” of a visual grating (Wade, De Weert & Swanston, 1984). The Baroreceptor Hypothesis predicts global suppression of visual cortex during systole (and not during diastole), so the stimulus strength boost yielded by a pulse should be attenuated during systole. Thus, the stimulus that only pulses during systole would have lower stimulus strength (and thus shorter dominance durations) than that which pulses during diastole; however, we observe the opposite pattern in our data, seemingly contradicting the Baroreceptor Hypothesis.

      In typical binocular rivalry paradigms, dominance durations are biased by stimulus strength, but perception remains bistable such that the stronger stimulus is not necessarily dominant at a given time. We see no reason, then, why switching would have to occur every cycle. The dominance durations we see are quite typical of binocular rivalry paradigms, whereas durations shorter than a cardiac cycle would be rather unusual (Carmel et al., 2010).
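      The point that a stronger stimulus biases dominance durations without winning on every percept can be illustrated with a toy simulation. This sketch assumes gamma-distributed dominance durations (a common description of rivalry data); the shape parameter, the mean durations, and the function name are hypothetical values chosen for illustration and are not estimates from our data.

```python
import random


def simulate_dominance(shape, mean_strong, mean_weak, n_percepts, seed=0):
    """Draw gamma-distributed dominance durations (s) for two rival stimuli.

    random.gammavariate takes (shape, scale), and the mean is shape * scale,
    so each scale is set to the desired mean divided by the shape.
    """
    rng = random.Random(seed)
    strong = [rng.gammavariate(shape, mean_strong / shape)
              for _ in range(n_percepts)]
    weak = [rng.gammavariate(shape, mean_weak / shape)
            for _ in range(n_percepts)]
    return strong, weak


if __name__ == "__main__":
    strong, weak = simulate_dominance(shape=3.0, mean_strong=3.6,
                                      mean_weak=2.4, n_percepts=20000, seed=1)
    mean_s = sum(strong) / len(strong)
    mean_w = sum(weak) / len(weak)
    # Fraction of paired draws in which the weaker stimulus lasted longer.
    weak_longer = sum(w > s for s, w in zip(strong, weak)) / len(strong)
    print(f"mean dominance, stronger stimulus: {mean_s:.2f} s")
    print(f"mean dominance, weaker stimulus:   {mean_w:.2f} s")
    print(f"fraction of pairs where the weaker one lasted longer: {weak_longer:.2f}")
```

      Even with a clear difference in mean duration, a sizeable minority of individual percepts favour the weaker stimulus, which is why an effect tied to the cardiac cycle need not produce a switch on every cycle.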

      Reviewer #2 (Public Review):

      Summary:

      This is a binocular rivalry study that uses electrocardiogram events to modulate visual stimuli in real-time, relative to participants' heartbeats. The main finding is that modulations during the period around when the heart has contracted (systole) increase rivalry dominance durations. This is a really neat result, that demonstrates the link between interoception and vision. I thought the Bayesian mixture modelling was a really smart way to identify cardiac non-perceivers, and the finding that the main result is preserved in this group is compelling. Overall, the study has been conducted to a high standard, is appropriately powered, and reported clearly. I have one suggestion about interpretation, which concerns the explanation of increased dominance durations with reference to contemporary models of binocular rivalry, and a few minor queries. However, I think this paper is a worthwhile addition to the literature.

      The point Reviewer 2 makes with respect to contemporary models of binocular rivalry is important – perhaps more so than its brief statement in this public review suggests. As we already expand upon in our Discussion, the effects of global (neural) inhibition depend on the preexisting role that inhibition plays in a given neural circuit. The original framing of the Baroreceptor Hypothesis describes baroreceptor activity as uniformly impeding sensory processing (Lacey, 1967; Lacey & Lacey, 1978, American Psychologist), which is contradicted by our present results. This account is often interpreted as implying that the effect of baroreceptor activation is inhibitory in terms of neural mechanism (e.g. Rau et al., 1993, Psychophysiology; Edwards et al., 2009, Psychophysiology). Some researchers argue this serves a parallel function to the inhibitory projections from motor to sensory areas during volitional movement, “cancelling” the sensory effects of heartbeats (Van Elk, et al., 2014, Biological Psychology).

      However, baroreceptor activity has also been described as introducing noise into sensory processing rather than inhibiting it directly (e.g. Allen et al., 2022, PLoS Computational Biology). Lacey and Lacey’s own account actually seemed to point toward attention as a mediating mechanism (Hahn, 1973, Psychological Bulletin), with the disproportionate focus on cortical inhibition emerging in the literature over time. All this is to say that, while our results seem to falsify the behavioral predictions of the original Baroreceptor Hypothesis, subsequent versions of that hypothesis that describe an inhibitory neural mechanism, rather than an inhibition of perception per se, could potentially still be compatible with our results. This is a topic we plan to explore in future work.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript addresses a question inspired by the Baroceptor Hypothesis and its links to visual awareness and interoception. Specifically, the reported study aimed to determine if the effects of cardiac contraction (systole) on binocular rivalry (BR) are facilitatory or suppressive. The main experiment - relying on a technically challenging procedure of presenting stimuli synchronised with the heartbeats of participants - has been conducted with great care, and numerous manipulation checks the authors report convincingly show that the methods they used work as intended. Moreover, the control experiment allows for excluding alternative explanations related to participants being aware of their heartbeats. Therefore, the study convincingly shows the effect of cardiac activity on BR - and this is an important finding. The results, however, do not allow for unambiguously determining if this effect is facilitatory or suppressive (see details below), which renders the study not as informative as it could be.

      While the authors strongly focus on interoception and awareness, this study will be of interest to researchers studying BR as such. Moreover, the code and the data the authors share can facilitate the adoption of their methods in other labs.

      Strengths:

      (1) The study required a complex technical setup and the manuscript both describes it well and demonstrates that it was free from potential technical issues (e.g. in section 3.3. Manipulation check).

      (2) The sophisticated statistical methods the authors used, at least for a non-statistician like me, appear to be well-suited for their purpose. For example, they take into account the characteristics of BR (gamma distributions of dominance durations). Moreover, the authors demonstrate that at least in one case their approach is more conservative than a more basic one (Binomial test) would be.

      (3) Finally, the control experiment, and the analysis it enabled, allow for excluding a multitude of alternative explanations of the main results.

      (4) The authors share all their data and materials, even the code for the experiment.

      (5) The manuscript is well-written. In particular, it introduces the problem and methods in a way that should be easy to understand for readers coming from different research fields.

      Weaknesses:

      (1) The interpretation of the main result in the context of the Baroceptor hypothesis is not clear. The manuscript states: The Baroreceptor Hypothesis would predict that the stimulus entrained to systole would spend more time suppressed and, conversely, less time dominant, as cortical activity would be suppressed each time that stimulus pulses. The manuscript does not specify why this should be the case, and the term 'entrained' is not too helpful here (does it refer to neural entrainment? or to 'being in phase with'?). The answer to this question is provided by the manuscript only implicitly, and, to explain my concern, I try to spell it out here in a slightly simplified form.

      During systole (cardiac contraction), the visual system is less sensitive to external information, so it 'ignores' periods when the systole-synchronised stimulus is at the peak of its pulse. Conversely, the system is more sensitive during diastole, so the stimulus that is at the peak of its pulse then should dominate for longer, because its peaks are synchronised with the periods of the highest sensitivity of the visual system when the information used to resolve the rivalry is sampled from the environment. This idea, while indeed being a clever test of the hypothesis in question, rests on one critical assumption: that the peak of the stimulus pulse (as defined in the manuscript) is the time when the stimulus is the strongest for the visual system. The notion of 'stimulus strength' is widely used in the BR literature (see Brascamp et al., 2015 for a review). It refers to the stimulus property that, simply speaking, determines its tendency to dominate in the BR. The strength of a stimulus is underpinned by its low-level visual properties, such as contrast and spatial frequency content. Coming back to the manuscript, the pulsing of the stimuli affected at least spatial frequency (and likely other low-level properties), and it is unknown if it was in phase with the pulsing of the stimulus strength, or not. If my understanding of the premise of the study is correct, the conclusions drawn by the authors stand only if it was.

      In other words, most likely the strength of one of the stimuli was pulsating in sync with the systole, but it is not clear which stimulus it was. It is possible that, for the visual system, the stimulus meant to pulse in sync with the systole was pulsing strength-wise in phase with the diastole (and the one intended to pulse in sync with the diastole strength-wise pulsed with the systole). If this is the case, the predictions of the Baroceptor Hypothesis hold, which would change the conclusion of the manuscript.

      We agree with Reviewer 3’s argumentation here. If the pulses decreased, rather than increased, effective stimulus strength, then the present results would indeed be consistent with the Baroreceptor Hypothesis. However, Wade et al. (1984) demonstrated that grating stimuli which pulse in the same manner (i.e. by dynamically varying the spatial frequency of the grating) as in our experiment indeed show increased stimulus strength relative to static stimuli, even if the dynamic stimuli have lower spatial frequency on average (https://doi.org/10.3758/BF03203891).

      We admit our results would be stronger had we included a replication of Wade et al. (1984) in our study, but in light of this previous work, our interpretation is indeed supported.

      (2) Using anaglyph goggles necessitates presenting stimuli of a different colour to each eye. The way in which different colours are presented can impact stimulus strength (e.g. consider that different anaglyph foils can attenuate the light they let through to different degrees). To deal with such effects, at least some studies on BR employed procedures of adjusting the colours for each participant individually (see Papathomas et al., 2004; Patel et al., 2015 and works cited there). While I think that counterbalancing applied in the study excludes the possibility that colour-related effects influenced the results, the effects of interest still could be stronger for one of the coloured foils.

      It is the case that, when we split the data up by eye (and thus by color), we only see statistically significant results for one eye – though the nominal direction of the effect is consistent across both eyes. So it is indeed possible that the effect could be stronger for one of the colored foils, but the present experiment was not designed to be powered to test that cardiac phase-by-color interaction.

      We concur with the Reviewer, however, that our use of counterbalancing excludes color-related effects as an explanation for our main findings.

      (3) Several aspects of the methods (e.g. the stimuli), are not described at the level of detail some readers might be accustomed to. The most important issue here is the task the participants performed. The manuscript says that they pressed a button whenever they experienced a switch in perception, but it is only implied that there were different buttons for each stimulus.

      There were indeed different buttons for each stimulus (i.e. a button to indicate their perception had switched to the red stimulus and another to indicate it had switched to blue). Our full, unmodified experiment code has been made available and is permanently archived (https://doi.org/10.5281/zenodo.10367327), so the full procedure is well documented and can be replicated exactly.

      Brascamp, J. W., Klink, P. C., & Levelt, W. J. M. (2015). The 'laws' of binocular rivalry: 50 years of Levelt's propositions. Vision Research, 109, 20-37. https://doi.org/10.1016/j.visres.2015.02.019

      Papathomas, T. V., Kovács, I., & Conway, T. (2004). Interocular grouping in binocular rivalry: Basic attributes and combinations. In D. Alais & R. Blake (Eds.), Binocular Rivalry (pp. 155-168). MIT Press

      Patel, V., Stuit, S., & Blake, R. (2015). Individual differences in the temporal dynamics of binocular rivalry and stimulus rivalry. Psychonomic Bulletin and Review, 22(2), 476-482. https://doi.org/10.3758/s13423-014-0695-1

    1. eLife assessment

      This important work by Zheng and colleagues uses a large cohort database from Shanghai to show that post-infection vaccination among previously vaccinated individuals provides significant low-to-moderate protection against re-infection. The evidence supporting the conclusion is convincing, with some limitations (e.g., lack of symptom severity as an outcome, and no inclusion of time since infection as an independent variable). This study will be of interest to vaccinologists, public health officials and clinicians.

    2. Reviewer #1 (Public Review):

      Summary:

      Zheng and colleagues assessed the real world efficacy of SARS-CoV-2 vaccination against re-infection following the large omicron wave in Shanghai in April, 2022. The study was performed among previously vaccinated individuals. The study successfully documents a small but real added protective benefit of re-vaccination, though this diminishes in previously boosted individuals. Unsurprisingly, vaccine preventative efficacy was higher if the vaccine was given in the month before the 2nd large wave in Shanghai. The re-infection rate of 24% suggests that long-term anti-COVID immunity is very difficult to achieve. The conclusions are largely supported by the analyses. These results may be useful for planning the timing of subsequent vaccine rollouts.

      Strengths:

      The strengths of the study are a very large and unique cohort based on synchronously timed single infection among individuals with well documented vaccine histories. Statistical analyses seem appropriate. As with any cohort study, there are potential confounders and the possibility of misclassification and the authors outline limitations nicely in the discussion.

      Weaknesses:

      The authors have addressed each of my points thoroughly.

    3. Reviewer #2 (Public Review):

      Summary:

      This paper evaluates the effect of COVID-19 booster vaccination on reinfection in Shanghai, China among individuals who received primary COVID-19 vaccination followed by initial infection, during an Omicron wave.

      Strengths:

      A large database is collated from electronic vaccination and infection records. Nearly 200,000 individuals are included in the analysis and 24% became reinfected.

      Weaknesses:

      The authors have revised the manuscript and have provided satisfactory responses to my prior comments.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Zheng and colleagues assessed the real-world efficacy of SARS-CoV-2 vaccination against re-infection following the large omicron wave in Shanghai in April 2022. The study was performed among previously vaccinated individuals. The study successfully documents a small but real added protective benefit of re-vaccination, though this diminishes in previously boosted individuals. Unsurprisingly, vaccine preventative efficacy was higher if the vaccine was given in the month before the 2nd large wave in Shanghai. The re-infection rate of 24% suggests that long-term anti-COVID immunity is very difficult to achieve. The conclusions are largely supported by the analyses. These results may be useful for planning the timing of subsequent vaccine rollouts.

      Strengths:

      The strengths of the study are a very large and unique cohort based on synchronously timed single infection among individuals with well-documented vaccine histories. Statistical analyses seem appropriate. As with any cohort study, there are potential confounders and the possibility of misclassification and the authors outline limitations nicely in the discussion.

      Weaknesses:

      (1) Partially and fully vaccinated are never defined and it is difficult to understand how this differs from single, and double, booster vaccines. The figures including all of these groups are a bit confusing for this reason.

      We agree with the reviewer that the distinction between these groups could have been made clearer. To address this comment, we modified the legend of the figure that presents hazard ratios based on these two categorisations (here, and throughout this document, changes in the text are underlined):

      “Figure 3. Effect of post-infection vaccination on SARS-CoV-2 reinfection stratified by pre-infection vaccination. Error bars (95% CIs) and circles represent aHR for SARS-CoV-2 reinfection estimated using Cox proportional hazards models. V-I-V, 1V-I-V, 2V-I-V, 3V-I-V correspond to any pre-infection vaccination, 1, 2 and 3 vaccine doses before infection, then vaccination, respectively; they were compared to V-I, 1V-I, 2V-I, 3V-I, respectively. Partial V-I-V, Full V-I-V and Booster V-I-V represent partial vaccination, full vaccination and booster vaccination before infection, followed by post-infection vaccination, respectively. The number of doses received by individuals with partial versus full (and full with booster) vaccination depends on the type of SARS-CoV-2 vaccine received; in Table S3 we present a cross-classification of participants in the analytic population by these vaccination-related categorical variables.”

      Further, to facilitate visualisation of Figure 3, and emphasize that estimates are presented based on two different ways of categorising vaccination history, we have now included a horizontal line between estimates based on each category.

      Table S3 has been included in the Supplementary Appendix:

      (2) Figure 3 is a bit challenging to interpret because it is a bit atypical to compare each group to a different baseline (ie 2V-I-V vs 2V-I). I would label the y-axis 2V-I-V vs 2V-I (change all of the labels) to make this easier to understand.

      We agree that having the y-axis tick labels describing both groups being compared, rather than only describing the post-infection vaccination group, will help readers to understand this figure. In our response to the previous comment, we presented an updated version of this figure, where this change was also incorporated (see above).

      (3) A 15% reduction in infection is quite low. It would be helpful to discuss if any quantitative or qualitative signals suggest at least a reduction in severe outcomes such as death, hospitalization, ER visits, or long COVID. I am not sure that a 15% reduction in cases supports extra vaccination without some other evidence of added benefit.

      Unfortunately, data on the clinical severity of diagnosed SARS-CoV-2 infections were not available. Some previous studies on COVID-19 vaccines observed that effectiveness against severe outcomes was similar or higher than that for outcomes that do not imply severe disease (e.g. infection). For example, in a study in Israel comparing four versus three vaccine doses, Magen and colleagues observed that the effectiveness of a fourth dose, relative to three doses, was 52% against infection, 61% against symptomatic COVID-19, and 76% against COVID-19 related death (Magen et al. Fourth Dose of BNT162b2 mRNA Covid-19 Vaccine in a Nationwide Setting. NEJM 2022; see also, for example, Nasreen et al. Effectiveness of COVID-19 vaccines against symptomatic SARS-CoV-2 infection and severe outcomes with variants of concern in Ontario. Nature Microbiology 2022, or Sacco et al. Effectiveness of BNT162b2 vaccine against SARS-CoV-2 infection and severe COVID-19 in children aged 5–11 years in Italy: a retrospective analysis of January–April, 2022. Lancet 2022). However, this pattern of increasing effectiveness with increasing outcome severity was not consistently reported in all studies or settings. We agree that public health officials who will use our results to guide future vaccination policy in China and abroad need to interpret the results in the context of these other outcomes that were not assessed and of those previous studies, that, although performed in different epidemiological settings, suggest that our analysis does not capture all benefits of post-infection vaccine doses.

      We have now included the following statements in the Discussion section:

      “Finally, data on the severity of infections during the second wave were not available, which prevented analyses of clinical outcomes other than infections (e.g. COVID-19-related hospitalization or death). Although some previous studies (Magen et al. Fourth Dose of BNT162b2 mRNA Covid-19 Vaccine in a Nationwide Setting. NEJM 2022; Nasreen et al. Effectiveness of COVID-19 vaccines against symptomatic SARS-CoV-2 infection and severe outcomes with variants of concern in Ontario. Nature Microbiology 2022) estimated similar or higher vaccine effectiveness against severe outcomes compared to outcomes that presumably include both milder and severe presentations, this pattern was not observed in all studies. Epidemiologists and public health officials who will use our results to define vaccination policy should thus take into account the fact that our analysis does not capture all benefits of post-infection vaccinations.”

      (4) Why exclude the 74,962 unvaccinated individuals from the analysis? It would be interesting to see whether getting vaccinated post-infection provides benefits to this group.

      The reasons why we focused on individuals who had been vaccinated before their first infection were two: (i) in most settings, including those with SARS-CoV-2 epidemiologic history similar to that of Shanghai, a high percentage of the population has received vaccine doses; (ii) in settings with high vaccination coverage, the group of individuals who remain unvaccinated despite widespread availability of vaccines likely differs from those who have been vaccinated – for example, with regard to behavioural factors and comorbidity profile. Having said that, we agree that reporting analyses for the group of individuals who had not been vaccinated before first infection might be informative. We have thus included in the Supplementary Appendix a short section that reports results for this group of patients; Table S4 also presents these estimates.

      “Effect of post-infection vaccination in individuals with no history of vaccination before infection

      In this supplementary section, we present findings for individuals who were unvaccinated before infection during the first Omicron variant wave in Shanghai. For this group of individuals, post-infection vaccination did not confer significant protection against reinfection (adjusted hazard ratio [aHR] 1.06, 95% CI 0.97, 1.16). The analysis indicates that the effect of post-infection vaccine doses was not significant in either female (aHR 0.97 [0.84, 1.11]) or male individuals (aHR 1.12 [0.99, 1.26]), nor in participants aged 60 years or older (aHR 0.92 [0.82, 1.04]) or younger adults (20-60 years) (aHR 1.12 [0.92, 1.37]). These results suggest that, in the context of the two Omicron variant waves in Shanghai, a first vaccine dose administered after infection did not provide a clear benefit in terms of reducing risk of subsequent infections for those not previously vaccinated.”

      We refer to this new analysis in the Results section:

      “For individuals who had received at least one vaccine dose before infection during the first Omicron variant wave, post-infection vaccination was protective against reinfection (adjusted hazard ratio [aHR] 0.82, 95% CI 0.79, 0.85). As shown in Figure 3, this protective effect was observed in subgroups defined by the number of pre-infection vaccine doses: aHR of 0.84 (95% CI, 0.76, 0.93) and 0.87 (95% CI, 0.83, 0.90) for one and two pre-infection doses respectively; and for patients with three vaccine doses prior to infection, the association was not statistically significant (aHR: 0.96 [0.74, 1.23]). When analyses are stratified by partial and full vaccination status before the first infection, an additional vaccine dose was protective (aHR 0.76 [0.68, 0.84], and 0.93 [0.89, 0.97], respectively); and among individuals who had received booster vaccination before the spread of the first Omicron variant wave in Shanghai, the hazard ratio estimate was consistent with a more limited effect (aHR: 0.95 [0.75, 1.22]). For comparison, results for individuals who had not been vaccinated before their first infection are shown in the Supplementary Appendix (supplementary section “Effect of post-infection vaccination in individuals with no history of vaccination before infection” and Table S4)”
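      The hazard ratios quoted above can be read as approximate percent reductions in the hazard of reinfection. The sketch below is illustrative only and is not taken from the authors' analysis code: it assumes the common convention that the reduction equals (1 − aHR) × 100, and the function name `hazard_ratio_to_reduction` is hypothetical. The aHR values and confidence intervals are copied from the Results text quoted above.

```python
def hazard_ratio_to_reduction(ahr, ci_low, ci_high):
    """Convert an adjusted hazard ratio and its 95% CI to a percent reduction.

    Assumption (not stated in the source): reduction = (1 - aHR) * 100.
    The CI bounds swap because a lower hazard ratio means a larger reduction.
    """
    point = (1 - ahr) * 100
    lower = (1 - ci_high) * 100  # upper aHR bound gives the smallest reduction
    upper = (1 - ci_low) * 100   # lower aHR bound gives the largest reduction
    return round(point, 1), round(lower, 1), round(upper, 1)


# aHR estimates (point, CI low, CI high) quoted in the Results paragraph above
estimates = {
    "any pre-infection dose": (0.82, 0.79, 0.85),
    "one pre-infection dose": (0.84, 0.76, 0.93),
    "two pre-infection doses": (0.87, 0.83, 0.90),
    "three pre-infection doses": (0.96, 0.74, 1.23),
}

for label, (ahr, ci_low, ci_high) in estimates.items():
    point, lower, upper = hazard_ratio_to_reduction(ahr, ci_low, ci_high)
    print(f"{label}: {point}% (95% CI {lower}% to {upper}%)")
```

      Under this convention, a negative lower bound (as for three pre-infection doses) simply reflects a confidence interval for the aHR that crosses 1, which matches the authors' statement that the association was not statistically significant in that subgroup.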

      (5) Pudong should be defined for those who do not live in China.

      We have now included a sentence defining Pudong in the Methods section:

      “This study included individuals diagnosed with their first SARS-CoV-2 infection between April 1 and May 31, 2022 in the Pudong District, which is a large and densely populated district of Shanghai spanning an area of 1,210 square kilometers with a permanent resident population of 5.57 million, served by more than 30 hospitals and 60 community health centers;… ”

      (6) The discussion about healthcare utilization bias is welcomed and well done. It would be great to speculate on whether this bias might favor the null or alternative hypothesis.

      We believe the reviewer is referring to the following statement:

      “Differences in healthcare-seeking behavior could also bias case ascertainment between post-infection vaccinated and unvaccinated individuals, although, as we restricted the study population to individuals who had received at least one pre-infection dose, this potential bias might be more limited than in other vaccine studies.”

      Bias linked to healthcare-seeking behaviour could affect the association between vaccination and infection in two different ways: individuals who are more health conscious are more likely to get vaccinated and also to seek medical care when infected, and this would bias results toward the null; however, if the same individuals are also more likely to avoid exposure to potentially infectious individuals, their behaviour could also bias results in the opposite direction – that is, it would appear to increase vaccine effectiveness. As mentioned in the Discussion section, we expected this bias to be limited. We have now modified the paragraph:

      “Differences in healthcare-seeking behavior could also bias case ascertainment between post-infection vaccinated and unvaccinated individuals. Although we restricted the study population to individuals who had received at least one pre-infection vaccination, which suggests a higher degree of homogeneity in healthcare-seeking behaviour compared to that in the total population, it is possible that this bias might have affected our estimates. For example: individuals who were more health conscious might have been more likely to receive post-infection vaccination and also more likely to seek medical care or testing when reinfected, and this would have biased results toward the null; it is, however, also conceivable that these individuals were more likely to avoid contact with potentially infectious persons, which could have biased results in the opposite direction.”

      Reviewer #2 (Public Review):

      Summary:

      This paper evaluates the effect of COVID-19 booster vaccination on reinfection in Shanghai, China among individuals who received primary COVID-19 vaccination followed by initial infection, during an Omicron wave.

      Strengths:

      A large database is collated from electronic vaccination and infection records. Nearly 200,000 individuals are included in the analysis and 24% became reinfected.

      Weaknesses:

      The article is difficult to follow in terms of the objectives and individuals included in various analyses. There appear to be important gaps in the analysis. The electronic data are limited in their ability to draw causal conclusions.

      More detailed comments:

      In multiple places (abstract, introduction), the authors frame the work in terms of understanding the benefit of booster vaccination among individuals with hybrid immunity (vaccination + infection). However, their analysis population does not completely align with this framing. As best as I can tell, only individuals who first received COVID-19 vaccination, and subsequently experienced infection, were included. Why the analysis does not also consider individuals who were infected and then vaccinated is not clear.

      The focus of our analysis is on the most frequent scenario in many countries: settings where a high proportion of the population has been vaccinated. As mentioned in our response to a comment from Reviewer #1, those individuals who remain unvaccinated after the first years of this pandemic are likely to be different, with respect to many factors, from individuals with history of SARS-CoV-2 vaccination. Further, differences between unvaccinated and vaccinated individuals are likely setting-specific, linked to local availability of and access to vaccination, cultural differences in healthcare seeking behaviour, and possible differences in the frequencies of medical conditions that might influence (promote or prevent) vaccine uptake. We prefer to keep the focus of this work on individuals who had been vaccinated before their first infection; however, we have now included in the Supplementary Appendix a section, presented in a response to Reviewer #1, that reports results for this group of individuals.

      In vaccine effectiveness analyses, why was time since initial infection not examined as a modifier of the booster effect? Time since the onset of the Omicron wave is only loosely tied to the immune status of the individual.

      We agree with the reviewer that assessing effect modification by the time since initial infection would be important. However, in Shanghai, most initial infections occurred during a narrow time window relative to the time window between the first and second Omicron variant waves. Indeed, as mentioned in the Results section, most first infections (243,906, 88.8%) occurred in April; for 306 (0.1%) individuals, information on the date of first infection was not available. Given this narrow time window and in order to limit the number of comparisons in our study, we preferred not to investigate this aspect of the hybrid immunity. In settings where multiple SARS-CoV-2 waves occurred, over a longer period of time, which would imply sufficient variation in this variable “time since initial infection”, we believe that it would be essential to account for this.

      The effect of booster vaccination on preventing symptomatic vs. asymptomatic reinfection does not appear to have been evaluated; this is a key gap in the analysis and it would seem the data would support it.

      Not having clinical presentation data is a limitation of our study. It is a weakness shared by many real-world vaccine effectiveness analyses based on large medical and administrative datasets. We have now explicitly mentioned this in the Discussion section.

      “Finally, data on the severity of infections during the second wave were not available, which prevented analyses of clinical outcomes other than infections (e.g. COVID-19-related hospitalization or death). Although some previous studies (Magen et al. Fourth Dose of BNT162b2 mRNA Covid-19 Vaccine in a Nationwide Setting. NEJM 2022; Nasreen et al. Effectiveness of COVID-19 vaccines against symptomatic SARS-CoV-2 infection and severe outcomes with variants of concern in Ontario. Nature Microbiology 2022) estimated similar or higher vaccine effectiveness against severe outcomes compared to outcomes that presumably include both milder and severe presentations, this pattern was not observed in all studies. Epidemiologists and public health officials who will use our results to define vaccination policy should thus take into account the fact that our analysis does not capture all benefits of post-infection vaccinations.”

      In lines 105-108, the demographic description of the analysis population is incomplete. Is sex or gender identity being described? Are any individuals non-binary? What is the age distribution? (Only the proportions 20-39 and under 6 are stated.)

      We have now clarified in the manuscript that only information on sex at birth was provided by the Center for Disease Control and Prevention in Shanghai. We made the following change in the Methods section:

      “Information on infection history as well as data on demographic variables (sex at birth, and age) were provided by the Center for Disease Control and Prevention in Shanghai, China”

      We have also modified the legend of Table 1:

      “Table 1. Characteristics of the study population and reinfection rate by post-infection vaccination status. Here, reinfection rate refers to the percentage of the relevant study subpopulation with evidence of reinfection between December 1, 2022 and January 3, 2023. Note that for the variables on region, occupation, and clinical severity, data are missing for large fractions of the study population. Note also that information was only available on sex at birth, but not on gender.”

      Regarding the reviewer’s comment on the age distribution, this information is presented for the following categories in Table 1: 0-6 years, 7-19 years, 20-39 years, 40-59 years, and 60+ years. However, we had not referred to Table 1 in the section 3.1 of the manuscript. We have now corrected that:

      “To assess the effect of an additional vaccine dose given after infection, the analytic sample consisted of 199,312 individuals (Figure 1). 85,804 were women (43.1%); 836 (0.4%) had gender information missing. 38.1% of the study participants were aged 20 to 39 years and only 0.9% were aged 0 to 6 years (see Table 1 for additional information).”

      Figure 1 consort diagram is confusing. In the last row, are the two boxes independent or overlapping sets of individuals? Are all included in secondary analyses?

      We agree that additional information should have been provided in the legend. The boxes represent overlapping sets of individuals – that is, some individuals were included in both secondary analyses in the box on the left and in the box on the right. These analyses involved different ways of categorizing individuals. Below is the updated figure legend:

      “Figure 1. Flow chart describing the selection of participants for the analysis. The number of individuals in this figure is not the same as some of the numbers in Table 1 because of missing data in key variables. Note that in the bottom part of the chart, related to secondary analyses, the boxes represent overlapping sets of study participants; in other words, some individuals included in the secondary analyses that correspond to the left box were also included in analyses corresponding to the box on the right.”

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Minor comment: the terms "vaccination"/"vaccinated" are used both to refer to the primary vaccination (pre-initial infection) and to the booster vaccination (post-initial infection), and this causes confusion.

      Thank you. We have now revised the manuscript (Methods, Results and Discussion sections) to use the terms “post-infection vaccination” and “post-infection vaccinated” to reduce ambiguity. We also included the following statement in the Background section:

      “In December 2022, an important change in the COVID-19 policy in China, namely the end of most social distancing measures and of mass screening activities, was associated with a second surge in SARS-CoV-2 infections in Shanghai. The current circulation of the virus in the Shanghainese population and reports of vaccine fatigue mean that it is important to estimate the protective effect of vaccination against reinfection in this population. In this study, we aimed to quantify the effect of vaccine doses given after a first infection on the risk of subsequent infection. For that, we used data collected during the first Omicron variant wave, when hundreds of thousands of individuals tested real-time polymerase chain reaction (RT-PCR)-positive for SARS-CoV-2 infection (ref. 8) in Shanghai, of whom 275,896 were in Pudong. The fact that the population in Shanghai was mostly SARS-CoV-2 infection naïve before the spread of the Omicron variant provides a unique opportunity to estimate the real-world benefit of post-infection vaccine doses in a population that was first exposed to infection during a relatively short and well-defined time window. We further investigated whether the number of pre-infection vaccination doses modified the protective effect of the post-infection dose against Omicron BA.5 sublineage. To avoid ambiguity in the text, in the following sections, we often refer to vaccine doses given after the initial infection as “post-infection vaccination” or “post-infection vaccine doses”.”

    1. eLife assessment

      This study presents valuable evidence concerning the potential for naturalistic movie-viewing fMRI experiments to reveal some features that are correlated with the functional and topographical organization of the developing visual system in awake infants and toddlers. The data are compelling given the difficulty of studying this population, the methodology is original and validated, and the evidence supporting the conclusions is convincing and in line with prior research using resting-state and awake task-based fMRI. This study will be of interest to cognitive neuroscientists and developmental psychologists, and in particular those interested in using fMRI to investigate brain organisation in pediatric and clinical populations with limited tolerance to fMRI.

    2. Reviewer #1 (Public review):

      Ellis et al. investigated the functional and topographical organization of visual cortex in infants and toddlers, as evidenced by movie-viewing data. They build directly on prior research that revealed topographic maps in infants who completed a retinotopy task, claiming that even a limited amount of rich, naturalistic movie-viewing data (3-18 minutes) is sufficient to reveal this organization, within and across participants. Generating this evidence required methodological innovations to acquire high-quality fMRI data from awake infants (which have been described by this group, elsewhere) and analytical creativity. The authors provide evidence for structured functional responses in infant visual cortex at multiple levels of analyses; homotopic brain regions (defined based on a retinotopy task) responded more similarly to one another than to other brain regions in visual cortex during movie-viewing; ICA applied to movie-viewing data revealed components that were identifiable as spatial frequency, and to a lesser degree, meridian maps, and shared response modeling analyses suggested that visual cortex responses were similar across infants/toddlers, as well as across infants/toddlers and adults. These results are suggestive of fairly mature functional response profiles in visual cortex in infants/toddlers and highlight the potential of movie-viewing data for studying finer-grained aspects of functional brain responses.

      Strengths:

      - This study links the authors' prior evidence for retinotopic organization of visual cortex in human infants (Ellis et al., 2021) and research by others using movie-viewing fMRI experiments with adults to reveal retinotopic organization (e.g., Knapen, 2021) to strengthen our understanding of infant vision during naturalistic contexts and further evidence for the usefulness of movie-based experiments.
      - This study provides novel evidence that functional alignment approaches (specifically, shared response modeling) can be usefully applied to infant fMRI data. Further, code for reproducing such analyses (and others) will be made publicly available.
      - Awake infant fMRI data are rare and time-consuming and expensive to collect; they are therefore of high value to the community. The raw and preprocessed fMRI and anatomical data analyzed will be made publicly available.

      Weakness:

      - As the authors clearly state, movie-viewing experiments may not work as well as traditional retinotopy tasks; that is, this approach cannot currently be considered a replacement for retinotopy when accurate maps are needed.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript reports analyses of fMRI data from infants and toddlers watching naturalistic movies. Visual areas in the infant brain show distinct functions, consistent with previous studies using resting state and awake task-based infant fMRI. The pattern of activity in visual regions contains some features predicted by the regions' retinotopic responses. The revised version of the manuscript provides additional validation of the methodology and clarifies the claims. As a result, the data provide clear support for the claims.

      Strengths:

      The authors have collected a unique dataset: the same individual infants both watched naturalistic animations and completed a specific retinotopy task. Using these data, the authors show that activity evoked by movies, in infants' visual areas, is correlated with the regions' retinotopic response. The revised manuscript validates this methodology, using adult data. The revised manuscript also shows that an infant's movie-watching data is not sufficient or optimal to predict their visual areas' retinotopic responses; anatomical alignment with a group of previous participants provides more accurate prediction of a new participant's retinotopic response.

      Weaknesses:

      A key step in the analysis of the movie-watching data is the selection of independent components of the movie-evoked response, by a trained researcher, that resemble retinotopic spatial patterns. While the researcher is unlikely to be biased by this infant's own retinotopy, as the authors argue, the researcher is actively looking for ICs that resemble average patterns of retinotopic response. So, how likely is it that ICs that resemble retinotopic organization arise by chance (i.e. in noise) in infant fMRI data? I do not see an analysis that addresses this question. With apologies if I missed it.

    4. Reviewer #3 (Public review):

      The manuscript reports BOLD data collected in awake toddlers while they watched videos. The authors analyse the BOLD time series using two different statistical approaches, both very complex, but neither requiring any a priori determination of movie features or content to be associated with regressors. The two main messages are that 1) toddlers have occipital visual areas very similar to those of adults, given that an SRM derived from adult BOLD data is consistent with the infant brains as well; 2) the retinotopic organization and the spatial frequency selectivity of the occipital maps derived by applying correlation analysis are consistent with the maps obtained by standard and conventional mapping.

      Comments on revised version:

      The authors did a thorough revision of the manuscript which now is very clear. All the missing information has been added and the technical issue clarified. I think that it is a very good and important paper.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents valuable findings on the potential of short-movie viewing fMRI protocol to explore the functional and topographical organization of the visual system in awake infants and toddlers. Although the data are compelling given the difficulty of studying this population, the evidence presented is incomplete and would be strengthened by additional analyses to support the authors' claims. This study will be of interest to cognitive neuroscientists and developmental psychologists, especially those interested in using fMRI to investigate brain organisation in pediatric and clinical populations with limited fMRI tolerance.

      We are grateful for the thorough and thoughtful reviews. We have provided point-by-point responses to the reviewers’ comments, but first, we summarize the major revisions here. We believe these revisions have substantially improved the clarity of the writing and impact of the results.

      Regarding the framing of the paper, we have made the following major changes in response to the reviews:

      (1) We have clarified that our goal in this paper was to show that movie data contains topographic, fine-grained details of the infant visual cortex. In the revision, we now state clearly that our results should not be taken as evidence that movies could replace retinotopy and have reworded parts of the manuscript that could mislead the reader in this regard.

      (2) We have added extensive details to the (admittedly) complex methods to make them more approachable. An example of this change is that we have reorganized the figure explaining the Shared Response Modelling methods to divide the analytic steps more clearly.

      (3) We have clarified the intermediate products contributing to the results by adding 6 supplementary figures that show the gradients for each IC or SRM movie and each infant participant.

      In response to the reviews, we have conducted several major analyses to support our findings further:

      (1) To verify that our analyses can identify fine-grained organization, we have manually traced and labeled adult data, and then performed the same analyses on them. The results from this additional dataset validate that these analyses can recover fine-grained organization of the visual cortex from movie data.

      (2) To further explore how visual maps derived from movies compare to alternative methods, we performed an anatomical alignment control analysis. We show that high-quality maps can be predicted from other participants using anatomical alignment.

      (3) To test the contribution of motion to the homotopy analyses, we regressed out the motion effects in these analyses. We found qualitatively similar results to our main analyses, suggesting motion did not play a substantial role.

      (4) To test the contribution of data quantity to the homotopy analyses, we correlated the amount of movie data collected from each participant with the homotopy results. We did not find a relationship between data quantity and the homotopy results. 
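
      The motion control in point (3) can be illustrated with a minimal sketch. It assumes one common approach, removing motion regressors from each region's time series by ordinary least squares before correlating homotopic regions; the response does not specify the exact procedure, so the function name `regress_out`, the six motion parameters, and the synthetic data are all hypothetical.

```python
import numpy as np

def regress_out(timeseries, confounds):
    """Remove confound regressors (plus an intercept) from each column of timeseries.

    timeseries: array of shape (timepoints, regions)
    confounds:  array of shape (timepoints, n_confounds), e.g. motion parameters
    """
    design = np.column_stack([np.ones(len(confounds)), confounds])
    beta, *_ = np.linalg.lstsq(design, timeseries, rcond=None)
    return timeseries - design @ beta

# Synthetic data (hypothetical): two homotopic regions share a signal and a motion artifact.
rng = np.random.default_rng(0)
n_timepoints = 200
motion = rng.standard_normal((n_timepoints, 6))
signal = rng.standard_normal(n_timepoints)
artifact = motion @ rng.standard_normal(6)
left = signal + artifact + 0.5 * rng.standard_normal(n_timepoints)
right = signal + artifact + 0.5 * rng.standard_normal(n_timepoints)

raw = np.column_stack([left, right])
cleaned = regress_out(raw, motion)

# Homotopic correlation before and after removing motion.
r_raw = np.corrcoef(raw[:, 0], raw[:, 1])[0, 1]
r_cleaned = np.corrcoef(cleaned[:, 0], cleaned[:, 1])[0, 1]
```

      If the correlation between homotopic regions remains after this step, it cannot be attributed to the linear effect of the motion regressors, which is the logic behind the "qualitatively similar results" reported above.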

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Ellis et al. investigated the functional and topographical organization of the visual cortex in infants and toddlers, as evidenced by movie-viewing data. They build directly on prior research that revealed topographic maps in infants who completed a retinotopy task, claiming that even a limited amount of rich, naturalistic movie-viewing data is sufficient to reveal this organization, within and across participants. Generating this evidence required methodological innovations to acquire high-quality fMRI data from awake infants (which have been described by this group, and elsewhere) and analytical creativity. The authors provide evidence for structured functional responses in infant visual cortex at multiple levels of analyses; homotopic brain regions (defined based on a retinotopy task) responded more similarly to one another than to other brain regions in visual cortex during movie-viewing; ICA applied to movie-viewing data revealed components that were identifiable as spatial frequency, and to a lesser degree, meridian maps, and shared response modeling analyses suggested that visual cortex responses were similar across infants/toddlers, as well as across infants/toddlers and adults. These results are suggestive of fairly mature functional response profiles in the visual cortex in infants/toddlers and highlight the potential of movie-viewing data for studying finer-grained aspects of functional brain responses, but further evidence is necessary to support their claims and the study motivation needs refining, in light of prior research.

      Strengths:

      - This study links the authors' prior evidence for retinotopic organization of visual cortex in human infants (Ellis et al., 2021) and research by others using movie-viewing fMRI experiments with adults to reveal retinotopic organization (Knapen, 2021).

      - Awake infant fMRI data are rare, time-consuming, and expensive to collect; they are therefore of high value to the community. The raw and preprocessed fMRI and anatomical data analyzed will be made publicly available.

      We are grateful to the reviewer for their clear and thoughtful description of the strengths of the paper, as well as their helpful outlining of areas we could improve.

      Weaknesses:

      - The Methods are at times difficult to understand and in some cases seem inappropriate for the conclusions drawn. For example, I believe that the movie-defined ICA components were validated using independent data from the retinotopy task, but this was a point of confusion among reviewers. 

      We acknowledge the complexity of the methods and wish to clarify them as best as possible for the reviewers and the readers. We have extensively revised the methods and results sections to help avoid potential misunderstandings. For instance, we have revamped the figure and caption describing the SRM pipeline (Figure 5).

      To answer the stated confusion directly, the ICA components were derived from the movie data and validated on the (completely independent) retinotopy data. There were no additional tasks. The following text in the paper explains this point:

      “To assess the selected component maps, we correlated the gradients (described above) of the task-evoked and component maps. This test uses independent data: the components were defined based on movie data and validated against task-evoked retinotopic maps.” Pg. 11

      In either case: more analyses should be done to support the conclusion that the components identified from the movie reproduce retinotopic maps (for example, by comparing the performance of movie-viewing maps to available alternatives (anatomical ROIs, group-defined ROIs). 

      Before addressing this suggestion, we want to restate our conclusions: features of the retinotopic organization of infant visual cortex could be predicted from movie data. We did not conclude that movie data could ‘reproduce’ retinotopic maps in the sense that they would be a replacement. We recognize that this was not clear in our original manuscript and have clarified this point throughout, including in this section of the discussion:

      “To be clear, we are not suggesting that movies work well enough to replace a retinotopy task when accurate maps are needed. For instance, even though ICA found components that were highly correlated with the spatial frequency map, we also selected some components that turned out to have lower correlations. Without knowing the ground truth from a retinotopy task, there would be no way to weed these out. Additionally, anatomical alignment (i.e., averaging the maps from other participants and anatomically aligning them to a held-out participant) resulted in maps that were highly similar to the ground truth. Indeed, we previously23 found that adult-defined visual areas were moderately similar to infants. While functional alignment with adults can outperform anatomical alignment methods in similar analyses27, here we find that functional alignment is inferior to anatomical alignment. Thus, if the goal is to define visual areas in an infant that lacks task-based retinotopy, anatomical alignment of other participants’ retinotopic maps is superior to using movie-based analyses, at least as we tested it.” Pg. 21

      As per the reviewer’s suggestion and alluded to in the paragraph above, we have created anatomically aligned visual maps, providing an analogous test to the between-participant analyses like SRM. We find that these maps are highly similar to the ground truth. We describe this result in a new section of the results:

      “We performed an anatomical alignment analog of the functional alignment (SRM) approach. This analysis serves as a benchmark for predicting visual maps using task-based data, rather than movie data, from other participants. For each infant participant, we aggregated all other infant or adult participants as a reference. The retinotopic maps from these reference participants were anatomically aligned to the standard surface template, and then averaged. These averages served as predictions of the maps in the test participant, akin to SRM, and were analyzed equivalently (i.e., correlating the gradients in the predicted map with the gradients in the task-based map). These correlations (Table S4) are significantly higher than for functional alignment (using infants to predict spatial frequency, anatomical alignment > functional alignment: ∆Fisher Z M=0.44, CI=[0.32–0.58], p<.001; using infants to predict meridians, anatomical alignment > functional alignment: ∆Fisher Z M=0.61, CI=[0.47–0.74], p<.001; using adults to predict spatial frequency, anatomical alignment > functional alignment: ∆Fisher Z M=0.31, CI=[0.21–0.42], p<.001; using adults to predict meridians, anatomical alignment > functional alignment: ∆Fisher Z M=0.49, CI=[0.39–0.60], p<.001). This suggests that even if SRM shows that movies can be used to produce retinotopic maps that are significantly similar to a participant, these maps are not as good as those that can be produced by anatomical alignment of the maps from other participants without any movie data.” Pg. 16–17
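
      The leave-one-out logic of this benchmark can be sketched as follows. This is an illustration under stated assumptions, not the released pipeline: each row of `maps` stands in for one participant's gradient values after alignment to a common template, the data are synthetic, and the helper names are hypothetical.

```python
import numpy as np

def leave_one_out_prediction(maps, test_idx):
    """Average the template-aligned maps of every participant except test_idx."""
    reference = np.delete(maps, test_idx, axis=0)
    return reference.mean(axis=0)

def loo_fisher_z(maps):
    """Fisher-z correlation between each participant's map and the others' average."""
    scores = []
    for i in range(maps.shape[0]):
        predicted = leave_one_out_prediction(maps, i)
        r = np.corrcoef(predicted, maps[i])[0, 1]
        scores.append(np.arctanh(r))  # Fisher z-transform of the correlation
    return np.array(scores)

# Synthetic example (hypothetical): 8 participants share a gradient plus individual noise.
rng = np.random.default_rng(0)
shared_gradient = np.linspace(-1.0, 1.0, 50)
maps = shared_gradient + 0.3 * rng.standard_normal((8, 50))

anatomical_z = loo_fisher_z(maps)
```

      Running the same scoring on functionally aligned predictions and subtracting the two sets of Fisher z values per participant would give a difference score of the kind reported as ∆Fisher Z in the quoted passage.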

      Also, the ROIs used for the homotopy analyses were defined based on the retinotopic task rather than based on movie-viewing data alone - leaving it unclear whether movie-viewing data alone can be used to recover functionally distinct regions within the visual cortex.

      We agree with the reviewer that our approach does not test whether movie-viewing data alone can be used to recover functionally distinct regions. The goal of the homotopy analyses was to identify whether there was functional differentiation of visual areas in the infant brain while they watch movies. This was a novel question that provides positive evidence that these regions are functionally distinct. In subsequent analyses, we show that when these areas are defined anatomically, rather than functionally, they also show differentiated function (e.g., Figure 2). Nonetheless, our intention was not to use the homotopy analyses to define the regions. We have added text to clarify the goal and novelty of this analysis.

      “Although these analyses cannot define visual maps, they test whether visual areas have different functional signatures.” Pg. 6

      Additionally, even if the goal were to define areas based on homotopy, we believe the power of that analysis would be questionable. We would need to use a large amount of the movie data to define the areas, leaving a low-powered dataset to test whether their function is differentiated by these movie-based areas.

      - The authors previously reported on retinotopic organization of the visual cortex in human infants (Ellis et al., 2021) and suggest that the feasibility of using movie-viewing experiments to recover these topographic maps is still in question. They point out that movies may not fully sample the stimulus parameters necessary for revealing topographic maps/areas in the visual cortex, or the time-resolution constraints of fMRI might limit the use of movie stimuli, or the rich, uncontrolled nature of movies might make them inferior to stimuli that are designed for retinotopic mapping, or might lead to variable attention between participants that makes measuring the structure of visual responses across individuals challenging. This motivation doesn't sufficiently highlight the importance or value of testing this question in infants. Further, it's unclear if/how this motivation takes into account prior research using movie-viewing fMRI experiments to reveal retinotopic organization in adults (e.g., Knapen, 2021). Given the evidence for retinotopic organization in infants and evidence for the use of movie-viewing experiments in adults, an alternative framing of the novel contribution of this study is that it tests whether retinotopic organization is measurable using a limited amount of movie-viewing data (i.e., a methodological stress test). The study motivation and discussion could be strengthened by more attention to relevant work with adults and/or more explanation of the importance of testing this question in infants (is the reason to test this question in infants purely methodological - i.e., as a way to negate the need for retinotopic tasks in subsequent research, given the time constraints of scanning human infants?).

      We are grateful to the reviewer for giving us the opportunity to clarify the innovations of this research. We believe that this research contributes to our understanding of how infants process dynamic stimuli, demonstrates the viability and utility of movie experiments in infants, and highlights the potential for new movie-based analyses (e.g., SRM). We have now consolidated these motivations in the introduction to more clearly motivate this work:

      “The primary goal of the current study is to investigate whether movie-watching data recapitulates the organization of visual cortex. Movies drive strong and naturalistic responses in sensory regions while minimizing task demands12, 13, 24 and thus are a proxy for typical experience. In adults, movies and resting-state data have been used to characterize the visual cortex in a data-driven fashion25–27. Movies have been useful in awake infant fMRI for studying event segmentation28, functional alignment29, and brain networks30. However, this past work did not address the granularity and specificity of cortical organization that movies evoke. For example, movies evoke similar activity in infants in anatomically aligned visual areas28, but it remains unclear whether responses to movie content differ between visual areas (e.g., is there more similarity of function within visual areas than between31). Moreover, it is unknown whether structure within visual areas, namely visual maps, contributes substantially to visual evoked activity. Additionally, we wish to test whether methods for functional alignment can be used with infants. Functional alignment finds a mapping between participants using functional activity – rather than anatomy – and in adults can improve signal-to-noise, enhance across participant prediction, and enable unique analyses27, 32–34.” Pg. 3-4

      Furthermore, the introduction culminates in the following statement on what the analyses will tell us about the nature of movie-driven activity in infants:

      “These three analyses assess key indicators of the mature visual system: functional specialization between areas, organization within areas, and consistency between individuals.” Pg. 5

      Furthermore, in the discussion we revisit these motivations and elaborate on them further:

      [Regarding homotopy:] “This suggests that visual areas are functionally differentiated in infancy and that this function is shared across hemispheres31.” Pg. 19

      [Regarding ICA:] “This means that the retinotopic organization of the infant brain accounts for a detectable amount of variance in visual activity, otherwise components resembling these maps would not be discoverable.” Pg. 19–20

      [Regarding SRM:] “This is initial evidence that functional alignment may be useful for enhancing signal quality, like it has in adults27,32,33, or revealing changing function over development45.” Pg. 21

      Additionally, we have expanded our discussion of relevant work that uses similar methods such as the excellent research from Knapen (2021) and others:

      “In adults, movies and resting-state data have been used to characterize the visual cortex in a data-driven fashion25-27.” Pg. 4

      “We next explored whether movies can reveal fine-grained organization within visual areas by using independent components analysis (ICA) to propose visual maps in individual infant brains25,26,35,42,43.” Pg. 9

      Reviewer #2 (Public Review):

      Summary:

      This manuscript shows evidence from a dataset with awake movie-watching in infants, that the infant brain contains areas with distinct functions, consistent with previous studies using resting state and awake task-based infant fMRI. However, substantial new analyses would be required to support the novel claim that movie-watching data in infants can be used to identify retinotopic areas or to capture within-area functional organization.

      Strengths:

      The authors have collected a unique dataset: the same individual infants both watched naturalistic animations and completed a specific retinotopy task. These data position the authors to test their novel claim, that movie-watching data in infants can be used to identify retinotopic areas.

      Weaknesses:

      To claim that movie-watching data can identify retinotopic regions, the authors should provide evidence for two claims:

      - Retinotopic areas defined based only on movie-watching data, predict retinotopic responses in independent retinotopy-task-driven data.

      - Defining retinotopic areas based on the infant's own movie-watching response is more accurate than alternative approaches that don't require any movie-watching data, like anatomical parcellations or shared response activation from independent groups of participants.

      We thank the reviewer for their comments. Before addressing their suggestions, we wish to clarify that we do not claim that movie data can be used to identify retinotopic areas, but instead that movie data captures components of the within and between visual area organization as defined by retinotopic mapping. We recognize that this was not clear in our original manuscript and have clarified this point throughout, including in this section of the discussion:

      “To be clear, we are not suggesting that movies work well enough to replace a retinotopy task when accurate maps are needed. For instance, even though ICA found components that were highly correlated with the spatial frequency map, we also selected some components that turned out to have lower correlations. Without knowing the ground truth from a retinotopy task, there would be no way to weed these out. Additionally, anatomical alignment (i.e., averaging the maps from other participants and anatomically aligning them to a held-out participant) resulted in maps that were highly similar to the ground truth. Indeed, we previously23 found that adult-defined visual areas were moderately similar to infants. While functional alignment with adults can outperform anatomical alignment methods in similar analyses27, here we find that functional alignment with infants is inferior to anatomical alignment. Thus, if the goal is to define visual areas in an infant that lacks task-based retinotopy, anatomical alignment of other participants’ retinotopic maps is superior to using movie-based analyses, at least as we tested it.” Pg. 21

      In response to the reviewer’s suggestion, we compare the maps identified by SRM to the averaged, anatomically aligned maps from infants. We find that these maps are highly similar to the task-based ground truth and we describe this result in a new section:

      “We performed an anatomical alignment analog of the functional alignment (SRM) approach. This analysis serves as a benchmark for predicting visual maps using task-based data, rather than movie data, from other participants. For each infant participant, we aggregated all other infant or adult participants as a reference. The retinotopic maps from these reference participants were anatomically aligned to the standard surface template, and then averaged. These averages served as predictions of the maps in the test participant, akin to SRM, and were analyzed equivalently (i.e., correlating the gradients in the predicted map with the gradients in the task-based map). These correlations (Table S4) are significantly higher than for functional alignment (using infants to predict spatial frequency, anatomical alignment > functional alignment: ∆Fisher Z M=0.44, CI=[0.32–0.58], p<.001; using infants to predict meridians, anatomical alignment > functional alignment: ∆Fisher Z M=0.61, CI=[0.47–0.74], p<.001; using adults to predict spatial frequency, anatomical alignment > functional alignment: ∆Fisher Z M=0.31, CI=[0.21–0.42], p<.001; using adults to predict meridians, anatomical alignment > functional alignment: ∆Fisher Z M=0.49, CI=[0.39–0.60], p<.001). This suggests that even if SRM shows that movies can be used to produce retinotopic maps that are significantly similar to a participant, these maps are not as good as those that can be produced by anatomical alignment of the maps from other participants without any movie data.” Pg. 16–17

      Note that we do not compare the anatomically aligned maps with the ICA maps statistically. This is because these analyses are not comparable: ICA is run within-participant whereas anatomical alignment is necessarily between-participant — either infant or adults. Nonetheless, an interested reader can refer to the Table where we report the results of anatomical alignment and see that anatomical alignment outperforms ICA in terms of the correlation between the predicted and task-based maps.

      Both of these analyses are possible, using the (valuable!) data that these authors have collected, but these are not the analyses that the authors have done so far. Instead, the authors report the inverse of (1): regions identified by the retinotopy task can be used to predict responses in the movies. The authors report one part of (2), shared responses from other participants can be used to predict individual infants' responses in the movies, but they do not test whether movie data from the same individual infant can be used to make better predictions of the retinotopy task data, than the shared response maps.

      So to be clear, to support the claims of this paper, I recommend that the authors use the retinotopic task responses in each individual infant as the independent "Test" data, and compare the accuracy in predicting those responses, based on:

      -  The same infant's movie-watching data, analysed with MELODIC, when blind experimenters select components for the SF and meridian boundaries with no access to the ground-truth retinotopy data.

      -  Anatomical parcellations in the same infant.

      -  Shared response maps from groups of other infants or adults.

      -  (If possible, ICA of resting state data, in the same infant, or from independent groups of infants).

      Or, possibly, combinations of these techniques.

      If the infant's own movie-watching data leads to improved predictions of the infant's retinotopic task-driven response, relative to these existing alternatives that don't require movie-watching data from the same infant, then the authors' main claim will be supported.

      These are excellent suggestions for additional analyses to test the suitability of movie-based maps to replace task-based maps. We hope it is now clear that it was never our intention to claim that movie-based data could replace task-based methods. We want to emphasize that the discoveries made in this paper, namely that movies evoke fine-grained organization in infant visual cortex, do not rely on movie-based maps being better than alternative methods for producing maps, such as the newly added anatomical alignment.

      The proposed analysis above solves a critical problem with the analyses presented in the current manuscript: the data used to generate maps is identical to the data used to validate those maps. For the task-evoked maps, the same data are used to draw the lines along gradients and then test for gradient organization. For the component maps, the maps are manually selected to show the clearest gradients among many noisy options, and then the same data are tested for gradient organization. This is a double-dipping error. To fix this problem, the data must be split into independent train and test subsets.

      We appreciate the reviewer’s concern; however, we believe it is a result of a miscommunication in our analytic strategy. We have now provided more details on the analyses to clarify how double-dipping was avoided. 

      To summarize, a retinotopy task produced visual maps that were used to trace both area boundaries and gradients across the areas. These data were then fixed and unchanged, and we make no claims about the nature of these maps in this paper, other than to treat them as the ground truth to be used as a benchmark in our analyses. The movie data, which are collected independently from the same infant in the session, used the boundaries from the retinotopy task (in the case of homotopy) or were compared with the maps from the retinotopy task (in the case of ICA and SRM). In other words, the statement that “the data used to generate maps is identical to the data used to validate those maps” is incorrect because we generated the maps with a retinotopy task and validated the maps with the movie data. This means no double dipping occurred.

      Perhaps a cause of the reviewer’s interpretation is that the gradients used in the analysis are not clearly described. We now provide this additional description:

      “Using the same manually traced lines from the retinotopy task, we measured the intensity gradients in each component from the movie-watching data. We can then use the gradients of intensity in the retinotopy task-defined maps as a benchmark for comparison with the ICA-derived maps.” Pg. 10

      Regarding the SRM analyses, we take great pains to avoid the possibility of data contamination. To emphasize how independent the SRM analysis is, the prediction of the retinotopic map from the test participant does not use their retinotopy data at all; in fact, the predicted maps could be made before that participant’s retinotopy data were ever collected. To make this prediction for a test participant, we need to learn the inversion of the SRM, but this only uses the movie data of the test participant. Hence, there is no double-dipping in the SRM analyses. We have elaborated on this point in the revision, and we remade the figure and its caption to clarify this point (Figure 5).

      We also have updated the description of these results to emphasize how double-dipping was avoided:

      “We then mapped the held-out participant's movie data into the learned shared space without changing the shared space (Figure 5c). In other words, the shared response model was learned and frozen before the held-out participant’s data was considered. This approach has been used and validated in prior SRM studies45.” Pg. 14
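
      The held-out step described here can be sketched in a few lines. This is a simplified illustration and not the authors' released code: it assumes the shared response has already been learned and frozen, fits each participant's orthonormal mapping with an orthogonal Procrustes solution, and uses synthetic data. All function and variable names are hypothetical.

```python
import numpy as np

def fit_orthogonal_mapping(movie_data, shared_response):
    """Orthonormal W (voxels x features) minimizing ||movie_data - W @ shared_response||.

    Only movie data are used here; the shared response is left unchanged.
    """
    u, _, vt = np.linalg.svd(movie_data @ shared_response.T, full_matrices=False)
    return u @ vt

def predict_map(w_test, reference_ws, reference_maps):
    """Project reference task maps into shared space, average, and map into the test brain."""
    shared_maps = [w.T @ m for w, m in zip(reference_ws, reference_maps)]
    return w_test @ np.mean(shared_maps, axis=0)

rng = np.random.default_rng(1)
n_voxels, n_features, n_timepoints = 40, 5, 200
shared_response = rng.standard_normal((n_features, n_timepoints))  # assumed learned and frozen
shared_map = rng.standard_normal(n_features)  # hypothetical map expressed in shared space

def simulate_participant():
    """Return (true mapping, movie data, task-based map) for one synthetic participant."""
    w_true, _ = np.linalg.qr(rng.standard_normal((n_voxels, n_features)))
    movie = w_true @ shared_response + 0.1 * rng.standard_normal((n_voxels, n_timepoints))
    return w_true, movie, w_true @ shared_map

reference = [simulate_participant() for _ in range(4)]
reference_ws = [fit_orthogonal_mapping(movie, shared_response) for _, movie, _ in reference]
reference_maps = [task_map for _, _, task_map in reference]

# Held-out participant: the mapping is fit from movie data alone.
_, test_movie, test_task_map = simulate_participant()
w_test = fit_orthogonal_mapping(test_movie, shared_response)
predicted = predict_map(w_test, reference_ws, reference_maps)

# The task map of the held-out participant enters only at this evaluation step.
r = np.corrcoef(predicted, test_task_map)[0, 1]
```

      In this sketch the held-out participant's task-based map is never passed to `fit_orthogonal_mapping` or `predict_map`, which mirrors the independence argument made in the response.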

      The reviewer suggests that manually choosing components from ICA is double-dipping. Although the reviewer is correct that the manual selection of components in ICA means that the components chosen ought to be good candidates, we are testing whether those choices were good by evaluating those components against the task-based maps that were not used for the ICA. Our statistical analyses evaluate whether the components chosen were better than the components that would have been chosen by random chance. Critically: all decisions about selecting the components happen before the components are compared to the retinotopic maps. Hence there is no double-dipping in the selection of components, as the choice of candidate ICA maps is not informed by the ground-truth retinotopic maps. We now clarify what the goal of this process is in the results:

      “Success in this process requires that 1) retinotopic organization accounts for sufficient variance in visual activity to be identified by ICA and 2) experimenters can accurately identify these components.” Pg. 10
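
      One way to make the comparison against chance concrete is a percentile rank and a simple permutation test over component scores. The sketch below is an assumption about how such a comparison could be run, not a description of the statistics in the manuscript; the scores are invented and the function names are hypothetical.

```python
import numpy as np

def percentile_rank(selected_score, all_scores):
    """Percentage of components scoring at or below the selected component."""
    all_scores = np.asarray(all_scores)
    return 100.0 * np.mean(all_scores <= selected_score)

def chance_p_value(selected_scores, all_scores, n_perm=2000, seed=0):
    """Probability that randomly chosen components score as well as the selected ones."""
    rng = np.random.default_rng(seed)
    all_scores = np.asarray(all_scores)
    observed = np.mean(selected_scores)
    k = len(selected_scores)
    null = np.array([
        rng.choice(all_scores, size=k, replace=False).mean()
        for _ in range(n_perm)
    ])
    # Add-one correction keeps the estimate strictly above zero.
    return (np.sum(null >= observed) + 1) / (n_perm + 1)

# Hypothetical correlations between each component's gradients and the task-based map.
all_scores = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
selected_scores = np.array([0.8, 0.9])  # components picked before seeing the task maps

rank = percentile_rank(0.6, all_scores)
p_value = chance_p_value(selected_scores, all_scores)
```

      Because the selection is fixed before the scores are computed, the null distribution asks only whether an uninformed choice of the same number of components would have done as well.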

      The reviewer also alludes to a concern that the researcher selecting the maps was not blind to the ground-truth retinotopic maps from participants and this could have influenced the results. In such a scenario, the researcher could have selected components that have the gradients of activity in the places that the infant has as ground truth. The researcher who made the selection of components (CTE) is one of the researchers who originally traced the areas in the participants approximately a year prior to the identification of ICs. The researcher selecting the components didn’t use the ground-truth retinotopic maps as reference, nor did they pay attention to the participant IDs when sorting the IC components. Indeed, they weren’t trying to find participant-specific maps per se, but rather aimed to find good candidate retinotopic maps in general. In the case of the newly added adult analyses, the ICs were selected before the retinotopic mapping was reviewed or traced; hence, no knowledge about the participant-specific ground truth could have influenced the selection of ICs. Even with this process from adults, we find results of comparable strength as we found in infants, as shown in Figure S3. Nonetheless, there is a possibility that this researcher’s previous experience of tracing the infant maps could have influenced their choice of components at the participant-specific level. If so, it was a small effect since the components the researcher selected were far from the best possible options (i.e., rankings of the selected components averaged in the 64th percentile for spatial frequency maps and the 68th percentile for meridian maps). We believe all reasonable steps were taken to mitigate bias in the selection of ICs.

      Reviewer #3 (Public Review):

      The manuscript reports data collected in awake toddlers, recording BOLD while they watched videos. The authors analyse the BOLD time series using two different statistical approaches, both of which are very complex but do not require any a priori determination of the movie features or contents to be associated with regressors. The two main messages are that 1) toddlers have occipital visual areas very similar to adults, given that an SRM model derived from adult BOLD is consistent with the infant brains as well; 2) the retinotopic organization and the spatial frequency selectivity of the occipital maps derived by applying correlation analysis are consistent with the maps obtained by standard and conventional mapping.

      Clearly, the data are important, and the authors have achieved important and original results. However, the manuscript is totally unclear and very difficult to follow; the figures are not informative; the reader needs to trust the authors because no data to verify the output of the statistical analysis are presented (localization maps with proper statistics), nor is any validation of the statistical analysis provided. Indeed, what I think the manuscript means, or better, what I understood, may be very far from what the authors want to present, given how obscure the methods and the result presentation are.

      In the present form, this reviewer considers that the manuscript needs to be totally rewritten, the results presented each technique with appropriate validation or comparison that the reader can evaluate.

      We are grateful to the reviewer for the chance to improve the paper. We have broken their review into three parts: clarification of the methods, validation of the analyses, and enhancing the visualization.

      Clarification of the methods

      We acknowledge that the methods we employed are complex and uncommon in many fields of neuroimaging. That said, numerous papers have conducted these analyses on adults (Beckman et al., 2005; Butt et al., 2015; Guntupalli et al., 2016; Haak & Beckman, 2018; Knapen, 2021; Lu et al., 2017) and non-human primates (Arcaro & Livingstone, 2017; Moeller et al., 2009). We have redoubled our efforts in the revision to make the methods as clear as possible, expanding on the original text and providing intuitions where possible. These changes have been added throughout and are too vast in number to repeat here, especially without context, but we hope that readers will have an easier time following the analyses now. 

      Additionally, we updated Figures 3 and 5 in which the main ICA and SRM analyses are described. For instance, in Figure 3’s caption we now add details about how the gradient analyses were performed on the components: 

      “We used the same lines that were manually traced on the task-evoked map to assess the change in the component’s response. We found a monotonic trend within area from medial to lateral, just like we see in the ground truth.” Pg. 11

      Regarding Figure 5, we reconsidered the best way to explain the SRM analyses and decided it would be helpful to partition the diagram into steps, reflecting the analytic process. These updates have been added to Figure 5, and the caption has been updated accordingly.

      We hope that these changes have improved the clarity of the methods. For readers interested in learning more, we encourage them to either read the methods-focused papers that debut the analyses (e.g., Chen et al., 2015), read the papers applying the methods (e.g., Guntupalli et al., 2016), or read the annotated code we publicly release which implements these pipelines and can be used to replicate the findings.

      Validation of the analyses

      One of the requests the reviewer makes is to validate our analyses. Our initial approach was to lean on papers that have used these methods in adults or primates (e.g., Arcaro & Livingstone, 2017; Beckman et al., 2005; Butt et al., 2015; Guntupalli et al., 2016; Haak & Beckman, 2018; Knapen, 2021; Moeller et al., 2009) where the underlying organization and neurophysiology is established. However, we have made changes to these methods that differ from their original usage (e.g., we used SRM rather than hyperalignment, we use meridian mapping rather than traveling wave retinotopy, we use movie-watching data rather than rest). Hence, the specifics of our design and pipeline warrant validation.

      To add further validation, we have rerun the main analyses on an adult sample. We collected data from 8 adult participants who completed the same retinotopy task and a large subset of the movies that infants saw. These participants were run under maximally similar conditions to infants (i.e., scanned using the same parameters and without the top of the head-coil) and were preprocessed using the same pipeline. Given that the relationship between adult visual maps and movie-driven (or resting-state) analyses has been shown in many studies (Beckman et al., 2005; Butt et al., 2015; Guntupalli et al., 2016; Haak & Beckman, 2018; Knapen, 2021; Lu et al., 2017), these adult data serve as a validation of our analysis pipeline. These adult participants were included in the original manuscript; however, they were previously only used to support the SRM analyses (i.e., can adults be used to predict infant visual maps). The adult results are described before any results with infants, as a way to engender confidence. Moreover, we have provided new supplementary figures of the adult results that we hope will be integrated with the article when viewing it online, such that it will be easy to compare infant and adult results, as per the reviewer’s request.

      As per the figures and captions below, the analyses were all successful with the adult participants: 1) Homotopic correlations are higher than correlations between comparable areas in other streams or areas that are more distant within stream. 2) A multidimensional scaling depiction of the data shows that areas in the dorsal and ventral stream are dissimilar. 3) Using independent components analysis on the movie data, we identified components that are highly correlated with the retinotopy task-based spatial frequency and meridian maps. 4) Using shared response modeling on the movie data, we predicted maps that are highly correlated with the retinotopy task-based spatial frequency and meridian maps.

      These supplementary analyses are underpowered for between-group comparisons, so we do not statistically compare the results between infants and adults. Nonetheless, the pattern of adult results is comparable overall to the infant results. 

      We believe these adult results provide a useful validation that the infant analyses we performed can recover fine-grained organization.

      The reviewer raises an additional concern about the lack of visualization of the results. We recognize that the plots of the summary statistics do not provide information about the intermediate analyses. Indeed, we think the summary statistics can understate the degree of similarity between the components or predicted visual maps and the ground truth. Hence, we have added 6 new supplementary figures showing the intensity gradients for the following analyses: 1. spatial frequency prediction using ICA, 2. meridian prediction using ICA, 3. spatial frequency prediction using infant SRM, 4. meridian prediction using infant SRM, 5. spatial frequency prediction using adult SRM, and 6. meridian prediction using adult SRM.

      We hope that these visualizations are helpful. It is possible that the reviewer wishes us to also visually present the raw maps from the ICA and SRM, akin to what we show in Figure 3A and 3B. We believe this is outside the scope of this paper: of the 1140 components that were identified by ICA, we selected 36 for spatial frequency and 17 for meridian maps. We also created 20 predicted maps for spatial frequency and 20 predicted meridian maps using SRM. This would result in the depiction of 93 subfigures, requiring at least 15 new full-page supplementary figures to display with adequate resolution. Instead, we encourage the reader to access this content themselves: we have made the code to recreate the analyses publicly available, as well as both the raw and preprocessed data for these analyses, including the data for each of these selected maps.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) As mentioned in the public review, the authors should consider incorporating relevant adult fMRI research into the Introduction and explain the importance of testing this question in infants.

      Our public response describes the citations to relevant adult research that we have added, and we have also provided further motivation for the project.

      (2) The authors should conduct additional analyses to support their conclusion that movie data alone can generate accurate retinotopic maps (i.e., by comparing this approach to other available alternatives).

      We have clarified in our public response that we did not wish to conclude that movie data alone can generate accurate retinotopic maps, and have made substantial edits to the text to emphasize this. Because this is not a claim we make, and our analyses were not designed to support it, we do not think it is necessary to test it further.

      (3) The authors should re-do the homotopy analyses using movie-defined ROIs (i.e., by splitting the movie-viewing data into independent folds for functional ROI definition and analyses).

      As stated above, defining ROIs based on the movie content is not the intended goal of this project. Even if that were the general goal, we do not believe that it would be appropriate to run this specific analysis with the data we collected. Firstly, halving the data for ROI definition (e.g., using half the movie data to identify and trace areas, and then using those areas in a homotopy analysis run on the other half of the data) would qualitatively change the power of the analyses described here. Secondly, we would be unable to define areas beyond hV4/V3AB with confidence, since our retinotopic mapping only affords specification of early visual cortex. Thus we could not conduct the MDS analyses shown in Figure 2.

      (4) If the authors agree that a primary contribution of this study and paper is to showcase what is possible to do with a limited amount of movie-viewing data, then they should make it clearer, sooner, how much usable movie data they have from infants. They could also consider conducting additional analyses to determine the minimum amount of fMRI data necessary to reveal the same detailed characteristics of functional responses in the visual cortex.

      We agree it would be good to highlight the amount of movie data used. When the infant data is first introduced in the results section, we now state the durations:

      “All available movies from each session were included (Table S2), with an average duration of 540.7s (range: 186--1116s).” Pg. 5

      Additionally, we have added a homotopy analysis that describes the contribution of data quantity to the results observed. We compare the amount of data collected with the magnitude of same vs. different stream effect (Figure 1B) and within stream distance effect (Figure 1C). We find no effect of movie duration in the sample we tested, as reported below:

      “We found no evidence that the variability in movie duration per participant correlated with this difference [of same stream vs. different stream] (r=0.08, p=.700).” Pg. 6-7

      “There was no correlation between movie duration and the effect (Same > Adjacent: r=-0.01, p=.965, Adjacent > Distal: r=-0.09, p=.740).” Pg. 7
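
      As an illustration of the kind of check quoted above, the following is a minimal sketch of a Pearson correlation between per-participant movie duration and effect size. The function name and the numbers are hypothetical and are not the study's data; a p-value, as reported in the quotes, could be obtained with `scipy.stats.pearsonr`, which is not used here so that the sketch depends only on NumPy.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between two equal-length 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of equal length")
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))

# Hypothetical values, for illustration only (not the study's data).
duration_s = [186.0, 320.0, 540.0, 760.0, 1116.0]   # movie duration per participant
stream_effect = [0.12, 0.05, 0.15, 0.07, 0.11]      # same-minus-different stream effect
r = pearson_r(duration_s, stream_effect)
```

      A value of r near zero, as in the quoted results, would indicate that participants with more usable movie data did not show systematically larger effects.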

      (5) If any of the methodological approaches are novel, the authors should make this clear. In particular, has the approach of visually inspecting and categorizing components generated from ICA and movie data been done before, in adults/other contexts?

      The methods we employed are similar to others, as described in the public review.

      However, changes were necessary to apply them to infant samples. For instance, Guntupalli et al. (2016) used hyperalignment to predict the visual maps of adult participants, whereas we use SRM. SRM and hyperalignment have the same goal (finding a maximally aligned representation between participants based on brain function), but their implementation is different. The application of functional alignment to infants is novel, as is its use with movie data that is relatively short by comparison to standard adult data. Indeed, this is the most thorough demonstration that SRM, or any functional alignment procedure, can be usefully applied to infant data, awake or sleeping. We have clarified this point in the discussion.

      “This is initial evidence that functional alignment may be useful for enhancing signal quality, like it has in adults27,32,33, or revealing changing function over development45, which may prove especially useful for infant fMRI52.” Pg. 21

      (6) The authors found that meridian maps were less identifiable from ICA and movie data and suggest that this may be because these maps are more susceptible to noise or gaze variability. If this is the case, you might predict that these maps are more identifiable in adult data. The authors could consider running additional analyses with their adult participants to better understand this result.

      As described in the manuscript, we hypothesize that meridian maps are more difficult to identify than spatial frequency maps because meridian maps are a less smooth, more fine-grained map than spatial frequency. Indeed, it has previously been reported (Moeller et al., 2009) that similar procedures can result in meridian maps that are constituted by multiple independent components (e.g., a component sensitive to horizontal orientations, and a separate component sensitive to vertical components). Nonetheless, we have now conducted the ICA procedure on adult participants and again find it is easier to identify spatial frequency components compared to meridian maps, as reported in the public review.

      Minor corrections:

      (1) Typo: Figure 3 title: "Example retintopic task vs. ICA-based spatial frequency maps.".

      Fixed

      (2) Given the age range of the participants, consider using "infants and toddlers"? (Not to diminish the results at all; on the contrary, I think it is perhaps even more impressive to obtain awake fMRI data from ~1-2-year-olds). Example: Figure 3 legend: "A) Spatial frequency map of a 17.1-month-old infant.".

      We agree with the reviewer that there is disagreement about the age range at which a child starts being considered a toddler. We have changed the terms in places where we refer to a toddler in particular (e.g., the figure caption the reviewer highlights) and added the phrase “infants and toddlers” in places where appropriate. Nonetheless, we have kept “infants” in some places, particularly those where we are comparing the sample to adults. Adding “and toddlers” could imply three samples being compared which would confuse the reader.

      (3) Figure 6 legend: The following text should be omitted as there is no bar plot in this figure: "The bar plot is the average across participants. The error bar is the standard error across participants.".

      Fixed

      (4) Table S1 legend: Missing first single quote: Runs'.

      Fixed

      Reviewer #2 (Recommendations For The Authors):

      I request that this paper cite more of the existing literature on the fMRI of human infants and toddlers using task-driven and resting-state data. For example, early studies by (first authors) Biagi, Dehaene-Lambertz, Cusack, and Fransson, and more recent studies by Chen, Cabral, Truzzi, Deen, and Kosakowski.

      We have added several new citations of recent task-based and resting state studies to the second sentence of the main text:

      “Despite the recent growth in infant fMRI1-6, one of the most important obstacles facing this research is that infants are unable to maintain focus for long periods of time and struggle to complete traditional cognitive tasks7.”

      Reviewer #3 (Recommendations For The Authors):

      In the following, I report some of my main perplexities, but many more may arise when the material is presented more clearly.

      The age of the children varies from 5 months to about 2 years. While the developmental literature suggests that between 1 and 2 years children have a visual system nearly adult-like, below that age some areas may be very immature. I would split the sample and perhaps attempt to validate the adult SRM model with the youngest children (and those can be called infants).

      We recognize the substantial age variability in our sample, which is why we report participant-specific data in our figures. While splitting up the data into age bins might reveal age effects, we do not think we can perform adequately powered null hypothesis testing of the age trend. In order to investigate the contribution of age, larger samples will be needed. That said, we can see from the data that we have reported that any effect of age is likely small. To elaborate: Figures 4 and 6 report the participant-specific data points and order the participants by age. There are no clear linear trends in these plots, which suggests that there are no strong age effects.

      More broadly, we do not think there is a principled way to divide the participants by age. The reviewer suggests that the visual system is immature before the first year of life and mature afterward; however, such claims are the exact motivation for the type of work we are doing here, and the verdict is still out. Indeed, the conclusion of our earlier work reporting retinotopy in infants (Ellis et al., 2021) suggests that the organization of the early visual cortex in infants as young as 5 months — the youngest infant in our sample — is surprisingly adult-like.

      The title cannot refer to infants given the age span.

      There is disagreement in the field about the age at which it is appropriate to refer to children as infants. In this paper, and in our prior work, we followed the practice of the most attended infant cognition conference and society, the International Congress of Infant Studies (ICIS), which considers infants as those aged 0-3 years, for the purposes of their conference. Indeed, we have never received this concern across dozens of prior reviews for previous papers covering a similar age range. That said, we understand the spirit of the reviewer’s comment and now refer to the sample as “infants and toddlers” and to older individuals in our sample as “toddlers” wherever it is appropriate (the younger individuals would fairly be considered “infants” under any definition).

      Figure 1 is clear and an interesting approach. Please also show the average correlation maps on the cortical surface.

      While we would like to create a figure as requested, we are unsure how to depict an area-by-area correlation map on the cortical surface. One option would be to generate a seed-based map in which we take an area and depict the correlation of that seed (e.g., vV1) with all other voxels. This approach would result in 8 maps for just the task-defined areas, and 17 maps for anatomically-defined areas. Hence, we believe this is outside the scope of this paper, but an interested reader could easily generate these maps from the data we have released.

      Figure 2 results are not easily interpretable. Ventral and dorsal V1-V3 areas represent upper or lower VF respectively. Higher dorsal and ventral areas represent both upper and lower VF, so we should predict an equal distance between the two streams. Again, how can we verify that it is not a result of some artifacts?

      In adults, visual areas differ in their functional response properties along multiple dimensions, including spatial coding. The dorsal/ventral stream hypothesis is derived from the idea that areas in each stream support different functions, independent of spatial coding. The MDS analysis did not attempt to isolate the specific contribution of spatial representations of each area but instead tested the similarity of function that is evoked in naturalistic viewing. Other covariance-based analyses specifically isolate the contribution of spatial representations (Haak et al., 2013); however, they use a much more constrained analysis than what was implemented here. The fact that we find broad differentiation of dorsal and ventral visual areas in infants is consistent with adults (Haak & Beckman, 2018) and neonate non-human primates (Arcaro & Livingstone, 2017). 

      Nonetheless, we recognize that we did not mention the differences in visual field properties across areas and what that means. If visual field properties alone drove the functional response then we would expect to see a clustering of areas based on the visual field they represent (e.g., hV4 and V3AB should have similar representations). Since we did not see that, and instead saw organization by visual stream, the result is interesting and thus warrants reporting. We now mention this difference in visual fields in the manuscript to highlight the surprising nature of the result.

      “This separation between streams is striking when considering that it happens despite differences in visual field representations across areas: while dorsal V1 and ventral V1 represent the lower and upper visual field, respectively, V3A/B and hV4 both have full visual field maps. These visual field representations can be detected in adults41; however, they are often not the primary driver of function39. We see that in infants too: hV4 and V3A/B represent the same visual space yet have distinct functional profiles.” Pg. 8
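
      For readers unfamiliar with MDS, the following is a minimal sketch of how an area-by-area correlation matrix can be embedded in a low-dimensional space. It uses classical (Torgerson) MDS written in NumPy, which is not necessarily the variant used in the manuscript, and it assumes that dissimilarity is defined as one minus the correlation. The function names are hypothetical.

```python
import numpy as np

def correlation_distance(corr):
    """Turn an area-by-area correlation matrix into a dissimilarity matrix."""
    return 1.0 - np.asarray(corr, dtype=float)

def classical_mds(dist, n_dims=2):
    """Embed items in n_dims dimensions from a symmetric distance matrix."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Double-centre the squared distances to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # Keep the n_dims largest eigenvalues and their eigenvectors.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_dims]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Toy correlation matrix for four areas (values are illustrative only):
# the first two areas and the last two areas form two clusters.
toy_corr = np.array([
    [1.0, 0.8, 0.3, 0.2],
    [0.8, 1.0, 0.2, 0.3],
    [0.3, 0.2, 1.0, 0.8],
    [0.2, 0.3, 0.8, 1.0],
])
embedding = classical_mds(correlation_distance(toy_corr), n_dims=2)
```

      In such an embedding, areas with similar functional profiles land close together, so a separation between streams would appear as two clusters of points.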

      The reviewer raises a concern that the MDS result may be spurious and caused by noise. Below, we present three reasons why we believe these results are not accounted for by artifacts but instead reflect real functional differentiation in the visual cortex. 

      (1) Figure 2 is a visualization of the similarity matrix presented in Figure S1. In Figure S1, we report the significance testing we performed to confirm that the patterns differentiating dorsal and ventral streams — as well as adjacent areas from distal areas — are statistically reliable across participants. If an artifact accounted for the result then it would have to be a kind of systematic noise that is consistent across participants.

      (2) One of the main sources of noise (both systematic and non-systematic) with infant fMRI is motion. Homotopy is a within-participant analysis that could be biased by motion. To assess whether motion accounts for the results, we took a conservative approach of regressing out the framewise motion (i.e., how much movement there is between fMRI volumes) from the comparisons of the functional activity in regions. Although the correlations numerically decreased with this procedure, they were qualitatively similar to the analysis that does not regress out motion:

      “Additionally, if we control for motion in the correlation between areas --- in case motion transients drive consistent activity across areas --- then the effects described here are negligibly different (Figure S5).” Pg. 7
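
      One way to control for motion in a correlation between two areas is a partial correlation: regress the framewise motion out of both time courses and correlate the residuals. The following is a minimal sketch under that assumption; the exact procedure used in the manuscript may differ, and the function names are hypothetical.

```python
import numpy as np

def residualize(signal, confound):
    """Remove the least-squares fit of a confound (plus intercept) from a signal."""
    signal = np.asarray(signal, dtype=float)
    confound = np.asarray(confound, dtype=float)
    design = np.column_stack([np.ones(len(signal)), confound])
    beta, *_ = np.linalg.lstsq(design, signal, rcond=None)
    return signal - design @ beta

def motion_controlled_corr(area_a, area_b, framewise_motion):
    """Correlate two area time courses after regressing motion out of both."""
    res_a = residualize(area_a, framewise_motion)
    res_b = residualize(area_b, framewise_motion)
    return float(np.corrcoef(res_a, res_b)[0, 1])
```

      If motion transients were the main source of shared activity between two areas, the correlation would drop towards zero after this step; a correlation that is largely preserved indicates that it is not driven by the motion regressor.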

      (3) We recognize that despite these analyses, it would be helpful to see what this pattern looks like in adults where we know more about the visual field properties and the function of dorsal and ventral streams. This has been done previously (e.g., Haak & Beckman, 2018), but we have now run those analyses on adults in our sample, as described in the public review. As with infants, there are reliable differences in the homotopy between streams (Figure S1). The MDS results show that the adult data was more complex than the infant data, since it was best described by 3 dimensions rather than 2. Nonetheless, there is a rotation of the MDS such that the structure of the ventral and dorsal streams is also dissociable. 

      Figure 3 also raises several alternative interpretations. The spatial frequency component in B has strong activity ONLY at the extreme border of the VF and this is probably the origin of the strong correlation. I understand that it is only one subject, but this brings the need to show all subjects and to report the correlation. Also, it is important to show the putative average ICA for retinotopy and spatial frequencies across subjects and for adults. All methods should be validated on adults where we have clear data for retinotopy and spatial frequency.

      The reviewer notes that the component in Figure 3 shows a strong negative response in the periphery. It is often the case, as reported elsewhere (Moeller et al., 2009), that ICA extracts portions of visual maps. To make a full visual map would require combining components into a composite (e.g., a component that has a high response in the periphery and another component that has a high response in the fovea). If we were to claim that this component, or others like it, could replace the need for retinotopic mapping, then we would want to produce these composite maps; however, our conclusion in this project is that the topographic information of retinotopic maps manifests in individual components of ICA. For this purpose, the analysis we perform adequately assesses this topography.

      Regarding the request to show the results for all subjects, we address this in the public response and repeat it here briefly: we have added 6 new figures to show results akin to Figure 3C and D. It is impractical to show the equivalent of Figure 3A and B for all participants, yet we do release the data necessary to visualize these maps easily.

      Finally, the reviewer suggests that we validate the analyses on adult participants. As shown in Figure S3 and reported in the public response, we now run these analyses on adult participants and observe qualitatively similar results to infants.

      How much was the variation in the presumed spatial frequency map? Is it consistent with the acuity range? 5-month-old infants should have an acuity of around 10c/deg, depending on the mean luminance of the scene.

      The reviewer highlights an important weakness of conducting ICA: we cannot put units on the degree of variation we see in components. We now highlight this weakness in the discussion:

      “Another limitation is that ICA does not provide a scale to the variation: although we find a correlation between gradients of spatial frequency in the ground truth and the selected component, we cannot use the component alone to infer the spatial frequency selectivity of any part of cortex. In other words, we cannot infer units of spatial frequency sensitivity from the components alone.” Pg. 20
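
      The scale ambiguity described in this limitation can be illustrated with a short simulation. The sketch below assumes a spatial decomposition in which the data are the product of component time courses and component maps; all array names and sizes are hypothetical. Rescaling a map and applying the inverse scaling to its time course reproduces the data exactly, so the units of a map are arbitrary, while its correlation with a reference map is unaffected by a positive rescaling.

```python
import numpy as np

rng = np.random.default_rng(0)
n_components, n_timepoints, n_voxels = 3, 200, 50

# Hypothetical spatial maps (components x voxels) and their time courses.
maps = rng.normal(size=(n_components, n_voxels))
timecourses = rng.normal(size=(n_timepoints, n_components))
data = timecourses @ maps

# Rescale each map by an arbitrary factor and undo it in the time courses.
factors = np.array([0.5, -3.0, 10.0])
rescaled_maps = maps * factors[:, None]
rescaled_timecourses = timecourses / factors[None, :]
same_data = np.allclose(data, rescaled_timecourses @ rescaled_maps)

# Correlation with a noisy reference map is unchanged by the positive rescaling.
reference = maps[0] + rng.normal(scale=0.5, size=n_voxels)
r_original = np.corrcoef(maps[0], reference)[0, 1]
r_rescaled = np.corrcoef(rescaled_maps[0], reference)[0, 1]
```

      This is why a component can correlate with the ground-truth gradient without supporting any statement about absolute spatial frequency; a negative factor would additionally flip the sign of the correlation.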

      Figure 5 pipeline is totally obscure. I presumed that I understood, but as it is it is useless. All methods should be clearly described, and the intermediate results should be illustrated in figures and appropriately discussed. Using such blind analyses in infants in principle may not be appropriate and this needs to be verified. Overall all these techniques rely on correlation activities that are all biased by head movement, eye movement, and probably the dummy sucking. All those movements need to be estimated and correlated with the variability of the results. It is a strong assumption that the techniques should work in infants, given the presence of movements.

      We recognize that the SRM methods are complex. Given this feedback, we remade Figure 5 with explicit steps for the process and updated the caption (as reported in the public review).

      Regarding the validation of these methods, we have added SRM analyses from adults and find comparable results. This means that using these methods on adults with amounts of data comparable to what we collected from infants can predict maps that are highly similar to the real maps. Even so, it is not a given that these methods are valid in infants. We present two considerations in this regard.

      First, as part of the SRM analyses reported in the manuscript, we show that control analyses are significantly worse than the real analyses (indicated by the lines on Figure 6). To clarify the control analysis: we break the mapping (i.e., flip the order of the data so that it is backwards) between the test participant and the training participants used to create the SRM. The fact that this control analysis is significantly worse indicates that SRM is learning meaningful representations that matter for retinotopy. 
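
      The logic of this control can be sketched with simulated data. The example below is a simplification and not the manuscript's pipeline: it assumes that "flipping" means reversing the time axis of the held-out participant, it replaces the SRM fitting step with an ordinary least-squares mapping from a shared response to the held-out voxels, and it simulates the shared response as white noise. All names and sizes are hypothetical.

```python
import numpy as np

def fit_quality(test_data, shared_response):
    """R^2 of a least-squares mapping from shared features to test voxels.

    test_data: voxels x timepoints; shared_response: features x timepoints.
    """
    weights, *_ = np.linalg.lstsq(shared_response.T, test_data.T, rcond=None)
    predicted = (shared_response.T @ weights).T
    residual = test_data - predicted
    total = test_data - test_data.mean(axis=1, keepdims=True)
    return 1.0 - (residual ** 2).sum() / (total ** 2).sum()

rng = np.random.default_rng(0)
n_features, n_timepoints, n_voxels = 5, 200, 50

# Simulated shared response and a held-out participant who expresses it.
shared = rng.normal(size=(n_features, n_timepoints))
true_weights = rng.normal(size=(n_voxels, n_features))
noise = 0.5 * rng.normal(size=(n_voxels, n_timepoints))
held_out = true_weights @ shared + noise

r2_intact = fit_quality(held_out, shared)
# Control: reverse the held-out participant's time axis before fitting.
r2_flipped = fit_quality(held_out[:, ::-1], shared)
```

      With the time axis intact, the mapping explains most of the held-out variance; with it reversed, the fit collapses because the two datasets no longer correspond timepoint by timepoint. Real BOLD data are temporally autocorrelated, so the drop would be less extreme than in this white-noise simulation.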

      Second, we believe that this paper is a validation of SRM for infants. Infant fMRI is a nascent field and SRM has the potential to increase the signal quality in this population. We hope that readers will see these analyses as a proof of concept that SRM can be used in their work with infants. We have stated this contribution in the paper now.

      “Additionally, we wish to test whether methods for functional alignment can be used with infants. Functional alignment finds a mapping between participants using functional activity -- rather than anatomy -- and in adults can improve signal-to-noise, enhance across participant prediction, and enable unique analyses27,32-34.” Pg. 4

      “This is initial evidence that functional alignment may be useful for enhancing signal quality, like it has in adults27,32,33, or revealing changing function over development45.” Pg. 21

      Regarding the reviewer’s concern that motion may bias the results, we wish to emphasize the nature of the analyses being conducted here: we are using data from a group of participants to predict the neural responses in a held-out participant. For motion to explain consistency between participants, the motion would need to be time-locked across participants. Even if motion were time-locked during movie watching, it would impair the formation of an adequate model that can contain retinotopic information. Thus, motion should only hurt the ability for a shared response to be found that can be used for predicting retinotopic maps. Hence, the results we observed are despite motion and other sources of noise.

      What is M??? is it simply the mean value??? If not, how it is estimated?

      M is an abbreviation for mean. We have now expanded the abbreviation the first time we use it.

      Figure 6 should be integrated with map activity where the individual area correlation should be illustrated. Probably fitting SMR adult works well for early cortical areas, but not for more ventral and associative, and the correlation should be evaluated for the different masks.

      With the addition of plots showing the gradients for each participant and each movie (Figures S10–S13) we hope we have addressed this concern. We additionally want to clarify that the regions we tested in the analysis in Figure 6 are only the early visual areas V1, V2, V3, V3A/B, and hV4. The adult validation analyses show that SRM works well for predicting the visual maps in these areas. Nonetheless, it is an interesting question for future research with more extensive retinotopic mapping in infants to see if SRM can predict maps beyond extrastriate cortex.

      Occipital masks have never been described or shown.

      The occipital mask is from the MNI probabilistic structural atlas (Mazziotta et al., 2001), as reported in the original version and is shared with the public data release. We have added the additional detail that the probabilistic atlas is thresholded at 0% in order to be liberally inclusive. 

      “We used the occipital mask from the MNI structural atlas63 in standard space -- defined liberally to include any voxel with an above zero probability of being labelled as the occipital lobe -- and used the inverted transform to put it into native functional space.” Pg. 27–28
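
      Thresholding a probabilistic atlas at 0% amounts to keeping every voxel with a nonzero probability. The following is a minimal sketch of that step on a toy array; the function name and the values are hypothetical and do not come from the atlas.

```python
import numpy as np

def liberal_mask(prob_map, threshold=0.0):
    """Boolean mask of voxels whose label probability exceeds the threshold."""
    return np.asarray(prob_map, dtype=float) > threshold

# Toy 1-D "atlas" of occipital-lobe probabilities in percent (not real atlas values).
prob_map = np.array([0.0, 0.0, 3.0, 41.0, 88.0, 12.0, 0.0])
mask = liberal_mask(prob_map)           # threshold at 0%: any nonzero voxel
strict = liberal_mask(prob_map, 25.0)   # a stricter threshold, for contrast
```

      The liberal threshold retains low-probability voxels at the edge of the lobe that a stricter threshold would exclude.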

      Methods lack the main explanation of the procedures and software description.

      We hope that the additions we have made to address this reviewer’s concerns have provided better explanations for our procedures. Additionally, as part of the data and code release, we thoroughly explain all of the software needed to recreate the results we have observed here.

    1. eLife assessment

      In this important study, Bu et al investigate how cell overcrowding triggers a mechano-transduction pathway involving TRPV4 channels, focusing on high-grade ductal carcinoma in situ (DCIS) cells. The authors show that cell crowding in these malignant cells leads to a reduction in cell volume and promotes a pro-invasive phenotype through calcium homeostasis and TRPV4 channel trafficking to the plasma membrane; this phenomenon is specific to invasive cell lines like MCF10CA and DCIS and is corroborated by patient tissue samples. The work suggests the role of TRPV4 in cell motility and mechanical sensing, offering potential therapeutic insights for targeting cancer metastasis. While the study presents robust and convincing data, a critical limitation is the absence of TRPV4 genetic ablation, which would further confirm its role in these processes.