  1. Oct 2023
    1. Author Response

      Reviewer #1 (Public Review):

      This paper evaluates the effect of knocking out CST7 (Cystatin F) on the APPNL-G-F Alzheimer's disease mouse model. They found sexually dimorphic outcomes, with differential transcriptional responses, increased phagocytosis (but interestingly a higher plaque burden) in females and suppressed inflammatory microglial activation in males (but interestingly no change in plaque burden). This study offers new insight into the functional role of CST7 that is upregulated in a subset of disease-associated microglia in AD models and human brain. Despite the discovery of disease-associated microglia several years ago, there has been little effort in understanding the function of the different genes that make up this profile, making this paper especially timely. Overall, the experiments are well-controlled and the data support the main conclusions. The manuscript could be strengthened by addressing the comments and clarifying questions below, which could impact the interpretation of the data and findings.

      1) In the first section discussing CST7 expression levels in AD models, it would be good to involve a discussion of levels of CST7 change in human AD samples. There are sufficient available datasets to look at this, and it would help us understand how comparable the animal models are to human patients. For example, while in mice CST7 is highly enriched in microglia/macrophages, in human datasets it seems like it is not quite so specific to microglia - it is equally expressed in endothelial cells. This might have a significant impact on the interpretation of the data, and it would be good to introduce and assess the findings in mice through the human subjects lens. There is a discussion of the human data in the discussion section, but it would be more appropriately assessed in the same way as the mouse data and comparatively presented in the results section. The authors could also include the data from Gerrits et al. 2021 in their first figure.

      We agree with the reviewer on the importance of considering the work in the context of human disease. While CST7 is not as strongly upregulated in human AD brain as it is in mouse, expression is observed predominantly in myeloid cells in the brain, with very minimal expression detected in endothelial cells (see screenshots in Author response image 1 from the Brain Myeloid Landscape platform, http://research-pub.gene.com/BrainMyeloidLandscape/BrainMyeloidLandscape2/), and is enriched in AD clusters vs homeostatic clusters in scRNASeq studies (Gerrits et al., 2021). We attempted immunostaining for human Cystatin F (CF, encoded by CST7) in AD brains to assess expression and co-localisation with microglial markers but failed to validate any of the antibodies tested. Additionally, King et al., 2023 (PMID: 36547260) recently showed an increase in CST7 expression in bulk hippocampal RNASeq in AD vs mid-life controls, suggesting an ageing/AD mechanism. CST7 has also been shown to be expressed following overexpression of TREM2 in human microglia in vitro, and siRNA-mediated knockdown of its expression leads to an increase in phagocytosis (Popescu et al., 2023 - PMID: 36480007), mirroring our data and suggesting a conserved role in human cells. Overall, we believe that, even in the context of mouse models, the understanding of the function of genes upregulated in disease is of importance to the field and that this study paves the way for further work investigating human CST7 in disease. We have added this (with citations to the datasets mentioned) to the discussion (highlighted).

      Author response image 1

      2) The differential RNAseq data is perhaps one of the most striking results of this paper; however it is difficult to see exactly how similar the male v female APPNL-G-F profiles are, in addition to the genes shared or not between the KO condition. Venn diagrams, in addition to statistical tests, would enhance this part of the paper and add more clarity.

      We have added Venn diagrams showing DEGs between male and female AppNL-G-F microglia vs WT control to show how similar the male and female AppNL-G-F profiles are. Additionally, to exemplify the Cst7KO-Sex interaction, we have added a Venn diagram showing DEGs between male and female AppNL-G-F microglia vs. AppNL-G-FCst7-/- microglia (Fig. 2 – Fig. supplement 3). We confirm we have derived all differential gene expression changes reported (including those represented in the Venn diagrams) using appropriate Padj statistical approaches (see Methods).
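
      The regions of such a Venn diagram, together with a simple check of whether two DEG lists overlap more than expected by chance, can be sketched as follows. This is a minimal illustration and not the authors' analysis: the gene identifiers are placeholders, the size of the gene universe is an assumed number, and the hypergeometric test is one common choice that the response itself does not name.

```python
from math import comb


def venn_regions(set_a, set_b):
    """Split two gene lists into the three regions of a two-set Venn diagram."""
    a, b = set(set_a), set(set_b)
    return {"only_a": a - b, "shared": a & b, "only_b": b - a}


def overlap_pvalue(n_universe, n_a, n_b, n_shared):
    """One-sided hypergeometric P(overlap >= n_shared).

    n_universe: number of genes tested (assumed background)
    n_a, n_b:   sizes of the two DEG lists
    n_shared:   observed number of genes present in both lists
    """
    total = comb(n_universe, n_b)
    upper = min(n_a, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_universe - n_a, n_b - k)
        for k in range(n_shared, upper + 1)
    )
    return tail / total


# Placeholder DEG lists (hypothetical identifiers, not results from the study).
male_degs = {"gene1", "gene2", "gene3"}
female_degs = {"gene2", "gene3", "gene4", "gene5"}

regions = venn_regions(male_degs, female_degs)
p_overlap = overlap_pvalue(
    n_universe=20,  # assumed background size
    n_a=len(male_degs),
    n_b=len(female_degs),
    n_shared=len(regions["shared"]),
)
print(regions, p_overlap)
```

      The choice of background matters for the p-value: restricting the universe to genes that passed expression filtering is usually more conservative than using the whole annotated genome.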

      3) A major argument in the paper is a continuation of Sala-Frigerio 2019 which says that the female phenotype is an acceleration of the male phenotype. Does this mean that if males were assessed at later timepoints, they would be more similar to the females? Or are there intrinsic differences that never resolve? It would be helpful to see a later timepoint for males to get at the difference between these two options.

      This is an interesting question and while we acknowledge that empirically addressing with a later timepoint could add insight, we believe it would actually need multiple closely-spaced timepoints as choosing what single later timepoint would be optimal is difficult to judge (and likely not possible at all) for reasons below. We also believe data already published combined with our observations show it is most-likely a cell-intrinsic effect that explains our sex-specific differences.

      First, we emphasize that the previously published acceleration of the microglial phenotype in female AppNL-G-F mice is fairly subtle and relative rather than absolute: e.g. the DAM/ARM microglia state represents ~50% of all microglia in males and ~55% of all microglia in females at 12 months old, so both sexes have similarly abundant microglia in the state that most highly expresses Cst7. Indeed, after the age at which DAM/ARM state microglia appear in appreciable numbers (~6 months), females and males both have an abundance of them. It is important to note that a 12-month male is far more “progressed” than a 6-month female, hence the stepped age effect is temporally short.

      Second, Cst7 deletion in the AppNL-G-F mice condition caused qualitative differences affecting distinct genes and/or overlapping genes moving in different directions between female and male mice - if a stepped age effect explained sex differences from Cst7 deletion, given that it could only be stepped by a very short timeframe (several weeks maximum) from reasoning above, we would expect to see similar qualitative changes but of different magnitude in female and male mice arising from Cst7 deletion; this is not the pattern we see.

      Third, beyond 12 months old, regression from ARM/DAM actually occurs, again making it unlikely males would “catch up” with females to show the same profile from Cst7 deletion but just at an older age – practically, this also complicates choosing a single later timepoint (and age-related systemic morbidity emerges as a potential confounder as well).

      In summary, while the acceleration of the DAM signature in female microglia offers an intriguing possible explanation for our observation of sexual dimorphism in response to deletion of one of the key genes in this signature, we believe it more likely that intrinsic effects are responsible for the sex-related impact of Cst7 deletion. Taking the alternative perspective, even if a stepped age effect in the underlying progression of the model could explain our findings, this would need multiple timepoints with short gaps between them (e.g. monthly at 12, 13, 14, 15 months old) to provide the temporal resolution to expose this pattern; we would not have the resources to conduct such a resource-intensive and lengthy study. We hope this reasoning appears logical. Conscious of the importance of conveying this in our manuscript, we have revised the Discussion to capture, as concisely as possible, some key points outlined above.

      4) If the central argument is that CST7 in females decreases phagocytosis and in males increases microglia activation, are there changes in amyloid plaque burden or structure in the APPNL-G-F /CST 7 KO mice compared to APPNL-G-F/CST7 WT that reflect these changes? Please address. If not, how does this affect the functional interpretation of differential expression observed in phagocytic/reactive microglia genes? Pieces of this are discussed but it could be clearer.

      We emphasise the data already presented in Fig 6 and Fig. 6 – Fig. Supplement 2 showing altered Aβ burden (6E10 staining) and plaque count (MeX04) but no change in plaque area. Regarding the functional interpretation of Cst7-dependent gene changes in microglia beyond the endolysosomal function we present in figures 3-5, we have included additional data using simple immunohistochemistry, as suggested by the reviewer, to assess synapse abundance. We show loss of Sy38 coverage around plaques (Fig. 6I) and a moderate but significant decrease in coverage between AppNL-G-F/Cst7-/- vs AppNL-G-F brains only in females (Fig. 6J). This reflects the effect observed with plaque coverage whereby we observe increased burden in AppNL-G-F/Cst7-/- vs AppNL-G-F females but not males (Fig. 6B-F) suggesting the increased plaque burden in Cst7-/- female mice may lead to increased synapse loss. We would also emphasise that altered expression of phagolysosomal genes could affect disease in ways beyond interactions with amyloid and synapses.

      5) It is confusing that increased phagocytosis in the APPNL-G-F/CST7 KO females leads to greater plaque burden, considering proteolysis is not affected. What might explain this observation? Additionally, it is interesting that suppression of microglial activation doesn't lead to an increase in plaques in the male APPNL-G-F/CST7 KO mice. How does the profile of phagocytic microglia in the male APPNL-G-F/CST7 KO mice differ from the APPNL-G-F males?

      We emphasize our comments on this topic in the discussion where we speculate that the greater plaque burden in females is linked to increased uptake of Aβ (which we observe in Fig. 4B&C) and deposition into plaques as suggested by Huang et al., 2021 (PMID: 33859405), d’Errico et al., 2022 (PMID: 34811521) and Shabestari et al., 2022 (PMID: 35705056). Regarding the lack of effect in males despite the suppression of inflammatory genes, we agree this is a curious observation, although it may point to as yet ill-defined mechanisms for how inflammatory pathways influence plaque pathology. Unfortunately, we were not able to specifically compare the profile of phagocytic microglia in AppNL-G-F vs AppNL-G-FCst7-/- as we did not perform single-cell RNASeq. However, our bulk RNASeq profiling suggests modest downregulation of phagocytic/endolysosomal genes (e.g. Lilrb4a, Fig. 2I) and reduced expression of LAMP2 in microglia by immunostaining. We have added further comment on this in the discussion.

      6) It seems that the authors have potentially discovered an unusual mechanism for how CST7 could regulate cell autonomous function without impacting its canonical protease target. The authors deal with this extensively in the discussion but an ELISA or ICC to localize CST7 to microglia in vitro or in vivo would help address this point.

      We have added FISH data localising Cst7 expression to IBA1+ cells specifically around plaques in App brains (Fig. 1B-E). We agree that assessing the subcellular localisation and any non-microglial expression of Cystatin-F (the protein coded by Cst7) would offer valuable insight into the protease target and may reveal details on the precise mechanism by which CF deletion leads to the phenotype we observe in this study. However, despite attempting numerous commercially available and gifted antibodies to detect CF, we were unable to validate (using Cst7-/- as controls) any methods other than FISH.

      7) The authors focus on plaques in their final figure, however dysregulated microglial phagocytosis could impact many other aspects of brain health. Simple immunohistochemistry for synapses and myelin/oligodendrocytes (especially given the results of the in vitro phagocytosis assay) could provide more insight here.

      We fully agree with the reviewer. As also outlined in our responses elsewhere, phagocytic changes could have multiple consequences, and we have included additional data using immunohistochemistry as advised for synapses in WT, AppNL-G-F, and AppNL-G-F/Cst7-/- brains. We show loss of Sy38 coverage around plaques (Fig. 6I) and a moderate but significant decrease in coverage between AppNL-G-F/Cst7-/- vs AppNL-G-F brains only in females (Fig. 6J). This reflects the effect observed with plaque coverage whereby we observe increased burden in AppNL-G-F/Cst7-/- vs AppNL-G-F females but not males (Fig. 6B-F) suggesting the increased plaque burden in Cst7-/- female mice may lead to increased synapse loss.

      We also performed immunohistochemistry for myelin markers MAG and MBP but found no plaque-associated pathology. Finally, we searched for dystrophic neurites using LAMP1 but found that the antibody stained microglial lysosomes rather than dystrophic neurites in this model (see Author response image 2), an observation that has been made by others (Sharoar et al., 2021 - PMID: 34215298).

      Overall, our data suggest Cst7 may play a protective role in females, limiting phagocytosis, reducing plaque burden and blunting synapse loss.

      Author response image 2.

      Reviewer #3 (Public Review):

      In this manuscript, Daniels et al explored the role of Cystatin F in an Aβ-driven mouse model of Alzheimer's disease. By crossing a constitutive knockout mouse lacking the gene that encodes Cystatin F, Cst7, to the AppNL-G-F mouse line, the authors describe impairments in microglial gene expression and phagocytic function that emerge more prominently in females versus males lacking Cst7. A strength of the study is its focus: given mounting evidence that microglia are a hub of neurological dysfunction with particular potential to trigger or exacerbate neurodegenerative disorders, it is essential to determine the changes in microglia that occur pathologically to promote disease progression. Similarly, the wide-spread identification of the gene in question, Cst7, as upregulated in AD models makes this gene a good target for mechanistic studies.

      The paper in its current form also has several weaknesses which limit the insights derived, weaknesses that are largely related to the experimental tools and approaches chosen by the authors to test their hypotheses. For example, the paper begins with a figure replotting data from previous studies showing that Cst7 is upregulated in mouse models of Alzheimer's disease. Though relevant to the current study, there are no new insights provided here. Next, the authors perform bulk RNA-sequencing on microglia isolated from male and female mice in the Cst7-/-; AppNL-G-F mouse line. In the methods, it is unclear whether the authors took precautions to preserve the endogenous transcriptional state of these cells given evidence that microglia can acquire a DAM-like signature simply due to the process of dissociation (Marsh et al, Nature Neuroscience, 2022). If the authors did not control for this, their results may not support the conclusions they draw from the data. Relatedly, it appears the authors pooled all microglia together here, instead of just isolating DAMs specifically or analyzing microglia at single-cell resolution, which could reveal the heterogeneous nature of the role of Cst7 in microglia. In addition to losing information about heterogeneity, another concern is that they could be diluting out the major effects of the model on microglial function by including all microglia. Overall, the biggest issue I have with the RNA-sequencing data is the lack of validation of the gene expression changes identified using a different method that does not require dissociation, like immunohistochemistry or fluorescence in situ hybridization. Especially given the limited number of genes they found to be mis-regulated (see Fig. 2 E and G), I worry that these changes might simply be noise, especially since the authors provide no further evidence of their mis-regulation. Without further validation, the data presented are not sufficient to support the authors' claims.

      We believe we have addressed this comment in the “Essential Revisions (for the authors)” section above. Please see again below:

      We took standard precautions to minimise the risk of aberrant ex vivo cell activation, including maintaining cells on ice during non-enzyme steps of the procedure and carrying out preps in small batches to minimise time taken from removal of brain to purification of microglial RNA. Importantly, we also validated key expression data by in situ methods such as RNA FISH for Cst7 and Lilrb4a (Fig. 1B-E, Fig 2. - Fig. supplement 3) thus eliminating dissection-induced effects. Additionally, when performing qPCR on microglia from non-disease mice to test the disease-specific role of Cst7-dependent gene regulation we did not observe the same gene changes (Fig 2. - Fig. supplement 4) which, if such changes were dependent on tissue dissociation, we would expect to observe in WT or disease animals. We utilised the resources provided by Marsh et al. 2022 to search for overlap between enzyme-induced genes and our DEG lists from our key comparisons. We found the enzyme-induced gene set had very minimal overlap with any of our comparisons with overlap of only 4 genes between enzyme-induced genes and Cst7-dependent genes in males and no overlap between enzyme-induced genes and Cst7-dependent genes in females. We would further point out that the disease-induced microglial RNAseq profile in the AppNL-G-F Cst7+/+ (i.e. disease WT) condition mirrors those observed previously by multiple methods including in situ profiling (Zeng et al 2023 - PMID: 36732642) and RiboTag approaches (Kang et al 2018 - PMID: 30082275). We believe these combined approaches provide convincing validation of the RNAseq data.

      In assessing the changes in microglial function and Aβ pathology that occur in males and females of the Cst7-/-; AppNL-G-F line, the authors identify some differences between how females and males are affected by the loss of Cst7. While the statistical analyses the authors perform as given in the figure legends appear to be correct, the plots do not show significant changes between males and females for a given parameter. Take for example Figure 3H. Loss of Cst7 decreases IBA+Lamp+ microglia in males but increases this parameter in females. However, it does not appear that there is a significant difference in IBA+Lamp+ microglia in male versus female mice lacking Cst7. If there is no absolute difference between males and females, can the differential effects of Cst7 knockout on the sexes really be so relevant to the sexual dimorphism observed in the disease? I question this connection, but perhaps a greater discussion of what the result might mean by the authors would be helpful for placing this into context.

      We understand the reviewer’s perspective and we agree that the interpretations could be presented and explained better in the text - we have updated the discussion as suggested to address this.

      We designed our study initially to search for sex-specific effects of Cst7. Therefore, whilst our ANOVA does include main effects analysis for disease or sex, we carried out post-hoc analysis primarily to investigate effects of Cst7 deletion within sex. In the case of Fig. 3H pointed out by the reviewer, we observe a main effect for disease in the ANOVA and for disease-sex interaction but not for sex. Post-hoc analysis revealed the sex-specific effects of Cst7 we describe in the manuscript. This approach to analysis was also taken by Hoghooghi et al. (2020 - PMID: 33027652), who show that the related pathway gene Cstc is detrimental in EAE in females but not males (included in the discussion in this manuscript). The observation in Fig. 3H that there appears to be a Cst7 effect in males and females but not a sex effect in Cst7-/- is accurate but a relative anomaly in this study. Generally, we find that, alongside Cst7 deletion affecting females differently to males, we also see a sex effect in Cst7-/- animals but not in Cst7+/+ animals, i.e. absolute levels in the disease condition as well as relative changes from control to disease condition are different between males and females. This is exemplified in Fig. 4B&C where we observe increased microglial Aβ in female Cst7-/- animals vs male Cst7-/- animals and in Fig. 6D where we observe increased Aβ plaque burden in female Cst7-/- animals vs male Cst7-/- animals. This is most strikingly demonstrated in the case of our RNASeq data where we observe a difference in sex-dependent genes in AppNL-G-F vs AppNL-G-F/Cst7-/- (Fig. 2 – Fig. supplement 3B), implying removal of the Cst7 gene led to an ‘unlocking’ of sexual dimorphism in our cohort, which we comment on in the discussion.

      Finally, the use of in vitro assays of microglial function can be helpful as secondary analyses when coupled with in vivo or ex vivo approaches, but are not on their own sufficient to support the authors' conclusions. Quantitative engulfment assays (see Schafer et al, Neuron, 2012) on brain tissue showing that male and female microglia lacking Cst7 engulf different amounts of material (e.g. plaques, synapses, myelin) in the intact brain would be more convincing.

      We agree that in vitro assays for microglial function are not always sufficient as standalone methods to support conclusions on functions in disease. The reviewer may have missed our in vivo MeX04 uptake assays (Fig 4A-D), which use flow cytometry measurements on isolated microglia and therefore reflect microglial uptake in vivo following MeX04 injection pre-mortem. This experiment showed increased microglial Aβ in female Cst7-/- animals vs male Cst7-/- animals (Fig. 4B&C). Our in vitro assays complement and extend insight in ways not possible in vivo; for example, they offer key insight into uptake/degradation kinetics that would be extremely challenging to obtain in vivo.

      In general, a major limitation to the insights that can be derived in the study is the decision of the authors to perform all experiments at a single late-stage time point of 12 months of age. As this is quite far into disease progression for many AD models, phenotypic changes identified by the authors could arise due to the downstream effects of plaque deposition and therefore may not implicate Cst7 as a mechanism driving neurodegeneration rather than one of many inflammatory changes that accompany AD mouse models nearing the one-year time point. A related problem is that the study uses a constitutive KO mouse that has lacked Cst7 expression throughout life, not just during disease processes that increase with aging. In summary, the topic of the article is important and timely, but the connection between the data and the authors' conclusions is not as strong as it could be.

      As described above, Cst7 expression is absent at steady-state and low until 6-12 months. Therefore, we predict that deletion would have little effect until 12+ months whereby cells expressing Cst7 have had the temporal window to affect disease pathology, as we find in the current study. This was a key part of the reasoning in our choice of the 12-month age for analyses. The negligible expression of Cst7 at baseline/early stages of disease suggests constitutive KO of the gene will not impact the phenotype until disease onset. This is substantiated by the lack of any genotype-related differences in the WT vs Cst7-/- comparisons in the non-disease condition.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper presents an interesting data set from historic Western Eurasia and North Africa. Overall, I commend the authors for presenting a comprehensive paper that focuses the data analysis of a large project on the major points, and that is easy to follow and well-written. Thus, I have no major comments on how the data was generated, or is presented. Paradoxically, historical periods are undersampled for ancient DNA, and so I think this data will be useful. The presentation is clever in that it focuses on a few interesting cases that highlight the breadth of the data.

      The analysis is likewise innovative, with a focus on detecting "outliers" that are atypical for the genetic context where they were found. This is mainly achieved by using PCA and qpAdm, established tools, in a novel way. Here I do have some concerns about technical aspects, where I think some additional work could greatly strengthen the major claims made, and lay out if and how the analysis framework presented here could be applied in other work.

      clustering analysis

      I have trouble following what exactly is going on here (particularly since the cited Fernandes et al. paper is also very ambiguous about what exactly is done, and doesn't provide a validation of this method). My understanding is the following: the goal is to test whether a pair of individuals (let's call them I1 and I2) are indistinguishable from each other, when we compare them to a set of reference populations. Formally, this is done by testing whether all statistics of the form F4(Ref_i, Ref_j; I1, I2) = 0, i.e. the difference between I1 and I2 is orthogonal to the space of reference populations, or that you test whether I1 and I2 project to the same point in the space of reference populations (which should be a subset of the PCA-space). Is this true? If so, I think it could be very helpful if you added a technical description of what precisely is done, and some validation on how well this framework works.
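
      The statistic the reviewer describes can be sketched as follows. This is a minimal illustration of the point estimate only, assuming allele frequencies are already available per SNP; the function names and toy frequencies are hypothetical, and the actual qpWave/qpAdm tests additionally use block-jackknife standard errors and a rank test on the matrix of f4 statistics, which this sketch omits.

```python
from statistics import mean


def f4(ref_i, ref_j, ind_1, ind_2):
    """Point estimate of f4(Ref_i, Ref_j; I1, I2).

    Each argument is a list of allele frequencies (0..1) at the same SNPs.
    The estimate is the mean over SNPs of (p_i - p_j) * (q_1 - q_2).
    """
    return mean(
        (a - b) * (c - d) for a, b, c, d in zip(ref_i, ref_j, ind_1, ind_2)
    )


def all_f4(refs, ind_1, ind_2):
    """f4 for every unordered pair of reference populations."""
    names = sorted(refs)
    return {
        (x, y): f4(refs[x], refs[y], ind_1, ind_2)
        for k, x in enumerate(names)
        for y in names[k + 1:]
    }


# Toy allele frequencies at four SNPs (hypothetical values).
refs = {
    "RefA": [0.1, 0.8, 0.5, 0.3],
    "RefB": [0.4, 0.6, 0.5, 0.9],
    "RefC": [0.7, 0.2, 0.1, 0.5],
}
# Pseudo-haploid genotypes coded as 0/1 frequencies for two individuals.
i1 = [0.0, 1.0, 1.0, 0.0]
i2 = [1.0, 1.0, 0.0, 0.0]

same = all_f4(refs, i1, i1)   # identical individuals: every f4 is zero
diff = all_f4(refs, i1, i2)   # different individuals: some f4 are non-zero
print(same)
print(diff)
```

      When I1 and I2 carry the same alleles, every term (q_1 - q_2) vanishes, so all f4 values are exactly zero; the test asks whether the observed values are compatible with zero given sampling noise.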

      We agree that the previous description of our workflow was lacking, and have substantially improved the description of the entire pipeline (Methods, section “Modeling ancestry and identifying outliers using qpAdm”), making it clearer and more descriptive. To further improve clarity, we have also unified our use of methodology and replaced all mentions of “qpWave” with “qpAdm”. In the reworked Methods section mentioned above, we added a discussion on how these tests are equivalent in certain settings, and describe which test we are exactly doing for our pairwise individual comparisons, as well as for all other qpAdm tests downstream of cluster discovery. In addition, we now include an additional appendix document (Appendix 4) which, for each region, shows the results from our individual-based qpAdm analysis and clustering in the form of heatmaps, in addition to showing the clusters projected into PC space.

      An independent concern is the transformation from p-values to distances. I am in particular worried about i) biases due to potentially different numbers of SNPs in different samples and ii) whether the resulting matrix is actually a sensible distance matrix (e.g. additive and satisfies the triangle inequality). To me, a summary that doesn't depend on data quality, like the F2-distance in the reference space (i.e. the sum of all F4-statistics, or an orthogonalized version thereof) would be easier to interpret. At the very least, it would be nice to show some intermediate results of this clustering step on at least a subset of the data, so that the reader can verify that the qpWave-statistics and their resulting p-values make sense.

      We agree that calling the matrix generated from p-values a “distance matrix” is a misnomer, as it does not satisfy the triangle inequality, for example. We still believe that our clustering generates sensible results, as UPGMA simply allows us to project a positive, symmetric matrix to a tree, which we can then use, given some cut-off, to define clusters. To make this distinction clear, we now refer to the resulting matrix as a “dissimilarity matrix” instead. As mentioned above, we now also include a supplementary figure for each region visualizing the clustering results.
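
      The step described above, projecting a positive, symmetric dissimilarity matrix onto a UPGMA tree and cutting it at a threshold, can be sketched as follows. This is a minimal illustration under stated assumptions: the transformation from p-values to dissimilarities (here 1 - p), the cut-off and the toy p-values are all hypothetical, since the response does not specify them.

```python
def upgma_clusters(dissim, cutoff):
    """Average-linkage (UPGMA) clustering of a symmetric dissimilarity matrix.

    dissim: list of lists with non-negative, symmetric entries
    cutoff: merging stops once the closest pair of clusters exceeds this value
    Returns a sorted list of clusters, each a sorted list of sample indices.
    """
    clusters = [[i] for i in range(len(dissim))]

    def average(a, b):
        # UPGMA distance: mean of all pairwise dissimilarities between members.
        return sum(dissim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        d, x, y = min(
            (average(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if d > cutoff:
            break
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(clusters)


# Toy pairwise p-values for four individuals (hypothetical values).
pvals = [
    [1.0, 0.60, 0.40, 0.001],
    [0.60, 1.0, 0.50, 0.002],
    [0.40, 0.50, 1.0, 0.001],
    [0.001, 0.002, 0.001, 1.0],
]
# Assumed transformation: dissimilarity = 1 - p, with zeros on the diagonal.
dissim = [
    [0.0 if i == j else 1.0 - p for j, p in enumerate(row)]
    for i, row in enumerate(pvals)
]

clusters = upgma_clusters(dissim, cutoff=0.99)
print(clusters)
```

      Because average linkage never produces inversions in merge heights, stopping at the cut-off gives the same clusters as building the full tree and cutting it at that height. Nothing in the procedure requires the triangle inequality, which is consistent with calling the input a dissimilarity matrix.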

      Regarding the concerns about p-values conflating both signal and power, we employ a stringent minimum SNP coverage filter for these analyses to avoid extremely-low coverage samples being separated out (min. SNPs covered: 100,000). In addition, we now show that cluster size and downstream outlier status do not depend on SNP coverage (Figure 2 - Suppl. 3).

      The methodological concerns lead me to some questions about the data analysis. For example, in Fig2, Supp 2, very commonly outliers lie right on top of a projected cluster. To my understanding, apart from using a different reference set, the approach using qpWave is equivalent to using a PCA-based clustering and so I would expect very high concordance between the approaches. One possibility could be that the differences are only visible on higher PCs, but since that data is not displayed, the reader is left wondering. I think it would be very helpful to present a more detailed analysis for some of these "surprising" clustering where the PCA disagrees with the clustering so that suspicions that e.g. low-coverage samples might be separated out more often could be laid to rest.

      To reduce the risk of artifactual clusters resulting from our pipeline, we devised a set of QC metrics (described in detail below) on the individuals and clusters we identified as outliers. Driven by these metrics, we implemented some changes to our outlier detection pipeline that we now describe in substantially more detail in the Methods (see comment above). Since the pipeline involves running many thousands of qpAdm analyses, it is difficult to manually check every step for all samples – instead, we focused our QC efforts on the outliers identified at the end of the pipeline. To assess outlier quality we used the following metrics, in addition to manual inspection:

      First, for an individual identified as an outlier at the end of the pipeline, we check its fraction of non-rejected hypotheses across all comparisons within a region. The rationale here is that by definition, an outlier shouldn’t cluster with many other samples within its region, so a majority of hypotheses should be rejected (corresponding to gray and yellow regions in the heatmaps, Appendix 4). Through our improvements to the pipeline, the fraction of non-rejected hypotheses was reduced from an average of 5.3% (median 1.1%) to an average of 3.8% (median 0.6%), while going from 107 to 111 outliers across all regions.
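
      The first metric can be sketched as a simple proportion. This is a minimal illustration: the significance threshold and the example p-values are hypothetical, as the response does not state the rejection threshold used.

```python
def fraction_non_rejected(pvalues, alpha=0.01):
    """Share of pairwise tests that are NOT rejected (p-value >= alpha).

    pvalues: p-values from comparing one individual against every other
             individual in its region
    alpha:   assumed rejection threshold
    """
    if not pvalues:
        raise ValueError("at least one p-value is required")
    return sum(p >= alpha for p in pvalues) / len(pvalues)


# Hypothetical p-values for one candidate outlier against four neighbours.
candidate_pvalues = [0.5, 0.001, 0.0001, 0.2]
share = fraction_non_rejected(candidate_pvalues, alpha=0.01)
print(share)
```

      A well-supported outlier should score close to zero on this measure, since it should be distinguishable from nearly every other individual in its region.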

      Second, we wanted to make sure that outlier status was not affected by the inclusion of pre-historic individuals in our clustering step within regions. To represent majority ancestries that might have been present in a region in the past, we included Bronze and Copper Age individuals in the clustering analysis. We found that including these individuals in the pairwise analysis and clustering improved the clusters overall. However, to ensure that their inclusion did not bias the downstream identification of outliers, we also recalculated the clustering without these individuals. We inspected whether an individual identified as an outlier would be part of a majority cluster in the absence of Bronze and Copper Age individuals, which was not the case (see also the updated Methods section for more details on how we handle time periods within regions).

      In response to the “surprising” outliers based on the PCA visualizations in Figure 2, Supplement 2: with our updated outlier pipeline, some of these have disappeared, for example in Western and Northern Europe. However, in some regions the phenomenon remains. We are confident this isn’t a coverage effect, as we’ve compared the coverage between outliers and non-outliers across all clusters (see previous comment, Figure 2 - Suppl. 3), as well as specifically for “surprising” outliers compared to contemporary non-outliers – none of which showed any differences in the coverage distributions of “surprising” outliers (Author response images 1 and 2). In addition, we believe that the quality metrics we outline above were helpful in minimizing artifactual associations of samples with clusters, which could influence their downstream outlier status. As such, we think it is likely that the qpAdm analysis does detect a real difference between these sets of samples, even though they project close to each other in PCA space. This could be the result of an actual biological difference hidden from PCA by the differences in reference space (see also the reply to the following comment). Still, we cannot fully rule out the possibility of latent technical biases that we were not able to account for, so we do not claim the outlier pipeline is fully devoid of false positives. Nevertheless, we believe our pipeline is helpful in uncovering true, recent, long-range dispersers in a high-throughput and automated manner, which is necessary to glean this type of insight from hundreds of samples across a dozen different regions.

      Author response image 1.

      SNP coverage comparison between outliers and non-outliers in region-period pairings with “surprising” outliers (t-test p-value: 0.242).

      Author response image 2.

      PCA projection (left) and SNP coverage comparison (right) for “surprising” outliers and surrounding non-outliers in Italy_IRLA.
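
      The coverage comparison behind Author response images 1 and 2 can be illustrated with a small sketch. The response reports a t-test p-value but does not say which variant was used, so the version below assumes Welch's unequal-variance form; the function name and the SNP counts are hypothetical and only meant to show the shape of the calculation.

```python
import math
import statistics

def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)  # sample variances
    se_a, se_b = var_a / len(a), var_b / len(b)
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1))
    return t, df

# Hypothetical SNP counts per individual (illustrative values, not data from the study)
outlier_cov = [410_000, 520_000, 380_000, 610_000]
non_outlier_cov = [450_000, 500_000, 400_000, 590_000, 470_000]
t_stat, dof = welch_t(outlier_cov, non_outlier_cov)
```

      A two-sided p-value would then be read from the t distribution with `dof` degrees of freedom; a large p-value, as reported above, means the test found no detectable coverage difference between the two groups.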

      One way the presentation could be improved would be to be more consistent in what a suitable reference data set is. The PCAs (Fig2, S1 and S2, and Fig6) argue that it makes most sense to present ancient data relative to present-day genetic variation, but the qpWave and qpAdm analysis compare the historic data to that of older populations. Granted, this is a common issue with ancient DNA papers, but the advantage of using a consistent reference data set is that the analyses become directly comparable, and the reader wouldn't have to wonder whether any discrepancies in the two ways of presenting the data are just due to the reference set.

      While it is true that some of the discrepancies are difficult to interpret, we believe that both views of the data are valuable and provide complementary insights. We considered three aspects in our decision to use both reference spaces: (1) conventions in the field (including making the results accessible to others), (2) interpretability, and (3) technical rigor.

      Projecting historical genomes into the present-day PCA space allows for a convenient visualization that is common in the field of ancient DNA and exhibits an established connection to geographic space that is easy to interpret. This is true especially for more recent ancient and historical genomes, as spatial population structure approaches that of present day. However, there are two challenges: (1) a two-dimensional representation of a fairly high-dimensional ancestry space necessarily incurs some amount of information loss and (2) we know that some axes of genetic variation are not well-represented by the present-day PCA space. This is evident, for example, by projecting our qpAdm reference populations into the present-day PCA, where some ancestries which we know to be quite differentiated project closely together (Author response image 3). Despite this limitation, we continue to use the PCA representation as it is well resolved for visualization and maximizes geographical correspondence across Eurasia.

      On the other hand, the qpAdm reference space (used in clustering and outlier detection) has higher resolution to distinguish ancestries by more comprehensively capturing the fairly high-dimensional space of different ancestries. This includes many ancestries that are not well resolved in the present-day PCA space, yet are relevant to our sample set, for example distinguishing Iranian Neolithic ancestry against ancestries from further into central and east Asia, as well as distinguishing between North African and Middle Eastern ancestries (Author response image 3).

      To investigate the differences between these two reference spaces, we chose pairwise outgroup-f3 statistics (to Mbuti) as a pairwise similarity metric representing the reference space of f-statistics and qpAdm in a way that’s minimally affected by population-specific drift. We related this similarity measure to the Euclidean distance on the first two PCs between the same set of populations (Author response image 4). This analysis shows that while there is almost a linear correspondence between these pairwise measures for some populations, other comparisons fall off the diagonal in a manner consistent with PCA projection (Author response image 3), where samples are close together in PCA but not very similar according to outgroup-f3. Taken together, these analyses highlight the non-equivalence of the two reference spaces.
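
      As a rough sketch of this comparison, the snippet below tabulates, for every pair of populations, the Euclidean distance on the first two PCs next to an outgroup-f3 value. It assumes the f3 statistics have already been computed elsewhere; the population labels, coordinates, and f3 values are hypothetical placeholders.

```python
import math
from itertools import combinations

def pairwise_table(pcs, f3):
    """For every population pair, return (pair, PCA distance, outgroup-f3)."""
    rows = []
    for a, b in combinations(sorted(pcs), 2):
        dist = math.dist(pcs[a], pcs[b])  # Euclidean distance on PC1/PC2
        rows.append(((a, b), dist, f3[frozenset((a, b))]))
    return rows

# Hypothetical PC coordinates and outgroup-f3 values (illustrative only)
pcs = {"A": (0.00, 0.00), "B": (0.01, 0.00), "C": (0.05, 0.02)}
f3 = {
    frozenset(("A", "B")): 0.20,  # close in PCA, yet the least shared drift
    frozenset(("A", "C")): 0.25,
    frozenset(("B", "C")): 0.26,
}
rows = pairwise_table(pcs, f3)

# A pair that is nearest in PCA but has the lowest outgroup-f3 is the kind of
# comparison that would fall off the diagonal in a distance-vs-similarity plot.
closest_in_pca = min(rows, key=lambda r: r[1])
lowest_f3 = min(rows, key=lambda r: r[2])
```

      In this toy input the same pair is both closest in PCA and least similar by outgroup-f3, which mimics the off-diagonal behaviour described above.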

      In addition, we chose to base our analysis pipeline on the f-statistics framework to (1) afford us a more principled framework to disentangle ancestries among samples and clusters within and across regions (using 1-component vs. 2-component models of admixture), while (2) keeping a consistent, representative reference set for all analyses that were part of the primary pipeline. Meanwhile, we still use the present-day PCA space for interpretable visualization.

      Author response image 3.

      Projection of qpAdm reference population individuals into present-day PCA.

      Author response image 4.

      Comparison of pairwise PCA projection distance to outgroup-f3 similarity across all qpAdm reference population individuals. PCA projection distance was calculated as the euclidean distance on the first two principal components. Outgroup-f3 statistics were calculated relative to Mbuti, which is itself also a qpAdm reference population. Both panels show the same data, but each point is colored by either of the two reference populations involved in the pairwise comparison.

      PCA over time

      It is a very interesting observation that the Fst-vs distance curve does not appear to change after the bronze age. However, I wonder if the comparison of the PCA to the projection could be solidified. In particular, it is not obvious to me how to compare Fig 6 B and C, since the data in C is projected onto that in Fig B, and so we are viewing the historic samples in the context of the present-day ones. Thus, to me, this suggests that ancient samples are most closely related to the folks that contribute to present-day people that roughly live in the same geographic location, at least for the middle east, north Africa and the Baltics, the three regions where the projections are well resolved. Ideally, it would be nice to have independent PCAs (something F-stats based, or using probabilistic PCA or some other framework that allows for missingness). Alternatively, it could be helpful to quantify the similarity and projection error.

      The fact that historical period individuals are “most closely related to the folks that contribute to present-day people that roughly live in the same geographic location” is exactly the point we were hoping to make with Figures 6 B and C. We do realize, however, that the fact that one set of samples is projected into the PC space established by the other may suggest that this is an obvious result. To make it more clear that it is not, we added an additional panel to Figure 6, which shows pre-historical samples projected into the present-day PC space. This figure shows that pre-historical individuals project all across the PCA space and often outside of present-day diversity, with degraded correlation of geographic location and projection location (see also Author response image 5). This illustrates the contrast we were hoping to communicate, where projection locations of historical individuals start to “settle” close to present-day individuals from similar geographic locations, especially in contrast with pre-historic individuals.

      Author response image 5.

      Comparing geographic distance to PCA distance between pairs of historical and pre-historical individuals matched by geographic space. For each historical period individual we selected the closest pre-historical individual by geographic distance in an effort to match the distributions of pairwise geographic distance across the two time periods (left). For these distributions of individuals matched by geographic distance, we then queried the euclidean distance between their projection locations in the first two principal components (right).
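
      The matching step described in this caption could look roughly like the sketch below. It assumes great-circle (haversine) distance as the measure of geographic distance, which the response does not specify, and uses hypothetical sample records with made-up coordinates.

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def match_nearest(historical, prehistoric):
    """Pair each historical individual with the geographically closest prehistoric one."""
    pairs = []
    for h in historical:
        nearest = min(prehistoric, key=lambda p: haversine_km(h["loc"], p["loc"]))
        pairs.append({
            "historical": h["id"],
            "prehistoric": nearest["id"],
            "geo_km": haversine_km(h["loc"], nearest["loc"]),
            "pca_dist": math.dist(h["pc"], nearest["pc"]),  # distance on PC1/PC2
        })
    return pairs

# Hypothetical individuals: (lat, lon) location and PC1/PC2 projection
historical = [{"id": "H1", "loc": (41.9, 12.5), "pc": (0.010, 0.020)}]
prehistoric = [
    {"id": "P1", "loc": (45.4, 9.2), "pc": (0.030, -0.010)},
    {"id": "P2", "loc": (35.0, 25.0), "pc": (0.012, 0.018)},
]
pairs = match_nearest(historical, prehistoric)
```

      The `geo_km` and `pca_dist` columns of the resulting pairs correspond to the two quantities compared in the left and right panels.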

    1. Author Response

      Reviewer #1 (Public Review):

      “The authors use hM4Di to "silence" Fos-tagged neurons in the basal forebrain, but they have not validated the efficiency or the possible various effects of this reagent.

      It is possible that hM4Di actually has a relatively small effect on suppressing the AP activity of neurons. Nevertheless, hM4Di might still be an effective manipulation, because it was shown to additionally reduce transmitter release at the nerve terminal (see e.g. Stachniak et al. (Sternson) 2014, Neuron). Thus, the authors should evaluate in control experiments whether hM4Di expression plus CNO actually electrically silences the AP-firing of ChAT neurons in the BF as they seem to suggest, and/or if it reduces ACh release at the terminals. For example, one experiment to test the latter would be to perfuse CNO locally in the BLA; after expressing hM4Di in the cholinergic neurons of the BF. At the very least, the assumed action of hM4Di, and the possible caveats in the interpretation of these results should be discussed in the paper.”

      We find that activation of hM4Di with clozapine in basal forebrain cholinergic neurons results in clear alterations to neuronal activation in projection targets and in behavior (Figures 3, Figure 3-Supplement 1, Figure 5, Figure 5-Supplement 1, Figure 5-Supplement 2, Figure 6-Supplement 1 and Figure 8). Previous studies demonstrated that activation of hM3Dq or hM4Di in cholinergic neurons results in changes to electrical activity and behavioral response (Zhang et al. 2017 & Jin et al. 2019). Though we are unable to distinguish whether the effects on behavior in our experiments are a result of decreases in ACh release at terminals, inhibition of action potential firing, or both, our behavioral findings are consistent with demonstrations that inhibition of basal forebrain cholinergic neurons can alter behavior. See Page 17 Lines 488-493 for a discussion.

      “The names of brain areas like "NBM/SIp" and "VP-SIa" need to be better introduced, and somehow contextualized (in the Introduction, and also at first reading in the Results).”

      We agree that our prior presentation of these regions was confusing and in general the boundaries of these regions are not well-defined in the field. We have included a description of anatomical landmarks and bregma coordinates to clarify our definitions of the regions NBM/SIp (Page 4 Line 103-104) and VP/SIa (Page 4 Line 107-108).

      “Figure 3C: Application of CNO on the memory recall day leads to a strong reduction in CS-driven freezing. However, in this experiment, and also in Fig. S7, the pre-tone value of freezing is also strongly reduced. This would indicate that the activity of NBM/SIp cells (or else, ACh-release from these cells - see also Major point 1), also influences contextual learning. The authors should, first, statistically, test these effects (I am not sure this was done). If these differences are significant, a possible role of ACh in contextual fear learning should be discussed. Has it been shown before whether ACh is involved in contextual fear learning? Does this indicate the involvement of another target area of ACh neurons (e.g., the hippocampus?).”

      We statistically compared the pre-tone freezing response between Sham and hM4Di groups across our experiments and found no significant differences in pre-tone freezing between the groups (Figure 3D- Sham vs. ADCD-hM4Di, Pre-tone p=0.3544; Figure 5B- Sham vs. hM4Di, Pre-tone p=0.0679; Figure 5C- Sham vs. hM4Di, Pre-tone p=0.0966; Figure 5-Supplement 2A- Sham vs. hM4Di, Pre-tone p>0.99). These comparisons can also be reviewed in the statistical reporting table uploaded along with the manuscript.

      “The discussion could be improved by better comparing what they found, to the wider literature. For example, previous papers studying other neuromodulatory systems found evidence for a modulation of neuromodulator release after learning, e.g. see Martins and Froemke 2015 Nat. Neuroscience for the noradrenergic system, Tang et al. (Schneggenburger lab) 2020 J. Neuroscience for the dopaminergic system and fear learning; and Uematsu et al., 2017, Nat. Neuroscience for the noradrenergic system and fear learning. Maybe the authors could include these and similar references when revising their discussion to take into account a broader view of previous findings related to other neuromodulatory systems.”

      Our study joins the growing body of literature demonstrating stimulus-encoding and rapid stimulus-contingent responses in various neuromodulatory systems in learning and memory recall. We have now added a substantial discussion, detailing both the similarities and differences between our findings and those found in the dopaminergic, serotonergic, noradrenergic, and oxytocinergic systems in fear learning. See Pages 20-21 Lines 575-605.

      Reviewer 2 (Public Review):

      “Throughout the paper, the authors use comparisons of cell activity between groups to address questions about projection-specific and cue-specific cell activation and reactivation. However, statistical comparisons are sometimes done between biological replicates (e.g. Fig. 5A), whereas a lot of them are done between technical replicates (e.g. Fig. 2B, 5B, 7B). Adding statistics that compare biological replicates would help increase confidence in the results.”

      We have replotted our data as a comparison of biological replicate (by individual animal) in new versions of Figures 1-8, and Figure 1-Supplements 1-3, Figure 5-Supplements 1 & 2, Figure 6-Supplements 1 & 2, Figure 7-Supplement 1, and Figure 8-Supplement 1. Correspondingly, all statistical analyses have been conducted comparing biological replicates. To note, these changes have not changed the overall conclusions of each figure. The sample size, statistical test and p-values for our comparisons are included in the figure legends and in the newly included statistical reporting table.

      “To demonstrate engram-like specificity, in figure 4C the authors show fold change in cholinergic reactivation in low and high responders (animals that show low and high defensive freezing upon cue presentation) as normalized by cell activity while sitting in the home cage. However, the authors also collected a better control for this comparison, which is shown in figure S4, where the animals were exposed to an unconditioned tone cue. Comparing fold change to this tone-alone condition would provide stronger evidence for the authors' point, as this would directly compare the specificity of cholinergic reactivation to a conditioned vs an unconditioned cue. A discussion of the same comparison is relevant for figure 2 (and is shown in figure S4) but is not mentioned in the text.”

      We have evaluated the cholinergic response to the tone using GRABACh3.0 as a readout of ACh release in the BLA, and using IEG expression as a readout of cholinergic neuron activation. We find no significant increase in ACh release in the BLA in response to tone presentation (Figure 1C-left, 1D-left) and no significant increase in tone-associated reactivation of cholinergic neurons (using IEG as a readout, 2C/D, Figure 1-Supplement 2, Figure 1-Supplement 3, Figure 6-Supplement 1A) unless the tone has been previously paired with a foot shock (see Figure 1C-right, 2C, 3D). In addition, we find no statistically significant differences between home cage and tone alone conditions (Figure 2C – home cage-home cage condition vs. tone-tone condition, p=0.5012). Based on these analyses, we use the home cage group as our control group for comparison.

      “The significant correlation between cue-evoked percent change in defensive freezing from pretone and fold change in cholinergic cell activity relative to the home cage that is shown in figure 4D is somewhat confusing. Is the correlation considering all the points shown (high and low responders as depicted by black and grey points)? It's first reported as one correlation but then is discussed as two populations that have different results. Further, is the average amount of reactivation for the home-cage controls used here the same denominator for each reported animal? Similarly to the point above, a correlation looking at fold change from tone-alone would also be helpful to determine the degree to which cholinergic reactivation is specific to threat-association learning versus the more general attentional component that this system is known for.”

      We have substantially modified this figure, now new Figure 6, to clarify our point. Along with this revision, we have removed the correlation plots and corresponding analyses from the revised version of the manuscript and figures.

      Figure 6 now begins with behavior data from a distinct cohort of mice outlining our criteria for high vs. low responders (Figure 6A/B). In Figure 6C, conducted in a separate cohort of mice that only underwent behavioral testing to clarify the definition of high vs. low responders, we note via schematic that ADCD labeling was carried out during the recall session (unlike Figure 2). In panel D, we show fold change of activated cholinergic neurons stratified by High vs. Low responder status. This fold change is normalized to the average activation from the home cage control animals in each experimental cohort. Taken together we find animals with a ~2 fold increase in activation of cholinergic neurons display significant, distinguishable freezing in response to the tone as compared to pretone freezing. We find that this cluster of activated neurons is segregated to the anterior NBM/SIp (Figure 6E).
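
      The normalization described for panel D can be written out as a short sketch: each animal's count of activated cholinergic neurons is divided by the mean of the home cage controls from the same cohort. The record layout, field names, and numbers below are hypothetical and only illustrate the arithmetic.

```python
from collections import defaultdict
import statistics

def fold_change_by_cohort(animals):
    """Divide each non-control animal's value by its cohort's home cage mean."""
    controls = defaultdict(list)
    for a in animals:
        if a["group"] == "home_cage":
            controls[a["cohort"]].append(a["activated"])
    control_mean = {c: statistics.fmean(v) for c, v in controls.items()}
    return {
        a["id"]: a["activated"] / control_mean[a["cohort"]]
        for a in animals
        if a["group"] != "home_cage"
    }

# Hypothetical counts of activated cholinergic neurons per animal
animals = [
    {"id": "hc1", "cohort": 1, "group": "home_cage", "activated": 10},
    {"id": "hc2", "cohort": 1, "group": "home_cage", "activated": 14},
    {"id": "r1", "cohort": 1, "group": "recall", "activated": 24},
    {"id": "hc3", "cohort": 2, "group": "home_cage", "activated": 5},
    {"id": "r2", "cohort": 2, "group": "recall", "activated": 5},
]
fold = fold_change_by_cohort(animals)
```

      Normalizing within each cohort keeps the denominator specific to the animals processed together, so a fold change of 1 always means "no different from that cohort's home cage controls".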

      Regarding whether cholinergic reactivation reflects a general response to the tone (attention) rather than learning: in Figure 1-Supplement 3, we evaluate ACh release and behavioral response in mice that were exposed to three shocks alone (no tone) on day 1 and then exposed to a single (novel) tone on day 2. In these mice we find no significant change in ACh release in the BLA in response to tone, and no significant increase in freezing behavior in response to the tone. In Figure 2D, we evaluate reactivation of cholinergic neurons in a similar context and find that this group does not significantly differ from the home cage → home cage group. Further, we present that this home cage group does not significantly differ from Low Responders. As such, we find significant reactivation of cholinergic neurons in animals with increased responsiveness to the CS tone during the recall session (High Responders).

      “The compelling argument of this paper is that the authors are separating out the general attention role typically attributed to the cholinergic system from a more specific, engram-based role. Given the importance of untangling this, it would useful to see the recorded traces and behavioral scoring for the data shown in figure S2B. For example, was the higher slope in the recorded cholinergic response during unconditioned tone 1 also accompanied by an increase in freezing, which later went away with additional non-reinforced tones? Given that the animals were not habituated to tones (according to the Methods), this activity could be related to a habituation/general attention response, which may then be weaker than the learned response.”

      We include individual traces of GRABACh3.0 release in the BLA in response to the unconditioned tone from a protocol with 3x tone presentation on Day 1 and tone presentation on Day 2 (Figure 1-Supplement 2C). We have also included average + SEM traces for the entire duration of the tone presentation for the three unconditioned tones in this paradigm along with an inset showing 1s before and after tone onset (Figure 1-Supplement 2D). Finally, we include individual traces of GRABACh3.0 release in the BLA in response to the first (naïve) tone from mice that underwent the training (tone + shock) followed by recall (tone) paradigm in Figure 1-Supplement 4C, left. None of the unconditioned tone responses were statistically significantly different from the preceding baseline. Instead, we find the learned response is significantly higher than the response baseline (Figure 1D).

    1. Author Response

      Reviewer #1 (Public Review):

      The authors used MD simulations to investigate the role of N-terminal myristoylation and the presence of two SH domains on the allosteric regulation of c-Abl kinase. Standard established MD simulation methods and analyses were applied, including the force distribution analysis (FDA) method developed by Grater et al. some time ago.

      The system is large and the conformational changes are complicated. In light of this, and aggravated by the fact that direct comparison with - and critical testing against - experimental data is not possible in the present case, I consider the overall simulation times to be rather short (several repeats, but only 500 ns). So there might be statistical convergence issues. Especially also because at least some of the starting structures were generated from available experimental structures after some modifications/modelling, and they might thus be out of equilibrium and need some time to fully relax during the MD simulations.

      Unfortunately, I cannot find any convergence tests concerning the length of the simulations, which are usually considered to be standard analyses (Appendix Fig. 5 shows the effect of different thermostats and capping of the peptide chain, but no tests concerning simulation time). This could be critical in the present case, where the authors acknowledge themselves (e.g., on p. 4) that there are only subtle differences between the different simulation systems and the variations within a given system are larger than the relevant (putative) differences between systems (Fig. 1 C, D, E).

      We thank the reviewer for taking the time and critically assessing our manuscript. We appreciate and have addressed the raised concerns as follows. We have quadrupled the simulation time to 2 µs for 20 out of the 30 replicates and show the updated results for these. We refer the reviewer to the modified Fig. 2 and 3 (former Fig. 1 and 2) with the updated data. Our main conclusions remained unchanged, namely that Myr unbinding shifts the overall kinase domain dynamics towards an active state. We furthermore still observe allosteric signal propagation from the Myr binding site to the active site along the alpha_F helix and a collaborative effect of Myr and the SH domains. Only some minor points were not confirmed after analyzing the longer simulations, for example the force differences transmitted to the A-loop upon SH domain binding/unbinding (former Fig. 2D), and changes in amplitude of N- and C-lobe opening upon Myr unbinding (former Fig. 1E). Furthermore, to demonstrate convergence, we added block and autocorrelation analyses for Fig. 1 (now Fig. 2) to Fig. 2 – fig supplement 3, and observed good convergence across all systems. Finally, we also increased simulation times of the umbrella sampling from 50 ns to 200 ns, again without the quantitative trends or our conclusions having changed (see also next point).
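
      For readers unfamiliar with block analysis, a minimal sketch of the idea follows. The response does not describe how the analysis was implemented, so this is a generic version under that assumption: a time series of some observable is cut into non-overlapping blocks, and the spread of the block means gives a standard error that accounts for time correlation. The synthetic series is a placeholder for a real trajectory observable.

```python
import math
import statistics

def block_standard_error(series, n_blocks):
    """Standard error of the mean estimated from non-overlapping block averages."""
    block_len = len(series) // n_blocks  # trailing frames that do not fill a block are dropped
    means = [
        statistics.fmean(series[i * block_len:(i + 1) * block_len])
        for i in range(n_blocks)
    ]
    return statistics.stdev(means) / math.sqrt(n_blocks)

# Synthetic, strongly correlated series standing in for an MD observable
series = [math.sin(0.05 * i) for i in range(2000)]
errors = {n: block_standard_error(series, n) for n in (4, 8, 16)}
```

      In practice one looks at how the estimate changes with block length: once blocks are longer than the correlation time, the estimated error stops growing, which is commonly taken as a sign that the sampling is adequate for that observable.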

      Issues with statistical convergence are expected not only for the standard MD simulations but also for the umbrella sampling simulations, as 50 ns sampling per window is nowadays not considered state of the art and is likely insufficient for quantitative binding free energy calculation, especially for membranes (see, e.g., DOI 10.1021/ct200316w). However, worrying about this latter aspect might neither be useful nor needed, because in our view the statement that myristoyl groups can bind to the membrane and that they can compete with binding in the hydrophobic protein pocket can hardly be considered a surprise and would not have required any simulation at all in my view because the experimental K_D values are available (Table 1). The very unfavourable K_d values for unbinding of Myr from both the hydrophobic protein pocket as well as from the membrane in fact show that this is not how it is expected to work in reality. The fully solvated state will be avoided due to its high free energy. Instead, isn't the myristoyl expected to directly transition from the pocket into the membrane, after membrane binding of the kinase in a proper orientation?

      The experimental values were determined with different methods, i.e. estimated from zeta potential measurements in case of the membrane and calorimetry, which only considered the kinase domain instead of the SH3-SH2-kinase complex, in case of Abl. We thus found it appropriate to perform Umbrella Sampling simulations to ensure comparability. Additionally, these allowed us to study the effects of different alpha_I helix conformations, which had a significant impact on the free energy of Myr unbinding: specifically, Abl with a partially unfolded helix reflected the experimental energy better than the crystal structure with a kinked helix. We highlight this more explicitly in the corresponding Discussion section. Regarding the simulation time per sampling window, we did a block analysis (Fig. 5 – fig supplement 1) as suggested in the cited reference and also extended the time of each sampling window from 50 ns to 200 ns. This did not significantly alter the results and, importantly, the relative differences between Abl and the membrane stayed the same and are in good agreement with the experimental values.

      Concerning the metadynamics simulations, these are usually done to obtain a free energy landscape. Why was this not attempted here? In the present case, the authors seemed to have used metadynamics only for generating starting structures, with different degrees of helicity of the alpha_I part, for subsequent standard MD simulations. Not surprisingly, nothing much happened during the latter, and conformers with kinked/partially unfolded alpha_I as well as conformers with straight alpha_I were both found to be "stable", at least on the short simulation time scale. It could also not be expected that the SH domain would spontaneously detach in response to helix straightening - again, this would require much longer simulation times than 500 ns. Nevertheless, alpha_I straightening might very well reduce the binding affinity towards SH - this can only be explicitly studied with free energy simulations, however.

      Our main goal was indeed to achieve different alpha_I helix conformations for subsequent Umbrella Sampling simulations, and we found that helix formation is in principle possible without SH2 domain unbinding. We would like to emphasize the impact of the different helix conformations on the free energy of Myr unbinding, which further highlights the need to investigate these structures. We chose Metadynamics to obtain them because it only facilitates the transition away from the kinked conformation without biasing towards certain end structures or transition pathways, which we found advantageous compared to alternative methods such as targeted MD. The reason for not reporting a free energy surface is that we considered the helicity of all seven residues making up the kink within a single CV, which smeared the energy landscape to the point that it is almost completely flattened. Furthermore, orthogonal CVs such as new interactions between the alpha_I helix with the SH2 domain or positional adjustments of the SH2 domain would have to be considered for a reliable quantitative result. We nevertheless observed transient SH2 domain unbinding during the applied time scale and added histograms to Fig. 4 – fig supplement 1 (former appendix Fig. 4) to make this more obvious.

      Reviewer #2 (Public Review):

      The manuscript aims at understanding how the fatty acid ligand MYR inhibits the activity of Abl kinase. Despite a wealth of structural and biochemical data, a key mechanistic understanding of how MYR binding could inactivate Abl was missing.

      The authors used equilibrium and enhanced molecular dynamics (MD) simulations to masterfully answer open questions left by extensive experimental data in the mechanistic understanding of this system. The authors took advantage of several state-of-the-art simulation techniques and carefully planned simulations to extract a coherent understanding from a wealth of experimental facts.

      The manuscript convincingly identifies an allosteric regulation by MYR. Allostery is often a source of confusion and sometimes is used as a magic catch-it-all explanation for poorly understood phenomena. Here, the authors show very compelling evidence of the existence of an allosteric mechanism. Also, they identify the physical origin of the allosteric pathway, providing a clear mechanistic understanding at the residue-level resolution. This is an impressive achievement.

      We thank the reviewer for appreciating our work and its significance for understanding Abl regulation.

      By leaving a pocket in the protein, MYR enables the protein's activation. But MYR is a highly hydrophobic molecule surrounded by water. Where could it go rather than quickly binding back to the protein pocket? By asking this reasonable question, the authors propose an exciting mechanistic hypothesis. The physical proximity of Abl kinase to a cellular membrane could lead to a competition between the protein and the membrane for MYR, leading to a novel layer of regulation for this kinase. Free energy calculations performed by the authors show that this hypothesis is reasonable from the thermodynamic point of view.

      From a broader perspective, this manuscript is an important contribution to the discussion of four outstanding topics. 1) myristoylation is an example of lipidation, a post-translational modification where an acyl chain is covalently linked to a protein. The role of post-translational modifications has been greatly underappreciated and investigated in the MD community. However, as all the work on Sars-Cov2 and this contribution show, post-translational modifications can be crucial to understanding function. Ignoring them could lead to severely biased results. 2) the debate on the nature of allostery is still on the rage. Some authors claim that looking for a residue-level mechanistic chain of events that explains the allosteric action does not make sense and that the only way of thinking about allostery is as a sudden global change of the conformational landscape. Here, the authors show that instead, it is possible and leads to an essential understanding. 3) The authors hypothesize a novel crosstalk between the Abl and cellular membranes mediated by MYR. This exciting and far-reaching hypothesis opens the door to new complex layers of regulation. I suspect that these crosstalks between cytosolic proteins, or the soluble domain of membrane-tethered proteins and membranes, are much more ubiquitous than what has been appreciated so far. 4) From a methodological point of view, this manuscript represents a masterful use of simulations to put existing experimental data in a coherent picture. It is an example of the use of MD simulations at its best, where the simulations make sense of experiments, integrate existing data into a unified picture, and lead to new hypotheses that can be tested in future experiments.

      We thoroughly appreciate the reviewer's positive feedback and the valuable suggestions for improvement below.

      It would be superb if the authors could propose precise predictions that could inspire future experiments. Now that they present a residue-resolution allosteric pathway, can they suggest point mutations that would interrupt it?

      We have added a short segment to the end of the discussion proposing possible experiments.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Hekselman et al presents analyses linking cell-types to monogenic disorders using over-expression of monogenic disease genes as the signal. The manuscript analyses data from 6 tissues (bone marrow, lung, muscle, spleen, tongue and trachea) together with ~1,000 rare diseases from OMIM (with ~2,000 associated genes) to identify cell-type of interest for specific disease of choice. The signal used by the approach is the relative expression of OMIM-genes in a particular cell type relative to the expression of the gene in the tissue of interest identifying celltype-disease pairs that are then investigated through literature review and recapitulated using mouse expression. A potentially interesting finding is that disease genes manifesting in multiple tissues seem to hit same cell-types. Overall this important study combines multiple data analyses to quantify the connection between cell types and human disorders. However whereas some of the analyses are compelling, the statistical analyses are incomplete as they don't provide full treatment of type I error.

      Statistical analyses were changed to include permutation testing and a different threshold (Results, page 6, 1st paragraph; Methods, page 21-22, ‘PrEDiCT score calculation and significance assessment’; Figure 1–figure supplement 2). Assessments of type I error were based on literature text-mining and expert curation, and showed that false-positive rates were low in both (0.01 and 0.07, respectively; Figure 1F and Figure 1–figure supplement 4A).

      Reviewer #2 (Public Review):

      This study identifies 110 disease-affected cell types for 714 Mendelian diseases, based on preferential expression of known disease-associated genes in single-cell data. It is likely that many or most of the results are real, and the results are biologically interesting and provide a valuable resource. However, updates to the method are needed to ensure that inference of statistical significance is appropriately stringent and rigorous.

      Strengths: a systematic evaluation of disease-affected cell types across Mendelian diseases is a valuable addition to the literature, complementing systematic evaluations of common disease and targeted analyses of individual Mendelian diseases. The validation via excess overlap with disease-cell type pairs from literature co-appearance provides compelling evidence that many or most of the results are real. In addition, many of the results are biologically interesting. In particular, it is interesting that diseases with multiple affected tissues tend to affect similar cell types in the respective tissues.

      Limitations: the main limitation of the study is that, although many or most of the results are likely to be real, the criterion for statistical significance is probably not stringent enough and is not well-justified. For diseases with only 1 disease-associated gene, the threshold is a z-score>2 for preferential expression in the cell type, but this threshold is likely to be often exceeded by chance. (For diseases with many disease-associated genes, the threshold is a median (across genes) z-score>2 for preferential expression in the cell type, which is less likely to occur by chance but still an arbitrary threshold.) Thus, there is a good chance that a sizable proportion of the reported disease-affected cell types might be false positives. The best solution would be to assess statistical significance via empirical comparison with results for non-disease-associated control genes, and assess the statistical significance of the resulting P-values using FDR.

      We thank the reviewer for the valuable insights and suggestions. We revised the method to assess statistical significance by using empirical comparison followed by FDR correction, as suggested by the reviewer (Results, page 6, 1st paragraph; Methods, page 21-22, ‘PrEDiCT score calculation and significance assessment’; Figure 1–figure supplement 2).

      The re-analysis using mouse single-cell data adds an interesting additional dimension to the study, with the small caveat that mouse single-cell data does not provide statistically independent information across genes (for the same reason that adding data from independent human individuals would not provide statistically independent information across genes, given that human and mouse expression are partially correlated).

      We acknowledge this caveat in the text (Discussion, page 17, 2nd paragraph, lines 8-11).

      Reviewer #3 (Public Review):

      The authors describe the method, PrEDiCT, which helps identify disease affected cell types based on gene sets. As I understand it, the method is based on finding which "disease genes" (from an annotation) are relatively highly expressed. The idea is nice, however, I have concerns about how "significance" is assessed and the relative controls.

      Overall, I find the idea interesting, but the execution raises some concerns.

      1) From a causal perspective, there is an association of high expression of these genes within these cell types, but without also assessing individuals with those specific diseases, I do not think it is fair to say "disease affected" cell types. It is possible that these genes might behave completely fine but are highly expressed in those cell types while being affected in other cell types.

      We agree with the reviewer. We changed the terminology to "likely disease-affected cell types” and added this caveat to the Discussion, page 16, 2nd paragraph.

      2) It is unclear to me what the "null" comparison is in the method and if there is one. For example, by chance, would I expect this gene to be highly expressed because other genes are also highly expressed in this cell type? Some way to assess "significance" or "enrichment" beyond simply using ranks and thresholds would be helpful in deciding whether these associations are robust.

      We revised the procedure for assessing statistical significance to include permutation tests. Specifically, given a disease D with n disease-associated genes, the null hypothesis was that the PrEDiCT score of these genes is not significantly different from the PrEDiCT score of a random set of n genes. To test this, we randomly selected n genes expressed in any cell type, and computed the PrEDiCT score for this random gene set in each cell type of the disease-affected tissue (referred to as ‘random score’). We repeated this procedure 1,000 times, resulting in 1,000 random scores per disease and cell type. The p-value of the PrEDiCT score of disease D in cell type c was set to the fraction of random scores in c that were at least as high as the original PrEDiCT score of D in c. The acquired p-values were adjusted for multiple hypothesis testing per disease using the Benjamini-Hochberg procedure. To increase stringency, we treated only statistically significant disease–cell-type pairs with PrEDiCT score≥1 as 'likely affected'. The procedure is detailed in Results, page 6, 1st paragraph; Methods, page 21-22, ‘PrEDiCT score calculation and significance assessment’; Figure 1–figure supplement 2. Additionally, we estimated type I error by using literature text-mining or expert curation (Results, page 7, 2nd paragraph; Methods, page 22, ‘Text-mining of PubMed records’, and page 23, ‘Expert curation and assessment of disease-affected cell types’; Figure 1F and Figure 1–figure supplement 4A).
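      The permutation procedure described above can be illustrated with a minimal sketch. This is not the authors' code: the function names, the `pref` matrix layout (genes by cell types, holding per-gene preferential expression), the random seed, and the significance level `alpha=0.05` are all assumptions made for illustration. Only the 1,000 repetitions, the "at least as high" p-value, the per-disease Benjamini-Hochberg adjustment, and the score≥1 cut-off come from the response.

```python
import numpy as np

def permutation_pvalues(pref, disease_gene_idx, n_perm=1000, seed=0):
    """Empirical p-value per cell type for the median preferential expression.

    pref: array (n_genes, n_cell_types) of per-gene preferential expression
          in one tissue (hypothetical layout).
    disease_gene_idx: row indices of the n disease-associated genes.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_cell_types = pref.shape
    n = len(disease_gene_idx)
    observed = np.median(pref[disease_gene_idx, :], axis=0)
    random_scores = np.empty((n_perm, n_cell_types))
    for i in range(n_perm):
        # Draw a random gene set of the same size as the disease gene set.
        idx = rng.choice(n_genes, size=n, replace=False)
        random_scores[i] = np.median(pref[idx, :], axis=0)
    # Fraction of random scores at least as high as the observed score.
    pvals = (random_scores >= observed).mean(axis=0)
    return observed, pvals

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (applied per disease)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downwards.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted

def likely_affected(observed, adjusted_p, alpha=0.05, min_score=1.0):
    """Significant AND score >= 1; alpha is an assumed significance level."""
    return (adjusted_p < alpha) & (observed >= min_score)
```

      In this sketch the random gene sets are drawn from all rows of `pref`, so restricting the background to genes expressed in any cell type would have to be done when building that matrix.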

      3) Additionally, it is unclear to me, but I suspect that there are unequal cell numbers in the scores computed as well as between relevant tissues. This is related to point (2) above, but as a result, the estimates of the scores will inherently have different variances, thus making comparisons between them difficult/unreliable unless accounted for. If I understand correctly, the score is first the average expression within a tissue, then, the Z-score? If so, my comment applies.

      To clarify, the PrEDiCT score of a disease D in cell type c was set to the median preferential expression P of its disease genes (Equation 1 below). The preferential expression of each gene in c was computed as a Z-score, by comparing the average expression of the gene in c to its average expression in all cell types of the tissue, divided by the standard deviation (SD, Equation 2 below). Tissues indeed had unequal numbers of cell types; however, the distributions of PrEDiCT scores were similar between tissues (now in Supplementary File 13). We revised this part of Methods and added Equations 1 and 2 (Methods, page 21-22, ‘PrEDiCT score calculation and significance assessment’) and Supplementary File 13.
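      As a rough illustration of the score as described in words above, the sketch below Z-scores each gene's average expression across the cell types of one tissue and takes the median over the disease genes. The function names and matrix layout are hypothetical, and the use of the population SD (`ddof=0`) and the handling of zero-variance genes are assumptions, since the response does not specify them.

```python
import numpy as np

def preferential_expression(avg_expr):
    """Z-score of each gene's average expression across cell types of a tissue.

    avg_expr: array (n_genes, n_cell_types); entry [g, c] is the average
              expression of gene g in cell type c (hypothetical layout).
    """
    avg_expr = np.asarray(avg_expr, dtype=float)
    mean = avg_expr.mean(axis=1, keepdims=True)
    sd = avg_expr.std(axis=1, keepdims=True)  # population SD: an assumption
    sd = np.where(sd == 0, np.nan, sd)        # flat genes get NaN, not inf
    return (avg_expr - mean) / sd

def predict_score(pref, disease_gene_idx):
    """Median preferential expression of the disease genes, per cell type."""
    return np.nanmedian(pref[disease_gene_idx, :], axis=0)

# Toy tissue with three cell types; genes 0 and 1 peak in the third one.
avg_expr = np.array([[1.0, 1.0, 4.0],
                     [0.0, 0.0, 3.0],
                     [2.0, 2.0, 2.0]])
pref = preferential_expression(avg_expr)
score = predict_score(pref, [0, 1])
```

      With this toy input the third cell type receives the highest score, which is the kind of cell type that would then go on to significance testing.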

      4) There is a large set of work done in gene enrichment sets which appears to not be mentioned (e.g. GSEA and other works by the Price group). It would be helpful for the authors to summarize these methods and how their method differs.

      We added work done in gene enrichment sets (including two relevant and recent studies from the Price group) and summarized these methods in the Introduction (page 2-3).

      5) Additionally, it should be noted that a caveat of this analysis is that the comparisons are all done only relative to the cell types sampled and the diseases which have Mendelian genes associated with them. I would expect these results to change, possibly drastically, if the sampled cell types and diseases were to be changed.

      We agree with the reviewer and now discuss the generalizability of our results, relating to the extent of the sampled cell types (Discussion, page 18, 1st paragraph).

      6) Finally, I would appreciate a more detailed explanation in the methods of how the score is computed. Some equations and the data they are calculated from would be helpful here.

      We now provide a detailed explanation of how the score and its statistical significance were computed and added Equations 1 and 2 (Methods, page 21-22, ‘PrEDiCT score calculation and significance assessment’).

      In summary, the general idea is an interesting one, but I do think the issues above should be addressed to make the results convincing.

      We thank the reviewer for the important feedback which helped us strengthen our analyses.

    1. Author Response

      Thank you for providing us with the reviewer comments. We will provide the revised manuscript at a later stage as recommended.

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors used a machine learning algorithm to analyze published exosome datasets to find biomarkers to differentiate exosomes of different origin.

      Strengths:

      The performance of the algorithm is generally of good quality.

      Weaknesses:

      The source datasets are heterogeneous as described in Figure 1 and Figure 2, or Line 72-75; and therefore questionable.

      We thank the reviewer for this assessment. The commonly used biomarkers of exosomes exhibit heterogeneous presence and abundance within the exosomes derived from different cell lines, tissue, and biological fluids. The primary goal of this study was to identify universal exosomal biomarkers that remain consistent across different sources of exosomes, unaffected by potential isolation and quantification bias. This objective was achieved through an integration of datasets from different sources, which allowed for the subsequent identification of common proteins associated with exosomes. Among the 18 protein markers identified, it is noteworthy that they are universally abundant in all cell lines and their exosomes. We believe that despite the heterogeneity of the datasets used here, the identification of 18 universal protein markers in exosomes from diverse sources is a strength of this analysis.

      Reviewer #2 (Public Review):

      Summary:

      This is a fine work on the development of computational approaches to detect cancer through exosomes. Exosomes are an emerging biomarker resource and have attracted considerable interest in the biomedical field. Kalluri and co-workers collected a large sample pool and used random forest to identify a group of protein markers that are universal to exosomes and to cancer exosomes. The results are very exciting and not only added new knowledge in cancer research but also a new and advanced method to detect cancer. Data was presented very nicely and the manuscript was well written.

      Strengths:

      Identified new biomarkers for cancer diagnosis via exosomes.

      Developed a new method to detect cancer non-invasively.

      Results were presented nicely and the manuscript was well written.

      Weaknesses:

      N/A.

      We appreciate the enthusiastic assessment of our study by the reviewer.

      Reviewer #3 (Public Review):

      In the current study, Li et al. address the difficulty in early non-invasive cancer diagnosis due to the limitations of current diagnostic methods in terms of sensitivity and specificity. The study brings attention to exosomes - membrane-bound nanovesicles secreted by cells, containing DNA, RNA, and proteins reflective of their originating cells. Given the prevalence of exosomes in various biological fluids, they offer potential as reliable biomarkers. Notably, the manuscript introduces a new computational approach, rooted in machine learning, to differentiate cancers by analyzing a set of proteins associated with exosomes. Utilizing exosome protein datasets from diverse sources, including cell lines, tissues, and various biological fluids, the study spotlights five proteins as predominant universal exosome biomarkers. Furthermore, it delineates three distinct panels of proteins that can discern cancer exosomes from non-cancerous ones and assist in cancer subtype classification using random forest models. Impressively, the models based on proteins from plasma, serum, or urine exosomes achieve AUROC scores above 0.91, outperforming other algorithms such as Support Vector Machine, K Nearest Neighbor Classifier, and Gaussian Naive Bayes. Overall, the study presents a promising protein biomarker signature tied to cancer exosomes and proposes a machine learning-driven diagnostic method that could potentially revolutionize non-invasive cancer diagnosis.

      We appreciate this positive assessment of our work.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The study by O'Reilly and Delis provides a valuable data-driven framework for extracting task-related muscle synergies in a step towards the understanding and practical use of synergies in real scenarios (e.g., evaluation of patients in a clinical environment). The approach is incomplete since the authors did not compare their method with classical physiologically grounded approaches for assessing muscle synergies. In this sense, the comparisons with classical approaches would clarify if physiological assemblies were preserved and were not altered to incorporate task space variables. Despite limitations, the proposed framework would interest motor control and neural engineering researchers.

      We thank the editors for the positive assessment of our work and appreciate their constructive feedback. In our revised manuscript, we believe we have sufficiently addressed the identified limitations by a) comparing our approach to existing physiologically-based methods, providing thorough comparisons of their respective outputs, b) applying it to a dataset of post-stroke participants to demonstrate that it can identify physiologically-interpretable markers of motor recovery and c) providing examples to demonstrate how readers can interpret the novel perspective introduced.

      Reviewer #1 (Public Review):

      The proposed study provides an innovative framework for the identification of muscle synergies taking into account their task relevance. State-of-the-art techniques for extracting muscle interactions use unsupervised machine-learning algorithms applied to the envelopes of the electromyographic signals without taking into account the information related to the task being performed. In this work, the authors suggest including the task parameters in extracting muscle synergies using a network information framework previously proposed. This allows the identification of muscle interactions that are relevant, irrelevant, or redundant to the parameters of the task executed.

      The proposed framework is a powerful tool to understand and identify muscle interactions for specific task parameters and it may be used to improve man-machine interfaces for the control of prostheses and robotic exoskeletons.

      With respect to the network information framework recently published, this work added an important part to estimate the relevance of specific muscle interactions to the parameters of the task executed. However, the authors should better explain what is the added value of this contribution with respect to the previous one, also in terms of computational methods.

      We thank the reviewer for their constructive comments. We have adjusted the introduction section of the manuscript to better explain the added value of this framework over previous work. Specifically, we draw the reviewer’s attention to the following updated section of the introduction:

      “In [11], we considered key limitations among current approaches to muscle synergy analysis in extracting functionally relevant and interpretable patterns of muscle activity [12]. We proposed a combinatorial approach based on information- and network-theory and dimensionality reduction (the network-information framework (NIF)) that significantly improved the generalisability of the extraction process by, among others, removing restrictive model assumptions (e.g. linearity, same mixing coefficients) and the reliance on variance-accounted-for (VAF) metrics [12]. By determining the pairwise mutual information between muscles, this innovation paved the way for the appropriate mapping of muscular interactions to the task space. To elaborate on the significance of this development, the extraction of motor patterns in isolation of the task space comes at the expense of both functional and physiological relevance [12,13]. Furthermore, effective methods for mapping large-scale physiological dynamics to behaviour are a current gap across the neurosciences [14]. Thus, here we build on this work by, for the first time, directly including task space parameters during muscle synergy extraction. In doing so, we address these current research gaps, progressing muscle synergy research and successful engineering applications in a fruitful direction [12,15,16]. This enables us, in a novel way, to dissect the concept of the muscle synergy and therefore quantify interactions between muscle activations with shared or complementary functional roles.”

      In general, the method proposed relies on several hyperparameters and cost functions that have been optimized for the specific datasets. A sensitivity analysis should be performed, varying these parameters and reporting the performance of the framework.

      We thank the reviewer for this comment which enabled us to clarify a potential misunderstanding. Our proposed framework does not require setting or varying hyperparameters to optimise cost functions.

      For model-rank specification, a modularity maximising cost-function is used which determines what partitioning of the networks results in maximal modularity. We have offered two alternative approaches using this cost-function which consistently converge on the same solution. To further ensure the representativeness of this solution, we also offer a consensus-based approach where we apply these alternative approaches to individual participant or task data, then group the collective partitions together and re-apply the approaches. One of these approaches (Equation 2.2) requires two hyperparameters, γ and ω, which adjust the intra- and inter- network layer resolutions. As stated in the manuscript, we set both of these parameters to 1, thus nullifying their presence in the cost-function and aligning our work with the classical notion of modularity. Across the two alternative approaches to model-rank specification, the solution is unique and data-driven and has a demonstrable generalisability across datasets.

      The only other cost-function present in the framework is during dimensionality reduction, which is a standard loss function used across the muscle synergy analysis literature. Thus, the approach is essentially parameter-free and we now have mentioned this more explicitly in the manuscript:

      “To empirically determine the number of components to extract in a parameter-free way, we then concatenated these adjacency matrices into a multiplex network and employed network community-detection protocols to identify modules across spatial and temporal scales (fig.3(D)) [29–32,44].”

      “In its generalised multilayer form, the Q-statistic is given an additional term to consider couplings between layers l and r with intra- and inter-layer resolution parameters γ and ω (Equation 2.2). Here, μ is the total edge weight across the network and γ and ω were set to 1 in the current study for classical modularity [30], thus removing the need for any hyperparameter tuning.”
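      To make the role of the resolution parameter concrete, the sketch below evaluates the classical single-layer modularity of a given partition, with γ defaulting to 1 as in the response. It is an illustration only: it does not reproduce Equation 2.2, it omits the inter-layer coupling ω of the multilayer form, and it scores a supplied partition rather than searching for the maximising one. The function name and toy network are assumptions.

```python
import numpy as np

def modularity(adj, labels, gamma=1.0):
    """Single-layer modularity Q of a partition of a weighted, undirected graph.

    adj: symmetric adjacency matrix (n x n).
    labels: community label for each node.
    gamma: resolution parameter; gamma = 1 gives the classical definition.
    """
    A = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    k = A.sum(axis=1)        # node strengths
    two_m = A.sum()          # twice the total edge weight
    same = labels[:, None] == labels[None, :]
    B = A - gamma * np.outer(k, k) / two_m  # observed minus expected weight
    return (B * same).sum() / two_m

# Toy network: two triangles joined by a single edge (2-3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

q_split = modularity(A, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_single = modularity(A, [0, 0, 0, 0, 0, 0])  # everything in one community
```

      Splitting the toy network into its two triangles gives a higher Q than a single community, which is the property a modularity-maximising search exploits when choosing the number of modules.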

      It is not clear how the well-known phenomenon of cross-talk during the recording of electromyographic muscle activity may affect the performance of the proposed technique and how it may bias the overall outcomes of the framework.

      Indeed, artifacts such as crosstalk are a standard issue across the EMG literature and may impact the performance of subsequent analyses where prevalent in the dataset. Crosstalk is expected to be present irrespective of the task and so should not affect redundant and synergistic muscle representations; however, it could be present in the task-irrelevant muscle interactions extracted. Due to the prominence of long-range functional connections with the task-irrelevant representations extracted, we suggest that such artifacts are unlikely to have played a prominent role in the extracted patterns. Nonetheless, we have recognised this possibility with the following updated sentence in the Discussion section:

      “Although distinguishing task-irrelevant muscle couplings may capture artifacts such as EMG crosstalk, our results convey several physiological objectives of muscles including gross motor functions [65], the maintenance of internal joint mechanics and reciprocal inhibition of contralateral limbs [20,50].”

      Reviewer #2 (Public Review):

      This paper is an attempt to extend or augment muscle synergy and motor primitive ideas with task measures. The authors' idea is to use information metrics (mutual information, co-information) in 'synergy' creation including task information directly. My reading of the paper is that the framework proposed radically moves from attempts to be analytic in terms of physiology and compositionality with physiological bases, instead into more descriptive ML frameworks that may not support physiological work easily.

      We thank the reviewer for taking the time to provide a thorough commentary on this manuscript. An overall aim in developing this framework is to build on other recent developments in providing a more fine-grained functional architecture underlying movement control [1,2]. It is a requirement for the successful communication and introduction of this toolbox to the field to provide readers with an understanding of how to use the framework and an intuition on how to interpret the results. Thus, we agree with the reviewer that functional interpretations are of crucial use.

      We also agree with the reviewer that maintaining a physiological underpinning is a desirable direction for the field and should not be made secondary to functional descriptions. In our updated version of this manuscript, we have therefore included direct comparisons with the gold-standard in the field for muscle synergy extraction, namely non-negative matrix factorisation based muscle synergy extraction (see ‘Building on current approaches to muscle synergy analysis’ and fig.5-6 of revised manuscript) [3,4]. In these comparisons, we show how our framework goes beyond this current approach in terms of functional insight while still maintaining physiological relevance. Indeed, in the revised manuscript we also include a fourth dataset comprising post-stroke participants and healthy controls (Fig.6). We demonstrate, through a simple example application to this dataset, how our proposed framework can produce more predictive representations of motor impairment than the gold-standard approach. The representations we identified were discriminative of motor impairment measured via the Fugl-Meyer assessment using just one trial per participant. This improves considerably upon the sensitivity of the current approach to altered motor patterns which have predominantly required many trials and participants to gain significance [5,6]. Thus, the patterns we extract are a more comprehensive representation of the actual underlying physiological state of the participants.

      This approach is very different from the notions of physiological compositional elements as muscle synergies and motor primitives, and to me seems to really be striving to identify task relevant coordinative couplings. This is a meta problem for more classical analyses. Classical analyses seek compositional elements stable across tasks. These elements may then be explored in causal experiments and generative simulations of coupling and control strategies. The present work does not convince me that the joint 'meta' analysis proposed with task information added is not unmoored from physiology and causal modeling in some important ways. It also neglects publications and methods that might be inconvenient to the new framework.

      We would be very interested in receiving the reviewer’s suggestions of existing approaches that we have not incorporated here and would be happy to discuss these in the revised manuscript.

      Information based separation has been used in muscle synergy analyses using infomax ICA, which is information not variance based at core. Though linear mixing of sources is assumed, minimized mutual information is the basis.

      We agree with the reviewer that ICA relies on information measures, however it does not incorporate task-space information. The novelty of our approach lies in the characterisation of muscle interactions with respect to the task at hand. If the reviewer could provide references to this statement, we would be able to consider this further.

      Physiological causal testing of synergy ideas is neglected in the literature reviews in the paper. Although these are in animal work, the clear connection of muscle synergy choices and analyses to physiology is important and needs to be managed in the new methods proposed. Is any correspondence assumed? Possible?

      We agree with the reviewer that this is a crucial element of muscle synergy research and will aim to address it in our future work. However, we would like to point out that the current manuscript is a “tools and resources” article aiming to introduce a new framework. In our revised manuscript, we have incorporated an application of the framework to a dataset from post-stroke patients to demonstrate the use of the framework in clinical settings to identify biomarkers and use them to make predictions of motor recovery (see Fig.6 of updated manuscript).

      Questions and concerns with the framework as an overall tool:

      First, muscle based motor information sources have influences on different time scales in the task mechanics. Analyses of synergies in the methods proposed will be very much dependent on the number and quality of task variables included and how these are managed. Standardizing and comparing among labs, tasks sets and instrumentation differences is not well enough considered as a problem in this new proposed method toolset, at least in my reading. Will replication, and testing across groups ever be truly feasible in this framework?

      We agree with the reviewer that this important point can be a limitation of the applicability of the framework. For this reason, we chose a “holistic” approach, applying the framework to several datasets collected in different settings, and selecting different kinds of task variables to extract muscle networks from. Crucially, we used a leave-one-task-out and leave-one-participant-out cross validation procedure to specifically address this point. Our results showed that the extracted couplings are robust irrespective of the task variable and/or participant excluded and this lends credit to the generalisability of the framework.

      Muscle based motor information sources have influences on different time scales in the task mechanics. Kinematic analyses, dynamic analyses and force plate analyses of the same task may provide task variables that alter the results in the proposed framework it seems.

      As we have mentioned above, here we used all the above types of task variables together to illustrate the range of measures that can be included in the proposed framework and showed that the outputs are robust to the exclusion of any task/participant. This point is especially evident for dataset 3 results, where high levels of generalisability were found despite the inclusion of kinematic, dynamic and IMU data (see Table 1. of original submission and updated manuscript). We believe that this is an advantage of the approach as it allows researchers to apply the method to different kinds of measurements they may have collected and gain insights into the relationships of muscle couplings with kinematic/dynamic/force parameters. This will also enable scientists to attribute different functional roles to the identified couplings and it is something we plan to do in future applications of the framework.

      Second, there is a sampling problem in all synergy analyses. We cannot record all muscles or all task parameters. Examining synergies across multiple tasks seeks 'stationary' compositionality. Including task specific elements may or may not reinforce or give increased coordinative precision to the stationary compositionality.

      We fully agree that this is a limitation of all synergy analyses and aimed to consider this study a step in the direction of addressing this limitation by providing the research community with a toolbox that can be used to quantify muscle couplings that can have different levels of task specificity.

      To me the new methods proposed seem partly orthogonal to the ideas of stable compositionality. The 'synergies' obtained will likely differ, and are more likely to be coordinative control groupings of recurrent task and muscle motifs (based on instrumentation) which may or may not relate to core compositionality in physiology. Is there any expectation that the framework should relate to core compositionality and physiology? This is not clear in the paper as written.

      In our new analysis, we have compared the proposed approach to existing physiologically-based methodologies and showed that the new framework can capture several salient physiological features of movement that the current NMF-based approach cannot. For example, as we have moved away from optimising variance accounted for metrics, our framework can identify subtle muscle couplings that have important functional roles. These subtle couplings are often not captured in current muscle synergy analysis as, against physiological relevance, higher amplitude muscles often take prominence. Further, by directly including task parameters during extraction, we can determine the muscles that have a functional role concerning the included task parameter rather than inferring this relationship indirectly using knowledge about the task executed. In our updated manuscript, by applying the framework to post-stroke participants (see Fig.6), we were also able to demonstrate that the extracted couplings are associated with functional parameters of motor recovery and have a clear link with the physiological state of individual participants.

      It would be useful to explore the approach with a range of neuromechanical models and controllers and simulated data to explore the issues I am raising and convince readers that this analysis framework adds clarity rather than dissolving the generalizability and interpretability of analyses in terms of underlying causal mechanisms.

      The authors need to better frame their work in relation to causal analyses if they are claiming links to muscle synergies analyses and claim extension/refinement. Alternatively, these may not be linked, and instead parallel approaches exploring different hypotheses and goals using different organizational data descriptors.

      To address the reviewer's concerns here, we have included in the updated manuscript a toy example simulating situations in which pairs of muscles would have a redundant or synergistic functional relationship (see Fig.2). This simulation gives clear intuition on situations where two muscles (e.g. an antagonist-agonist pair) may share functionally similar or complementary information about task direction (left vs right). In particular, within the main text describing this figure, we state how current NMF based approaches consider muscles functionally equivalent when they share similar magnitude activations, whereas our framework captures muscles with identical task information. Thus, our work is an extension of current approaches towards understanding causal mechanisms. The suggestion to use neuromechanical models is valuable; however, we consider it beyond the scope of this work. This “Tools and Resources” paper is aimed at introducing the computational framework for the analysis of large-scale muscle couplings in task space. Our future work will use this framework to address unanswered questions in the field and we hope that it will be helpful for other scientists in testing their hypotheses.
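      The distinction between redundant and synergistic pairs can be sketched with two idealised binary "muscles" and a binary task variable. This is not the simulation shown in the manuscript's Fig.2: the two joint distributions, the function names, and the sign convention (co-information taken as I(M1;T) + I(M2;T) − I(M1,M2;T), positive for net redundancy and negative for net synergy) are assumptions chosen for illustration, and the distributions are enumerated exactly rather than estimated from data.

```python
import math
from collections import defaultdict

def entropy(pmf):
    """Shannon entropy in bits of a dict mapping outcome -> probability."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def marginal(joint, axes):
    """Marginalise a joint pmf over outcome tuples onto the given axes."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[a] for a in axes)] += p
    return out

def mutual_information(joint, a_axes, b_axes):
    """I(A;B) = H(A) + H(B) - H(A,B), in bits."""
    return (entropy(marginal(joint, a_axes))
            + entropy(marginal(joint, b_axes))
            - entropy(marginal(joint, a_axes + b_axes)))

def co_information(joint):
    """Outcomes are (m1, m2, task); positive = redundancy, negative = synergy."""
    return (mutual_information(joint, (0,), (2,))
            + mutual_information(joint, (1,), (2,))
            - mutual_information(joint, (0, 1), (2,)))

# Redundant pair: both muscles carry the same information about direction.
redundant = {(t, t, t): 0.5 for t in (0, 1)}

# Synergistic pair: direction is only readable from the two muscles jointly
# (task = m1 XOR m2, with m1 and m2 independent and uniform).
synergistic = {(m1, m2, m1 ^ m2): 0.25 for m1 in (0, 1) for m2 in (0, 1)}

red_value = co_information(redundant)
syn_value = co_information(synergistic)
```

      In the redundant case each muscle alone predicts the task, so the shared term is positive; in the XOR case neither muscle alone carries any task information, yet together they determine it, which makes the same quantity negative.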

      To me this appears a data science tool that may not help any reductionist efforts and leads into less interpretable descriptions of motor control. Not invalid, but sufficiently different that common term use muddies the water.

      We believe that the novel evidence we provided on both simulated and real data has improved the interpretability of the approach's outcomes. Specifically, we have introduced examples showing the functional roles of the different types of interactions as well as the predictive power of the outputs. Concerning the use of the term synergy, we have provided a clear description throughout the manuscript regarding the interpretation of synergy vs redundancy in the novel perspective we propose. For example, in the discussion section:

      “ We thus sought to provide greater nuance to the notion of ‘working together’ by defining motor redundancy and synergy in information-theoretic terms [6,56]. In our framework, redundancy and synergy are terms describing functionally similar and complementary motor signals respectively, introducing a new perspective that is conceptually distinct from the traditional view of muscle synergies as a solution to the motor redundancy problem [3,6,7]. In this new definition of muscle interactions in the task space, a group of muscles can ‘work together’ either synergistically or redundantly towards the same task. In doing so, the perspective instantiated by our approach provides novel coverage to the partitioning of task-relevant and -irrelevant variability implemented by the motor system along with an improved specificity regarding the functional roles of muscle couplings [20–22]. Our framework emphasises not only the role of functionally redundant muscle couplings that result from the underlying degeneracy of the motor system, but also of complementary, synergistic dependencies that are important for communication and integration across specialised neural circuitry [57,58]. Thus, the present study aligns the muscle synergy concept with the current mechanistic understanding of the nervous system whilst offering an analytical approach amenable to the continued advances in large-scale data capture [14,59].”

      Reviewer #3 (Public Review):

      In this study, the authors developed and tested a novel framework for extracting muscle synergies. The approach aims at removing some limitations and constraints typical of previous approaches used in the field. In particular, the authors propose a mathematical formulation that removes constraints of linearity and couples the synergies to their motor outcome, supporting the concept of functional synergies and distinguishing the task-related performance related to each synergy. While some concepts behind this work were already introduced in recent work in the field, the methodology provided here encapsulates all these features in an original formulation providing a step forward with respect to the currently available algorithms. The authors also successfully demonstrated the applicability of their method to previously available datasets of multi-joint movements.

      Preliminary results positively support the scientific soundness of the presented approach and its potential. The added values of the method should be documented more in future work to understand how the presented formulation relates to previous approaches and what novel insights can be achieved in practical scenarios and confirm/exploit the potential of the theoretical findings.

      Strengths:

      This work proposes a novel framework that addresses physiologically non-verified hypotheses of standard muscle synergy methods: it removes restrictive model assumptions (e.g. linearity, same mixing coefficients) and the reliance on variance-accounted-for (VAF) metrics.

      The method is solid and achieves the prescribed objectives at a computational level and in preliminary laboratory data.

      A toolbox is available for testing the methods on a larger scale.

      The paper is well written and shows a high level of innovation, original content and analysis.

      Weaknesses:

      Task performance variables could be specified in more quantitative definition in future work (e.g.: articular angles rather than a generic starting point- end point).

      We agree with this point and will incorporate it in future work. Our aim here was to show that the framework would work with any task variable and that scientists can use it to identify the relevance of muscle interactions to different types of task parameters.

      The paper does not show a comparison with previous approaches (e.g.: NMF) or recently developed approaches (such as MMF).

      We have now illustrated such a comparison on two datasets and explained more how the new framework can dissect the different types of muscle groupings (see ‘Building on current approaches to muscle synergy analysis’ section and Fig.5-6 of revised manuscript).

      A discussion of the likely impact of the work on the field, and the utility of the methods and data to the community.

      In our revised manuscript, we have introduced 2 new applications of the framework to real data to exemplify its use for a) functional interpretability and b) identification of biomarkers (see ‘Building on current approaches to muscle synergy analysis’ section and Fig.5-6 of revised manuscript). We also point towards its use in movement restoration and augmentation devices and in the clinical setting in the discussion section:

      “The separate quantification of these muscle interaction types opens up novel opportunities in the practical application of muscle synergy analysis, as demonstrated in the current study through the identification of a significant predictor of motor impairment post-stroke from single-trials [5,12,65]. For instance, these distinct representations may encapsulate different neural substrates that can be specifically assessed at the muscle-level for the purpose of bodily restoration and augmentation [66]. Uncovering their neural underpinnings is an interesting topic for future research.”

      In this work, the effort of the authors aimed at developing the field is clear. It is fundamental to develop novel frameworks for synergy extraction and use them to make them more interpretable and applicable to real scenarios, as well as more adherent to recent findings achieved in motor control and neuroscience that are not reflected in the standard models. At the same time, muscle synergies are being used more and more in research but their impact in practical scenarios is still limited, probably because synergies have rarely been analyzed in a functional context. This paper shows a very in-depth analysis and a novel framework to interpret data that links to the task space from a functional perspective. I also found that the results on the datasets are very well commented but could expand more to show why using this framework is advantageous.

      There are some key points for discussion that follow from this paper which can be described more, maybe in future work, and that might contribute to major developments in the field, including:

      The understanding of how the separation between relevant (redundant and synergistic) and irrelevant synergies impacts synergy analysis in practical work;

      We have now introduced new figures (Fig. 5 and 6) to the revised manuscript, demonstrating simple applications of the framework and providing intuition regarding the outputs. We have also added points to the Discussion commenting on the differences between types of couplings and how they can be interpreted in future works:

      “Our framework emphasises not only the role of functionally redundant muscle couplings that result from the underlying degeneracy of the motor system, but also of complementary, synergistic dependencies that are important for communication and integration across specialised neural circuitry [57,58]. Thus, the present study aligns the muscle synergy concept with the current mechanistic understanding of the nervous system whilst offering an analytical approach amenable to the continued advances in large-scale data capture [14,59].”

      “Although distinguishing task-irrelevant muscle couplings may capture artifacts such as EMG crosstalk, our results convey several physiological objectives of muscles including gross motor functions [64], the maintenance of internal joint mechanics and reciprocal inhibition of contralateral limbs [19,49]. Thus, task-irrelevant muscle interactions reflect both biomechanical- and task-level constraints that provide a structural foundation for task-specific couplings. The separate quantification of these muscle interaction types opens up novel opportunities in the practical application of muscle synergy analysis, as demonstrated in the current study through the identification of a significant predictor of motor impairment post-stroke from single-trials [5,12,65]. For instance, these distinct representations may encapsulate different neural substrates that can be specifically assessed at the muscle-level for the purpose of bodily restoration and augmentation [66]. Uncovering their neural underpinnings is an interesting topic for future research.”

      Interpreting how the different synergistic organizations described in this work allow a better description of data from real scenarios (e.g.: motor recovery of patients after neurological diseases);

      We have now added an example application of the framework to a dataset of stroke patients (Fig.6) and identified redundant muscle patterns that are predictive of functional measures.

      Discussing in detail how the presented findings compare with standard algorithms such as NMF to determine the added value provided with this approach;

      As indicated above, we have now shown such a comparison on two new datasets (see Fig.5-6 of revised manuscript).

      Describe how redundant synergies reflect real neural organization and - if their "existence" is confirmed - how they contribute to redesign the concept of muscle synergies and of modular/synergistic control in general.

      This is an important point that we have now addressed more in our Discussion by relating redundant muscle couplings to degeneracy in the motor system and synergistic couplings to integrative dynamics by higher-level processes. We have also added a simple simulation illustrating how synergistic and redundant interactions co-exist and represent different contributions to task performance (see Fig.2 of revised manuscript).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Summary of changes

      I thank the reviewers for their thorough feedback on this paper and providing me with such a detailed list of recommendations. I have been able to incorporate many of their suggestions, which I believe has greatly improved this paper.

      The most important changes:

      • I added comparisons to the lexicon- and rule-based sentiment algorithms TextBlob and VADER to Supplementary Fig. 4. This shows the superiority of ChatGPT in scoring the sentiment of scientific texts compared to existing and already-validated tools for sentiment analysis based on natural language processing. [Suggestion Reviewer 2]

      • I added the measure intra-class correlation to Fig. 3b, emphasizing the inconsistency in sentiment scores across different reviews of the same paper. [Suggestion Reviewer 3]

      • I added Supplementary Fig. 6, in which I directly propose different experiments to test the causes of the observed gender effects on peer review. [Suggestion Reviewer 3]

      • I further studied the issue of variability in responses by ChatGPT (Supplementary Fig. 2), and learned that this has greatly improved in the latest version of ChatGPT (for Version Aug 3, 2023, R2 values of 0.99 (sentiment) and 0.86 (politeness) were reached). I show these findings in Supplementary Fig. 2. [Suggestions Reviewers 1 and 3]

      • Throughout the manuscript (most notably in the Abstract and Discussion), I emphasize that this is a proof-of-concept study, and make suggestions on how to scale this up across journals and fields. I also toned down certain claims given the relatively small sample size of this study, including in the abstract. I also more prominently and elaborately discuss the limitations of the study in the Discussion section. [Suggestions Reviewers 1, 2 and 3]

      • I made many smaller changes to text, figures and references on the basis of the reviewers’ comments. [Suggestions Reviewers 1, 2 and 3]
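
      The intra-class correlation mentioned in the list above can be illustrated with a minimal sketch. It assumes a one-way random-effects ICC(1,1) and a balanced design (the same number of reviews for every paper); which ICC variant was used for Fig. 3b is not stated here, and the scores below are made up for illustration.

```python
from statistics import mean


def icc_oneway(ratings):
    """ICC(1,1) for a balanced design.

    ratings[i] holds the k scores (one per review) given to paper i.
    """
    n = len(ratings)
    k = len(ratings[0])
    if any(len(row) != k for row in ratings):
        raise ValueError("this sketch assumes the same number of reviews per paper")
    grand_mean = mean(x for row in ratings for x in row)
    paper_means = [mean(row) for row in ratings]
    # Between-paper and within-paper mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in paper_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, paper_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical sentiment scores: three papers, two reviews each.
example_scores = [[60, 55], [-20, 30], [10, 5]]
print(icc_oneway(example_scores))
```

      Values near 1 would indicate that reviews of the same paper agree closely; values near 0 (or below) would indicate that two reviews of the same paper are about as different as reviews of different papers.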

      Notably, Reviewer 3 has provided me with a very detailed list of recommendations for follow-up experiments. I appreciate their ideas, and I am currently considering different options for future work. Specifically I am looking to team up with a journal to perform the experiments laid out in Supplementary Fig. 6 of the new paper, to study whether I can find evidence of bias across rejected and accepted papers. As suggested by this reviewer, I am also looking into ways to automate data collection using APIs, and by utilizing the rapidly expanding databases for transparent peer review.

      Based on this preprint, I have received messages from academics that are interested in using generative AI to study scientific texts. By revising this manuscript, I hope to provide them with the tools to concurrently expand the analysis of peer review into different scientific disciplines and journals.

      Reviewer #1 (Public review)

      Strengths:

      The innovative method is the biggest strength of this article. Moreover, the method can be implemented across fields and disciplines. I myself would like to see this method implemented in a grander scale. The author invested a lot of effort in data collection and I especially commend that ChatGPT assessed the reviews twice, to ensure greater objectivity.

      I want to thank this reviewer for commending the innovative methodology of this study. I appreciate that this reviewer would like to see this methodology implemented at a grander scale, which is a view that I share. I initially only included Neuroscience papers, because I was uncertain whether I would be able to properly assess the reviews from different scientific disciplines (and thus judge whether ChatGPT was able to provide plausible scores).

      The reviewers have provided me with a list of potential follow-up experiments, and I am currently considering different options for future work. Specifically, I am looking to team up with a journal to perform the experiments laid out in the new Supplementary Fig. 6, to study whether I can find evidence of bias across rejected and accepted manuscripts of a journal. In addition, as suggested by Reviewer #3, I am looking into ways to automate data collection using APIs, and by utilizing the rapidly expanding databases for transparent peer review. Importantly, based on this preprint, I have received messages from academics that are interested in using generative AI to study scientific texts. By revising this manuscript now, I hope to provide them with the tools to concurrently expand the analysis of peer review into different scientific disciplines and journals.

      The comments I received from the different reviewers made me realize that I did not describe the intent of this paper well enough in the original submission. I rewrote much of the Abstract, to emphasize the proof-of-concept nature of this study, and rewrote the Discussion to focus more on the limitations of the study.

      Weaknesses:

      I have several concerns regarding the methodology of the article. The first relates to the fact that the sample is not random. The selection of journal and inclusion and exclusion criteria do not contribute well to the strength of the evidence.

      Indeed, the inclusion of only accepted manuscripts from a single journal is the biggest caveat of this paper. I have re-written much of the Abstract to emphasize that this is a proof-of-concept paper, hoping that other researchers concurrently expand this method to larger and more diverse datasets.

      An important methodological fact is that the correlation between the two assessments of peer reviews was actually lower than we would expect (around 0.72 and 0.3 for the different linguistic characteristics). If the ChatGPT gave such different scores based on two assessments, should it not be sound to do even more assessments and then take the average?

      This was a great recommendation by this reviewer, and a point also raised by Reviewer #3. Based on their suggestion, I looked into how each additional iteration of scoring would reduce the variability of scoring for a subset of papers (thus being able to advise users on an optimal number of iterations).

      Interestingly, I observed that ChatGPT has become significantly more reliable in providing sentiment and politeness scores in recent versions. For the latest version (ChatGPT Aug 3, 2023), R2 = 0.992 for sentiment and R2 = 0.859 for politeness were reached for two subsequent iterations of scoring. Unfortunately, OpenAI does not allow access to previous versions of ChatGPT, so the current dataset could not be re-scored. Yet, based on these data, there may no longer be a need for people to perform repeated scoring. I show these data in Supplementary Fig. 2, as I believe this is very useful information for people who are interested in using this tool.

      Reviewer #1 (Recommendations to author)

      I had some difficulties reading the article, so it would maybe help to structure the article more (e.g. in the introduction there are three aims stated, so the Statistical Analysis section could be divided into three sections, and instead of the link to figures, the author could state which variables were analysed in a specific manner) to make the details easier to comprehend. Also, I found in one place that the sample consisted of 572 reviews, and in another that it was 558.

      These are very good points. I re-wrote the statistical analysis for clarity (Page 7 of the manuscript). The 558 reviews was a mistake on my part, as I forgot to include the fourth review for the 14 papers that received four reviews in the histograms of Fig. 2b and the accompanying text. This has been updated.

      For figures 1a and 1b it could be considered to enter the table instead of several figures.

      I thank the reviewer for pointing this out. I tried this suggestion, but I found it to reduce the readability of the paper. As an alternative, I now provide an Excel spreadsheet with all the raw data, so people can find all the characteristics of the included papers.

      99.8% of the reviews analysed were assessed as polite. This is, in my opinion, an extremely important finding, which shows that reviewers are still holding to a certain degree of standards in communication, and it can be mentioned in the abstract.

      I very much agree with this reviewer; this has now been added to the Abstract.

      In results you state that QS World Ranking is "imperfect" measure. When stating that in the results section, it poses the question why it is used in the study, so maybe it is more suitable for the discussion.

      This point is well taken. Even though the QS World Ranking score is imperfect, I still think it can be useful, as a rough proxy of perceived prestige of an institution. I now removed this “imperfect measure” statement from the Results section, and moved it to the Discussion (Page 5).

      In the Results section, instead of using only p values, please add measures of effect (correlations, mean differences), to make it easier to place in the context.

      For the significant effects of Fig. 4, I have added these to the figure legends. Please note that the statistical tests used are non-parametric, so I reported the Hodges-Lehmann difference (the median of all possible pairwise differences between observations from the two groups).
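
      As a minimal sketch of the definition just given, the two-sample Hodges-Lehmann difference can be computed directly from all pairwise differences. The function name and the example scores are hypothetical and are not taken from the dataset.

```python
from statistics import median


def hodges_lehmann_difference(group_a, group_b):
    """Median of all pairwise differences a - b (a from group_a, b from group_b)."""
    return median(a - b for a in group_a for b in group_b)


# Hypothetical sentiment scores for two groups of reviews.
scores_group_a = [5, 7, 9]
scores_group_b = [1, 2, 3]
print(hodges_lehmann_difference(scores_group_a, scores_group_b))
```

      Because it is built from pairwise differences rather than group means, this estimate is less sensitive to a few extreme scores, which is why it pairs naturally with rank-based tests.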

      I think the results interpretation should be softened a bit, or the limitations of the study should be placed as the second paragraph in the discussion, since this was only specific journal with specific subfield.

      I agree with this reviewer that the relatively small sample size of this paper demands more careful wording. Throughout the manuscript, I have toned down claims, and emphasized the “proof of concept” nature of this study (for example in the Abstract). I also moved the limitations section to the second paragraph of the Discussion, and elaborate more on the study’s caveats.

      Methods:

      The measure Review time was assessed from submission to acceptance, but this does not need to be review time since it takes a lot of time sometimes to find reviewers. That needs to be stated as a limitation.

      This point is well taken. I changed this to “Paper acceptance time” in Fig. 3 and the accompanying text.

      Gender name determination methods differed between the assessment of the first authors and the last authors, and that needs stronger explanation.

      I appreciate this reviewer raising this point, which has also been raised by Reviewer #3. For this paper, I have carefully weighed the pros and cons of automated versus manual gender determination. Initially, my intention was to rely only on a programmatic method to identify authors' names. However, I came to realize that there were inaccuracies in senior author gender predictions made by ChatGPT/Genderize. This was evident to me due to my personal familiarity with some of these authors, either because they are famous or through personal interactions. It seemed problematic to me to proceed with this analysis knowing that these misclassifications would introduce unnecessary variability to the dataset.

      The advantage of the relatively small sample size in this study was the opportunity to manually perform this task, rather than being fully dependent on algorithms. While I attempted manual gender identification for the first author as well, this was way more challenging due to their limited online presence. The discrepancy in gender identification accuracy between first and senior authors did not go unnoticed, and I acknowledge the issue it presents. I also recognize that, unlike senior authors, reviewers may not necessarily be familiar with the first authors of the papers they evaluate, as indicated in the original submission of this paper. In light of this, I sought input from several PIs who often serve as reviewers. Their feedback confirmed that they typically possess knowledge of senior authors' identities, for example through conferences, whereas the same is not true for first authors. Yet, this may be different for other scientific disciplines, where the pool of reviewers might be bigger.

      Notably, for future studies I may make a different decision, especially when I use larger datasets that require me to automate the process.

      I also realize that my rationale for the different methods of gender determination was not explained well enough in the original submission; I now explain my reasoning more elaborately on Page 7 of the manuscript.

      For sentiment analysis: Please state based on what the GPT made a decision? Which program? (e.g. for gender it used genderize.io)

      This has been added to Page 7.

      Finally, your entire analysis can be made reproducible (since everything is publicly available). You can share ChatGPT chats as online materials with variables entered with the dataset analysed and the code. This would increase the credibility of the findings.

      I will make the entire raw dataset available through the eLife website, including all reviews and their scores.

      Reviewer #2 (Public review)

      Strengths include:

      1) Given the variability in responses from ChatGPT, the author pooled two scores for each review and demonstrated significant correlation between these two iterations. He confirmed also reasonable scoring by manipulating reviews. Finally, he compared a small subset (7 papers) to human scorers and again demonstrated correlation with sentiment and politeness.

      2) The figures are consistently well presented and informative. Figure 2C nicely plots the scores with example reviews. The supplementary data are also thoughtful and include combinations of first/last author genders. It is interesting that first author female last author male has the lowest score.

      3) A series of detailed analyses, including breaking down reviews by subfield (interesting to see the wide range of reviewer sentiment/politeness scores in computational papers), institution, and author's name and inferred gender using Genderize. The author suggests that blinding the reviewers to authors' gender during peer review may be helpful in mitigating the impoliteness seen.

      Thank you.

      Weaknesses include:

      1) This study does not utilize any of the wide range of Natural Language Processing (NLP) sentiment analysis tools. While the author did have a small subset reviewed by human scorers, the paper would be strengthened by examining all the reviews systematically using some of the freely available tools (for example, many resources are available through Hugging Face [https://huggingface.co/blog/sentiment-analysis-python]). These methods have been used in previous examinations of review text analysis (Luo et al. 2022. Quantitative Science Studies 2:1271-1295). Why use ChatGPT rather than these older validated methods? How does ChatGPT compare to these established methods? See also: colab.research.google.com/drive/1ZzEe1lqsZIwhiSv1IkMZdOtjPTSTlKwB?usp=sharing

      This was a great recommendation by this reviewer, and I have tested ChatGPT against TextBlob and VADER, the two algorithms also used by the Luo et al. study (see Supplementary Fig. 4). Perhaps unsurprisingly, these algorithms performed very poorly at scoring sentiment of the reviews. Please note that I also tested these two algorithms at scoring individual sentences, Tweets and Amazon reviews, which they did very well (i.e., the software package was working correctly). Thus, ChatGPT is better at scoring scientific texts than TextBlob and VADER, likely because these algorithms struggle with finding where in the review the sentiment is conveyed. I now discuss this on Pages 1, 3 and 4 of the manuscript.

      2) The author's claim in the last paragraph that his study is proof of concept for NLP to analyze peer review fails to take into account the array of literature already done in this domain. The statement in the introduction that past reports (only three citations) have been limited to small dataset sizes is untrue (Ghosal et al. 2022. PLoS One 17:e0259238 contains over 1000 peer review documents, including sentiment analysis) and reflects a lack of review on the topic before examining this question.

      I thank this reviewer for pointing me to this very useful study. I regret missing this one in my initial submission; I now discuss this paper on Pages 1 and 5 of the manuscript.

      3) The author acknowledges the limitation that only papers under neuroscience were evaluated. Why not scale this method up to other fields within Nature Communications? Cross-field analysis of the features of interest would examine if these biases are present in other domains.

      I share this reviewer’s opinion that it would be very interesting to expand this analysis to different subfields. I initially only included Neuroscience papers, because I was uncertain whether I would be able to properly assess the reviews from different scientific disciplines (and thus judge whether ChatGPT was able to provide plausible scores). The different reviewers have provided me with a list of potential follow-up experiments, and I am currently considering different options for future work, including expanding into different fields within Nature Communications. Additionally, I am looking to team up with a journal to perform the experiments laid out in the new Supplementary Fig. 6, to study whether I can find evidence of bias across rejected and accepted manuscripts of a journal. I am also looking into ways to automate data collection using APIs, and by utilizing the rapidly expanding databases for transparent peer review. Importantly, based on this preprint, I have received messages from academics that are interested in using generative AI to study scientific texts. By revising this manuscript now, I hope to provide them with the tools to concurrently expand the analysis of peer review into different scientific disciplines and journals.

      The comments I received from the different reviewers made me realize that I did not describe the intent of this paper well enough in the original submission. I rewrote much of the Abstract, to emphasize the proof-of-concept nature of this study, and rewrote the Discussion to focus more on the limitations of the study.

      Reviewer #3 (Public review)

      Strengths:

      On the positive side, I thought the use of ChatGPT to score the sentiment of text was novel and interesting, and I was largely convinced by the parts of the methods which illustrate that the AI provides broadly similar sentiment and politeness scores to humans who were asked to rank a sub-set of the reviews. The paper is mostly clear and well-written, and tackles a question of importance and broad interest (i.e. the potential for bias in the peer review process, and the objectivity of peer review).

      Thank you.

      Weaknesses:

      The sample size and scope of the paper are a bit limited, and I have written a long list of recommendations/critiques covering diverse aspects including statistical/inferential issues, missing references, and suggestions for other material that could be included that would greatly increase the usefulness of the paper. A major limitation is that the paper focuses on published papers, and thus is a biased sample of all the reviews that were written, which prevents the paper properly answering the questions that it sets out to answer (e.g. is peer review repeatable, fair and objective).

      I very much appreciate this reviewer taking the time to provide me with such a detailed list of recommendations. Below, I will respond to this list in a point-by-point manner.

      Reviewer #3 (Recommendations to author)

      My main issues with the paper are that it is not very ambitious, and gave me the impression the aim was to write the first paper using ChatGPT to address this question, rather than to conduct the most thorough and informative investigation that would have been feasible (many obvious questions that could be addressed are not tackled, since the sample size is small and restricted). There are also issues with selection bias, and the statistical analysis, that have possibly led to erroneous inferences and greatly limit what conclusions can be drawn from the analysis. I hope my comments are of use in further improving the paper.

      The repeatability of ChatGPT when calculating the two linguistic characteristics is low. Taking the average of multiple assessments is one way to deal with this. To verify that taking the average of, say, 5 scores gives a repeatable score, the author could consider calculating 10 scores for a set of 20-30 reviews, calculating two scores for each review using the first 5 and second 5 ChatGPT ratings, and then calculating repeatability across the 20-30 reviews. It is important to demonstrate that ChatGPT is sufficiently repeatable for this new method to be useful.

      Also, it might be possible to automate this process a bit to save time - e.g. the author could change the ChatGPT prompt, like "please rate the politeness of this review from -100 to +100, do it 10 times independently, and print your 10 ratings as well as their average". Hopefully the AI is smart enough to provide 10 independently-computed ratings this way, saving the need to copy-paste the prompt into the chat box 10 times per review.

      This was a great recommendation by this reviewer, and a point also raised by Reviewer #1. Based on their suggestion, I looked into how each additional iteration of scoring would reduce the variability of scoring for a subset of papers (thus being able to advise users on an optimal number of iterations). I also tested this Reviewer’s suggestion to ask ChatGPT to score many times, and give separate scores for each iteration; this worked very well.

      Interestingly, I observed that ChatGPT has become significantly more reliable in providing sentiment and politeness scores in recent versions. For the latest version (ChatGPT Aug 3, 2023), R² = 0.992 for sentiment and R² = 0.859 for politeness were reached for two subsequent iterations of scoring. Unfortunately, OpenAI does not allow access to previous versions of ChatGPT, so the current dataset could not be re-scored. Yet, based on these data, there may no longer be a need for people to perform repeated scoring. I show these data in Supplementary Fig. 2, as I believe this is very useful information for people who are interested in using this tool.
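
      The split-half check described in the reviewer's comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ratings are simulated (no ChatGPT call is made), and the noise level, the number of reviews, and the function names `pearson_r` and `split_half_r2` are hypothetical rather than taken from the study.

```python
import random
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half_r2(ratings_per_review):
    """R^2 between the mean of the first half and the mean of the second half
    of the repeated ratings, computed across reviews."""
    first, second = [], []
    for ratings in ratings_per_review:
        half = len(ratings) // 2
        first.append(mean(ratings[:half]))
        second.append(mean(ratings[half:2 * half]))
    return pearson_r(first, second) ** 2


# Hypothetical data: 25 reviews, each rated 10 times with random scoring noise.
rng = random.Random(0)
true_scores = [rng.uniform(-100, 100) for _ in range(25)]
ratings = [[t + rng.gauss(0, 10) for _ in range(10)] for t in true_scores]
r2 = split_half_r2(ratings)
print(f"split-half R^2 = {r2:.3f}")
```

      With real data, `ratings` would hold the repeated scores collected for each review; a split-half R² close to 1 would suggest that averaging five ratings is enough.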

      To my mind, the main reason to use an AI instead of one or more human readers to rank the sentiment/politeness of peer reviews is to save time, and thereby allow this study to have a larger sample size than would be feasible using human readers. With this in mind, why did you choose to download only 200 papers, all from the discipline of Neuroscience, and only from Nature Communications? It seems like it would be relatively easy to download papers from many more journals, fields of research, or time periods if using AI-based methods, and in fact it would have been feasible (though fairly laborious) for one person to read and classify the sentiment of the reviews for 200 papers.

      As well as providing more precise estimates of the parameters you are interested in (e.g. the consistency of reviews, and the size of the difference in reviewer sentiment between author genders), expanding the sample beyond this small set of papers would allow you to address other interesting questions. For example, you could ask whether the patterns observed for neuroscience are similar to those in other research disciplines, whether Nature Comms is representative of all journals (given there are other journals with public reviews), and you could test whether the male-female differences have become greater or smaller over time (e.g. by comparing the male-female differences observed in the past to the effect size observed in 2022-23). Additionally, the main analyses in this paper would have higher statistical power - for example, you only include 53 papers with a female senior author, giving you quite low power/ precision to estimate the gender difference in the average sentiment of reviews (given the high variance in sentiment between papers).

      I want to thank this reviewer for taking the time to think about possible ways to increase the impact of this work. I agree, these are all great suggestions, and there are many possibilities to apply ChatGPT-based natural language processing to scientific peer review. Respectfully, I chose to continue with publishing this work in the form of a proof-of-concept paper, because I currently do not have the resources to perform this (quite labor intensive) study. Below I will explain my reasoning, which I also shared with Reviewers #1 and #2.

      I initially only included Neuroscience papers, because I was uncertain whether I would be able to properly assess the reviews from different scientific disciplines (and thus judge whether ChatGPT was able to provide plausible scores). The different reviewers have provided me with a list of potential follow-up experiments, and I am currently considering different options for future work, including expanding into different fields within Nature Communications. Additionally, I am looking to team up with a journal to perform the experiments laid out in (the new) Supplementary Fig. 6 of the new paper, to study whether I can find evidence of bias across rejected and accepted manuscripts of a journal. I am also looking into ways to automate data collection using APIs, and by utilizing the rapidly expanding databases for transparent peer review. Yet, based on this preprint, I have received messages from academics that are interested in using generative AI to study scientific texts. By revising this manuscript now, I hope to provide them with the tools to concurrently expand the analysis of peer review into different scientific disciplines and journals. The comments I received from the different reviewers made me realize that I did not describe the intent of this paper well enough in the original submission. I rewrote much of the Abstract, to emphasize the proof-of-concept nature of this study, and rewrote the Discussion to focus more on the limitations of the study.

      Also, if you could include some reviews of papers that were reviewed double-blind, you could test whether the gender-related differences in peer reviews are ameliorated by double-blind reviewing. Nature Comms (and many other journals with open review) do have some double-blinded papers, and there is evidence that double-blinding is preferentially selected by authors who think they will experience discrimination in the peer review process (DOI: 10.1186/s41073-018-0049-z), and also that double-blinding does ameliorate bias (DOI: 10.1111/1365-2435.14259), so this seems very relevant to the ideas under study here.

      I note that the PLOS journals allow open peer review, and there is an API for PLOS which one can use to download the reviews for a given paper (e.g. try this query to get to the XML file of a paper which has open peer review: http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0239518&type=manuscript). Using an API could allow this project to be scaled up, because you can programmatically search for the papers with open reviews, download those reviews using the API and some code, and then score them using the same ChatGPT-based methods used for Nature Comms. Also, Publons recently merged with Web of Science (Clarivate), and you can now read all the open peer reviews on Web of Science for papers which had open review (e.g. for this paper: https://www-webofscience-com.napier.idm.oclc.org/wos/woscc/fullrecord/WOS:000615934800001). It would be possible to write to Web of Science, request access to their data or search engine, and programmatically download many thousands of papers and their associated reviews, and then use ChatGPT or a similar AI to score them all (especially if you can pass the reviews to ChatGPT for scoring programmatically, instead of manually copy-pasting the reviews into the chat box one at a time as it appears was done in the present study).

      These are great suggestions, and I have different plans for follow-up studies, including the use of APIs to download large batches of peer reviews. The analyses in this paper were performed in February of this year, before the ChatGPT API had been released, so I could not automate the process at that time. As a result, these analyses were performed manually. I realize that the field is moving rapidly, and that there are now different options to scale this up quickly.

      I plan on using the suggestions from this Reviewer for follow-up experiments in a future paper, and on publishing this revision as a proof-of-concept paper. In this way, different researchers can optimally use ChatGPT-based sentiment analyses for similar studies without a delay.
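
      As a rough illustration of the programmatic download the reviewer suggests, the sketch below builds the manuscript-XML URL for a PLOS ONE DOI and fetches it with the Python standard library. The URL pattern is copied from the reviewer's example and has not been checked against PLOS documentation; the function names are hypothetical, and whether the peer reviews can be extracted from the returned XML is an assumption that would need to be verified.

```python
from urllib.request import urlopen

# URL pattern taken from the reviewer's example query (an assumption, not a documented API).
PLOS_ONE_FILE_URL = "http://journals.plos.org/plosone/article/file?id={doi}&type=manuscript"


def manuscript_xml_url(doi):
    """Build the manuscript-XML URL for a PLOS ONE article DOI."""
    return PLOS_ONE_FILE_URL.format(doi=doi)


def fetch_manuscript_xml(doi, timeout=30):
    """Download the article XML as text (requires network access)."""
    with urlopen(manuscript_xml_url(doi), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Only the URL is built here; call fetch_manuscript_xml(...) to actually download.
print(manuscript_xml_url("10.1371/journal.pone.0239518"))
```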

      As you acknowledge, there is a selection bias in this study, since you only include papers that were ultimately published in Nature Comms (missing reviews of papers that were rejected). This is a really big limitation on the usefulness of some of your analyses. For example, you found no relationship between author institutional prestige and reviewer sentiment. This could be evidence of a fair and impartial review process (which seems unlikely!), or it could be a direct result of selection bias (specifically a "collider bias", like the famous example involving height and skill among professional basketball players). Because the likelihood that a paper is published is positively related both to its quality and the prestige held by the authors, we might expect a flatter (or even negative) correlation between prestige and reviewer sentiment among papers that were published than among the whole set of papers (like how the correlation between height and speed/skill is less positive among NBA players than among the general population, since both height and speed/skill provide advantages in basketball).

      I agree with this reviewer that the selection bias is a major limitation of this study. I rewrote much of the Abstract and Discussion to tone down claims, and more prominently discuss the limitations of this study. I also made several suggestions for follow-up experiments.
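
      The collider bias raised in this comment can be illustrated with a small simulation. This is a hypothetical sketch, not an analysis of the study's data: quality and prestige are drawn independently, sentiment is assumed to depend on quality only, and a paper is assumed to be "published" when the sum of quality and prestige exceeds an arbitrary threshold.

```python
import random
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)
n = 20000
quality = [rng.gauss(0, 1) for _ in range(n)]
prestige = [rng.gauss(0, 1) for _ in range(n)]  # independent of quality by construction
# Assumption: reviewer sentiment tracks quality plus noise, with no prestige effect.
sentiment = [q + rng.gauss(0, 0.5) for q in quality]

# Assumption: publication requires quality + prestige to clear a threshold.
published = [i for i in range(n) if quality[i] + prestige[i] > 1.5]

r_all = pearson_r(prestige, sentiment)
r_pub = pearson_r([prestige[i] for i in published],
                  [sentiment[i] for i in published])
print(f"all submissions: r = {r_all:.2f}")
print(f"published only:  r = {r_pub:.2f}")
```

      In this toy setting the correlation is close to zero across all submissions but clearly negative among the published subset, even though prestige has no effect on sentiment.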

      In the section "Consistency across reviewers", you write that there was little similarity between review sentiment scores from different reviewers from the same paper, and then write "This surprising result indicates high levels of disagreement between the reviewers' favorability of a paper, suggesting that the peer review process is subjective." However I disagree with this conclusion for three reasons:

      • Firstly, your dataset only includes papers that were published, and thus there is a selection bias against manuscripts where both/all reviewers disliked the paper - the removal of this (probably large) set of reviews will add a (potentially very strong) downward bias to your estimate of how consistent the review process is (since you are missing all those papers where the reviewers agreed). I think that one cannot properly answer the question "are reviewers consistent in their appraisals" without having access to papers that were rejected as well as those that were accepted.

      I agree with this reviewer that there is a selection bias in this study, which I acknowledged throughout the initial submission of this manuscript. Indeed, having access to reviews of rejected papers will greatly increase my confidence in this finding. However, if there is consistency across reviewers in the entire pool of (post-review rejected+accepted) manuscripts, some of that has to trickle down into the pool of accepted papers. The correlation between sentiment scores of the different reviewers is so strikingly low (or even absent) that I simply cannot envision a way in which there is consistency across reviewers in the pre-editorial decision stage. Yet, I realize that this point is debatable. Therefore, I changed the phrasing of the Discussion section, including the following sentence:

      That being said, the extremely low (or even absent) relation between how different reviewers scored the same paper was striking, at least to this author.

      • Secondly, the method used to assess whether the reviews for each paper tend to be similar (shown in Figure 3b) does not fully utilize the information contained in the data and could be replaced with another method. (In the paper 3 univariate regressions compare the sentiment scores for R1 vs R2, R1 vs R3, and R2 vs R3, which needlessly splits up the data in the case of papers with more than 2 reviewers, reducing power.) You could instead calculate the intraclass correlation coefficient (aka 'repeatability'), to determine what proportion of the variance in sentiment scores is between vs within papers (I suggest using the excellent R package rptR for this). Note that the sentiment scores are not normally distributed, and so regular regression (as you used) or one-way ANOVA (which you might be tempted to use for the ICC calculation) are not ideal - consider using a GLM or transformation (the rptR package automates the tricky calculation of repeatability for generalized models).

      I thank this reviewer for pointing me towards this option. I added this analysis to Fig. 3b, which confirmed the inconsistency in sentiment scores for reviews of the same paper (ICC = 0.055). As suggested by this reviewer, I decided to perform the ICC on log-transformed data, as ICC calculation is very sensitive to non-normally distributed data.
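
      For readers unfamiliar with the intraclass correlation coefficient, the sketch below computes a one-way random-effects ICC from ANOVA mean squares. It is a simplified stand-in, not the analysis used for Fig. 3b: the reviewer recommends the R package rptR, the transformation applied to the scores is not reproduced here, and the example scores and the function name `icc_oneway` are hypothetical.

```python
from statistics import mean


def icc_oneway(groups):
    """One-way random-effects ICC from the ANOVA mean squares.

    `groups` is a list of lists: one inner list of scores per paper.
    Papers may have different numbers of reviewers.
    """
    a = len(groups)  # number of papers
    n_total = sum(len(g) for g in groups)
    grand = mean(v for g in groups for v in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (a - 1)
    ms_within = ss_within / (n_total - a)
    # Effective group size; equals the common size when the design is balanced.
    k0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (a - 1)
    return (ms_between - ms_within) / (ms_between + (k0 - 1) * ms_within)


# Hypothetical sentiment scores for three papers with 3, 2 and 3 reviewers.
example = [[60, 65, 58], [20, 25], [80, 78, 85]]
print(f"ICC = {icc_oneway(example):.3f}")
```

      An ICC near 1 means most of the variance lies between papers (reviewers agree); an ICC near 0, as reported above, means reviews of the same paper are barely more alike than reviews of different papers.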

      • Thirdly, an alternative and very plausible hypothesis for this lack of similarity (besides peer review being highly subjective) is that ChatGPT is estimating the "true sentiment" of a review (i.e. what the reviewer intended to say) with some amount of error (e.g. due to limitations/biases in the AI, or reviewers struggling to make themselves understood due to issues such as writing in a second language, typos, or writing under time pressure), which dilutes the similarity in the estimated sentiment of the reviews. In other words, if the true sentiment values are strongly correlated, but there is random error in how those values are estimated by ChatGPT, then the correlation between reviewer scores for each paper will tend to zero as the error tends to infinity. Furthermore a nebulous quality like "sentiment" cannot be fully summarised in a single variable running from -100 to +100, and if you had used a more multi-dimensional classification system for the reviews (or qualitative assessment by human readers) you might have found that there is a bit more correspondence (I'm speculating here, but I think you cannot really exclude this and the paper doesn't mention this limitation).

      This point is well taken. I added caveats to the Discussion section on Page 5. Altogether, after taking these caveats into account, I do believe that this analysis convincingly demonstrates subjectivity in the peer review of this subset of papers. That said, I hope that my re-written discussion and additional analysis have added the necessary nuance to this point.
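
      The reviewer's argument about measurement error corresponds to the classical attenuation formula. The sketch below is a minimal illustration under simplifying assumptions that are not taken from the paper: both reviewers' scores carry independent, additive error with the same variance, and the numbers used are arbitrary.

```python
def observed_correlation(true_r, true_var, error_var):
    """Expected correlation between two noisy scores.

    Assumes both scores carry independent, additive error with the same
    variance, so each has reliability true_var / (true_var + error_var).
    """
    reliability = true_var / (true_var + error_var)
    return true_r * reliability


# The observed correlation shrinks towards zero as the error variance grows.
for error_var in (0, 1, 4, 100):
    print(error_var, round(observed_correlation(0.8, 1.0, error_var), 3))
```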

      In Figure 3C, you write "Contribution of paper scores to review time". This strongly implies to the reader that the sentiment scores inferred for the reviews have a causal effect on the review time. This is imprecise writing (since the scores were calculated by you after the papers were published, and thus cannot be causal - you mean that the actual reviews affected the review time, not the scores), but more importantly you cannot infer any causality here since your dataset is observational/correlational. You could fix this by re-phrasing to emphasise this, e.g. "Statistical associations between paper scores and review time".

      This is a very good point raised by this reviewer. I have corrected the phrasing so it no longer implies causality.

      For the analysis shown in Figure 4d and Figure 4e, I am not certain what you mean by "data split per lowest/median/highest sentiment score". This is ambiguous, and I am also not sure what the purpose of this analysis is or what it shows - I suggest re-writing for greater clarity (and ideally providing the code used in all your analyses) and perhaps revising the analysis. Additionally, an important missing piece of information from this analysis (and most analyses in the paper) is the effect size. For example, you don't report what is the difference in politeness score and sentiment score between male and female authors, and what is the SE and 95% CIs for this difference. From eyeballing the figure, it looks like the difference in politeness is about 4 points on your 200-point scale - this is small in absolute terms, but might be quite large in relative terms given that "politeness score" usually hovered around a small part of the full 200-point scale. What is this as a standardised effect size (i.e. in terms of standard deviations, as captured by effect sizes like Cohen's d and Hedges' g)? Calculating this (and its 95% CIs) would allow you to say whether the difference between genders is a "big effect", and give an idea of your confidence in your effect size estimate and any inferences drawn from it. You even discuss the effect size in your discussion, so it would help to calculate the standardised effect size. If you're not familiar with effect size and why it's useful, I found this paper very instructive: https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1469-185X.2007.00027.x

      I agree with this reviewer that this phrasing was ambiguous. I now rephrased this on Page 4 of the manuscript:

      To study whether these more impolite reviews for female first authors were due to an overall lower politeness score, or due to one or some of the reviewers being more impolite, I split the reviews for each paper by its lowest/median/highest politeness score. I observed that the lower politeness scores for first authors with a female name were driven by significantly lower low and median scores (Fig. 4d, bottom panel). Thus, the least polite reviews a paper received were even more impolite for papers with a female first author.

      I also added effect sizes of the significant effects from Fig. 4 to its figure legend. Please note that the statistical tests used are non-parametric, so I reported the Hodges-Lehmann difference (the median of all possible pairwise differences between observations from the two groups).
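
      The two effect sizes discussed here can be computed as follows. This is a minimal sketch using the Python standard library: `hodges_lehmann` follows the definition given above, `cohens_d` is the standardised difference the reviewer asks about, and the function names and example scores are hypothetical rather than taken from the study.

```python
from statistics import mean, median, variance


def hodges_lehmann(x, y):
    """Median of all pairwise differences x_i - y_j (two-sample estimator)."""
    return median(a - b for a in x for b in y)


def cohens_d(x, y):
    """Standardised mean difference using the pooled sample standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / pooled_var ** 0.5


# Hypothetical politeness scores for two groups of papers.
group_a = [62, 70, 58, 66, 71]
group_b = [60, 55, 64, 57, 59]
print("Hodges-Lehmann difference:", hodges_lehmann(group_a, group_b))
print("Cohen's d:", round(cohens_d(group_a, group_b), 2))
```

      The Hodges-Lehmann difference stays on the original score scale and is robust to outliers, whereas Cohen's d expresses the difference in standard deviations and assumes roughly normal data.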

      "Double-blind peer review has been debated before, but has come under scrutiny for various reasons" - this is vague and unhelpful. I think it's worthwhile to properly engage with the debate and the substantial body of evidence in your paper, given your main focus is on potential bias in the review process based on authors' identities (e.g. gender, institutional prestige).

      I thank the reviewer for pointing this out. I rephrased this sentence to indicate that there is evidence that it helps to remove certain forms of bias (Page 5):

      To address this issue, double-blind peer review, where the authors' names are anonymized, could be implemented. Evidence suggests that this is useful in removing certain forms of bias from reviewing8,9, but has thus far not been widely implemented, perhaps because some studies have cast doubt on its merits21,22.

      I have also added a Supplementary Fig. 6 to this paper, in which I lay out how my tool can be used to study bias by applying it to single- and double-blinded reviews (see also my answer to the other question about this topic below).

      On a related note, in the first paragraph, when discussing the potential of single-blind review to allow reviewers to essentially discriminate against papers by women, there is a key missing citation. This year, the first truly experimental test of this hypothesis was published (DOI: 10.1111/1365-2435.14259); a journal conducted a randomised controlled trial in which submitted manuscripts were reviewed either single- or double-blind. They found no effect of author gender on reviewer ratings or editorial decisions (though there was an effect of review type on success rate of authors from different countries). It would be better to cite this instead of reference 6, which as you acknowledge is methodologically flawed. This paper is also worth a read given your focus on Nature journals: DOI: 10.1186/s41073-018-0049-z.

      This point is well taken. I now cite this paper (citation #8) and rephrased this part of the Introduction (Page 1).

      "Another - arguably more simple - solution [compared to double-blind peer review] could be for reviewers to be more mindful of their language use." Here, you seem to be saying that we don't need to blind author names during peer review, because it would be simpler if all reviewers were simply nicer! I object to this because A) double-blind review is easy to implement, and greatly reduces the opportunity to tune the review to the author's identity (and there is some experimental evidence that it works in this regard), and B) it seems like wishful thinking to say that we don't need to implement measures that reduce the scope for bias, because all reviewers could instead stop using impolite language.

      This is a very valuable comment. I rephrased this to emphasize that this is an additional measure.

      "reviewers may want to use ChatGPT to extract a politeness score for their review before submitting" Yes, that's an interesting idea, and I can imagine that some (probably small) proportion of reviewers will be interested in doing this. But I think you should think bigger about wholesale changes to the review system that are possible because of AI like ChatGPT. For example, the submission platforms where reviewers submit their reviews (e.g. ScholarOne, Manuscript Central) could be updated to use AI to pre-screen draft reviews, and issue a warning to reviewers, like "Our AI assistant has indicated that the writing in this review might be impolite (example phrases here) - would you like to edit your review before you submit it?" Also, review-credit platforms like Publons could display not only the number of reviews that someone wrote, but an AI-generated assessment of how constructive, detailed, and polite their reviews are (this would help nudge people into writing better reviews, and also give credit where it's due to careful reviewers, which is part of the aim of Publons and similar platforms). This is just off the top of my head - there are many other good ideas about how AI could transform the peer review process. Indeed, AI is already good enough to generate quite useful peer reviews and constructive criticism of draft papers, and will surely get better at this... this surely has lots of implications for science publishing over the coming decades.

      These are great suggestions for implementation of this tool. I now end the first paragraph of the Discussion (Page 4) with the following sentence:

      Such an automated language analysis of peer reviews can be used in different ways, such as after-the-fact analyses (as has been done here), providing writing support for reviewers (for example by implementation in the journal submission portal), or by helping editors pick the best papers or most constructive reviewers.

      "Further research is required to investigate the reasons behind this effect and to identify in what level of the academic system these differences emerge." Here you could mention what this research would be - I think you'd need the full sample of reviewed papers, not just those that were accepted. Spell out what analyses would be required to test and falsify the various (very plausible and interesting) competing hypotheses that you mention for the male-female difference in sentiment scores.

      Great point. I added a Supplementary Fig. 6, in which I show a visual depiction of the experiments that can be performed to answer these questions.

      "areas of concern were discovered within the academic publishing system that require immediate attention. One such area is the inconsistency between the reviews of the same paper, highlighting the need for greater standardization in the peer review process." I disagree here. I think it is natural for there to sometimes be differences in how two or more reviewers rate the quality of a paper, even if the peer review process were carefully standardised (e.g. via the use of a detailed "peer review form", which helps guide reviewers to comment on all important aspects of the paper - some journals use these). This is because reviewers differ in their experience, expertise, or interests, and so some reviewers will catch mistakes that others miss, or request stylistic changes that others would not. More broadly, it's often not possible to write a version of the paper that satisfies all possible reviewers.

      I re-phrased part of the Discussion on Page 5 to indicate other sources of inter-reviewer variability. Specifically, I mention that some variability in sentiment can be expected based on the different backgrounds of the reviewers:

      Notably, some level of variability may be expected, for example due to different backgrounds, experiences, and biases of the reviewers. In addition, ChatGPT may not always reliably assess a review's sentiment, adding some spurious inter-reviewer variability.

      Yet, as also mentioned in my response to one of the previous questions, I still find the extremely low levels of consistency striking, even after taking these possible sources of inter-reviewer variability into account.

      "the maximum score an institution could receive was 100 (in 2023 this was Massachusetts Institute of Technology)" - this seems unnecessary information (just mention the score runs from 0-100).

      I agree with this reviewer that this was unnecessary information. This has been removed.

      "reviewers are generally familiar with the senior author of papers they review and thus are likely aware of their gender identity." This seems like a strong assumption, and you don't provide any evidence for it. Speaking personally, as a reviewer and journal editor I am often not familiar with the senior author, or I am familiar with the first author - I am not sure how often I know the senior author but not the first author or vice versa. It's also not always the case that the first author is a junior scientist and the last author a senior, famous one, as you imply. I suggest that you use the same approach to score the gender of both author positions, namely inferring their gender programmatically from their name (I agree that generally the important thing for the purposes of this study is the gender that reviewers will infer from the name, not the author's actual gender, and so gender estimation from first names is the correct approach).

      I appreciate this reviewer raising this point, and I have carefully weighed the pros and cons of both approaches. Initially, my intention was to rely only on a programmatic method to identify authors' names. However, I came to realize that there were inaccuracies in senior author gender predictions made by ChatGPT/Genderize. This was evident to me due to my personal familiarity with some of these authors, either because they are famous or through personal interactions. It seemed problematic to me to proceed with this analysis knowing that these misclassifications would introduce unnecessary variability to the dataset.

      The advantage of the relatively small sample size in this study was the opportunity to manually perform this task, rather than being fully dependent on algorithms. While I attempted manual gender identification for the first author as well, this was way more challenging due to their limited online presence. The discrepancy in gender identification accuracy between first and senior authors did not go unnoticed, and I acknowledge the issue it presents. I also recognize that, unlike senior authors, reviewers may not necessarily be familiar with the first authors of the papers they evaluate, as indicated in the original submission of this paper. In light of this, I sought input from several PIs who often serve as reviewers. Their feedback confirmed that they typically possess knowledge of senior authors' identities, for example through conferences, whereas the same is not true for first authors. Yet, this may be different for other scientific disciplines, where the pool of reviewers might be bigger.

      Notably, for future studies I may make a different decision, especially when I use larger datasets that require me to automate the process. I now more elaborately explain why I made this decision on Page 7 of the manuscript.

      In the Abstract, you write "suggesting a gender disparity in academic publishing". This part of the sentence contains no information about what you think is the cause of the male/female difference, and no further interpretation of its ramifications, so I think you can just remove it (because "disparity" just means a difference, so you are effectively saying something redundant like "there was a difference between papers with male and female senior authors, suggesting there is a difference")

      I thank the reviewer for pointing this out. I replaced the latter part of this sentence with “(…) for which I discuss potential causes.”, which I think is better than a short summary of potential causes which may lack the nuance that such a topic deserves.

    1. Author Response

      The following is the authors’ response to the original reviews.

      First of all, we would like to again thank the reviewers for their work. We appreciate the constructive review comments and useful suggestions to further improve our article. With those comments in mind, we have now revised our manuscript. Please see below for a point-by-point response (our responses in green) to all comments.

      Reviewer #1 (Recommendations For The Authors):

      Sun and colleagues outline structural and mechanistic studies of the bacterial adhesin PrgB, an atypical microbial cell surface-anchored polypeptide that binds DNA. The manuscript includes a crystal structure of the Ig-like domains of PrgB, cryo-EM structures of the majority of the intact polypeptide in DNA-bound and free forms, and an assessment of the phenotypes of E. faecalis strains expressing various PrgB mutants.

      Generally, the study has been conducted with a good level of rigor, and there is consistency in the findings. However, I do have some specific technical concerns relating to the study that necessitate the undertaking of additional experiments. These are summarized as follows:

      1) Recombinant PrgB188-1233 produced in the study purifies as a mixture of monomeric and dimeric species separatable by SEC. There is very limited discussion in the text re. the significance and/or implications of this. Is it feasible that the dimeric form is biologically relevant in the context of the in vivo situation? Or alternatively, is this simply an artifact of protein production?

      Experimental data that we published in 2018 indeed indicates that the dimer is relevant in the in vivo situation. We did not discuss this here since this was discussed in detail in the previous paper: Schmitt et al, 2018. We have now added a bit more information on this in the results section, highlighting this, so that it is clearer to the reader (lines 114-116).

      2) The authors see no evidence of the adhesive domain of PrgB in their PX structure highlighting that this must have been cleaved during crystallisation. Is this claim supported by an inspection of the crystal packing? It could be that this region of the protein is dynamic within the context of the crystal and is thus not observed. This should be clarified in the text either way.

      The crystal packing does not provide any space for the PAD. We have added a sentence describing this to the results section (lines 122-124).

      3) The Cryo-EM structures reported are both at ~10-angstrom resolution. Are the authors truly confident in the placement of their crystal structures on these maps? Visual inspection indicates that their positioning of the PrgB domains into the EM envelopes is somewhat questionable. The authors need to provide some quantitative measures of the quality of their domain fitting. The narrative of the manuscript very much hinges on this being correct.

      This is something that the other reviewer also commented on. The fitting of the crystal structures in the maps is indeed not optimal, but it was the best we could do with the available data. In line with point #6, we have now constructed new protein variants of the stalk domain (the four Ig-like domains) alone, and have assayed its interaction with the PAD in vitro using native gels and size exclusion chromatography. The outcome of these experiments is that the two domains do not interact in any substantial way on their own. Thus, the added experiments do not support the hypothesis that the PAD interacts with the Ig-like domains, at least not without the local high concentration provided by the linker region in the in vivo situation.

      To account for these new experiments, we have moved the cryo-EM structure to the supplement, and rewritten this part of the manuscript to say that the cryo-EM data indicated that there might be an interaction, but that we have not been able to verify this in vitro, indicating that if the interaction at all exists it must have a low affinity and is likely not physiologically relevant. In line with this, we have also further modified the text throughout the manuscript to account for this.

      4) The manuscript would be significantly strengthened if the authors could include confirmatory hydrodynamic data in support of the observed conformational reorganization of PrgB in the presence of DNA. SAXS analysis of the DNA-free and bound complexes would be ideal for this and would also help address the issues raised above in pt 3.

      To analyze the radius of PrgB with and without DNA, we tried both SEC-MALS and DLS experiments. It proved difficult to obtain precise and reproducible values, but the initial data indicated that no large changes occurred upon DNA binding. As we could also not measure a specific interaction between the PAD and the stalk in vitro, we did not perform SAXS experiments. As mentioned in the response to point #3, we have modified the results and discussion regarding the potential interaction of the PAD and stalk domains.

      5) The authors present binding studies of various PrgB mutant-expressing strains. A number of the mutations generated delete significant portions of the polypeptide. Can the authors confirm that these mutant proteins are correctly folded despite the introduced mutations? It could be that loss of function is simply a consequence of mutation-induced misfolding. I would like to see some confirmatory data (CD, SEC, etc.) in support of the foldedness of the mutant proteins.

      We cannot completely rule out that the folding of some of the variants is affected in E. faecalis. However, CD or SEC experiments would only give indications to the contrary if the overall fold had been majorly affected in an in vitro situation where the protein is not anchored to the E. faecalis cell wall.

      To alleviate this valid concern, we probed whether all variants are correctly exported and linked to the cell wall. We have now extracted the cell wall of E. faecalis producing wild-type or variant PrgB and performed a Western blot. The results of the Western blot with cell wall extract largely match the whole-cell experiments that were in the initial manuscript. If a protein variant were largely misfolded, it would likely not be targeted and linked to the cell wall, nor would it be stable in vivo. We have added these new data as a new fig 3 – figure supplement 1 and on lines 201-214.

      6) The authors suggest a direct interaction between the PAD and the stalk domains in PrgB. The discussion of this is very generic and no evidence to support this is provided other than the 10-angstrom resolution EM map. If they believe this to be the case, then additional evidence should be provided.

      Answer: As mentioned previously, we have now performed additional in vitro experiments to probe this potential interaction, but conclude that this indication from the EM data is likely not a real high-affinity interaction. We have modified the results and discussion regarding this point accordingly; see also the responses to points #3 and #4.


      Reviewer #2 (Recommendations For The Authors):

      As currently presented, I don't feel that the cryoEM data support the authors' proposed model, largely because the fit of the crystal structures to the EM volumes does not seem entirely reasonable for the apo- dataset and because the EM volume for the ssDNA bound dataset is not even contiguous. For me to believe the model as it is currently built, I would want to see a dataset with the PAD deleted, showing that its proposed density disappears, or a dataset with a PAD-specific antibody as a fiducial marker. It would be nice to see some goodness of fit metric with a comparison to other crystal structures fit such low-resolution data as well. At the very least, the authors must include the standard cryoEM workflow supplementary figure showing representative micrographs, 2Ds, and 3Ds along with particle numbers.

      In line with the comments raised by reviewer #1, we have now added more experiments in which we analyzed the potential interaction between the PAD and the stalk domain. From these new data, it looks like they do not interact with any substantial affinity, at least not on their own without any linker region holding them together, and that this interaction, if it exists at all, is likely not physiologically relevant. The cryo-EM data have been moved to the supplement, as we agree with both reviewers that the resolution, and the fitted model, are not good enough to draw any hard conclusions. The standard table for the cryoEM workflow was present as supplementary table 2, where e.g. particle numbers are described, but we have now also added a new supplementary fig 2 – figure supplement 2 that shows the EM processing workflow, including representative micrographs and 2D and 3D classes. We debated whether we should remove the EM data, but decided against it in the interest of transparency and to explain why the interaction studies with the PAD and stalk domains were performed.

      The X-ray crystallographic structure is very nice, but I was a bit surprised by the R factors in Table 1. After downloading the structure factors and coordinates from the PDB (thank you for depositing before submission!) I was able to see quite a few positive peaks in the difference map that could probably use some cleaning up. I realize I may just be a bit of a masochist when it comes to adding/deleting waters and moving around side chains to get things just right, but for such lovely data, I would have liked to see the model polished up a bit more. I was going to say that the isopeptide bond should be modelled, but I can see from a cursory Google that the authors did in fact try to find a way to model this and that it is indeed a bit of a pain.

      The model refinement proved surprisingly recalcitrant with regard to the remaining difference density, so we took the decision to model only what was solidly there (which leads to slightly higher R factors). We did indeed try to model the isopeptide bond, but we did not find a good way to do so (despite trying quite extensively), and ended up defining it as a linker in the PDB file, so that the bond shows up when one opens the structure in e.g. PyMOL.

      For protein production/purification in general I would have liked to see actual traces for the gel filtration and pure protein on a gel in a supplementary figure. I strongly believe that this type of information is so critical for future researchers looking to replicate or build upon published work so that they have some sense that what they are doing is working in the way it should be.

      We have now added a supplementary figure (as new Fig. 1 – figure supplement 1) that shows SEC and SDS-PAGE for the purification of PrgB188-1233.

      Finally, I think for the in vivo data it only makes sense to show the reader whether any or all the differences measured across your different mutants are statistically significant. Having done the graphing and analysis in GraphPad this should be a simple thing to achieve.

      We have now added statistical tests (one-way ANOVA) that show the statistical significance of the differences between the mutants, and present these in Fig 3 and Fig 4.
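
      For readers unfamiliar with the test named above, the following is a minimal sketch of how a one-way ANOVA F statistic is computed. It is not the authors' GraphPad analysis; the function name and the example numbers are hypothetical and serve only to illustrate the calculation.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of sample groups.

    Illustrative helper (hypothetical name): each inner list holds the
    replicate measurements for one strain or mutant.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual measurements around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical replicate values for three strains (not data from the paper).
example_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_value, df_b, df_w = one_way_anova_f(example_groups)
print(f"F({df_b}, {df_w}) = {f_value:.2f}")
```

      Converting F to a p-value requires the F distribution with the two degrees of freedom shown, which statistics packages provide; pairwise differences between individual mutants additionally need a post hoc multiple-comparison test.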

      Overall, I think it's a very nice paper and while I feel that the cryoEM data in its current form doesn't support the model of occlusion from PrgA, I also don't think that removing the cryoEM data and that specific mechanistic idea from the paper detracts from its overall message and impact.

      Thank you for those comments.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      p. 5, l. 87-90: The control of flgM by OmrA/B (PMID 32133913) and the antisense RNA to flhD (PMID 36000733) are other examples of known regulatory RNAs that impact the flagellar regulon.

      We thank the reviewer for pointing out these references and have added citations to them (page 5, lines 87-91).

      p.11/Fig. 3: it is intriguing that ArcZ and RprA, two of the rpoS-activating sRNAs, repress lrhA. I realize that it is outside of the scope of this study, but have the authors considered the possibility that ArcZ or McaS could have a role in the previously reported repression of rpoS by LrhA (PMID 16621809)?

      We agree that it is intriguing that ArcZ and RprA, two of the rpoS-activating sRNAs, repress lrhA, and added mention of this regulatory connection (page 12, lines 247-250).

      p. 13/l. 272: I do not understand why the authors say that "r-proteins were almost exclusively found in chimeras with MotR and FliX and no other sRNAs...", given that several other chimeras between r-prot and other sRNAs are found

      While some r-proteins encoding genes were found with other sRNAs in RIL-seq datasets, MotR and FliX generally had the highest numbers. The text was revised to better describe the RIL-seq data for r-proteins interaction partners (page 14, lines 291-295), and a new panel showing the S10 operon with all the interacting sRNAs was added to Figure 3—figure supplement 1B.

      Fig. 4 and 5: One possible improvement would be to more systematically assess the effect of base-pairing mutants of the sRNAs, such as MotRM1 or FliXM1 on fliC and rps/rpl genes in vivo. This is especially important for the mutants that affected the sRNA effects in the in vitro probing assays, such as UhpU-M2, MotR-M1 and FliX-S-M1 on fliC (Fig. S7)

      As suggested, we examined fliC mRNA levels across growth in motR-M1 and fliX-M1 chromosomal mutants. The results of these northern assays, now shown in Figure 8—figure supplement 1, are consistent with our model, as we observed delayed expression of fliC mRNA in the motR-M1 background and premature expression in the fliX-M1 background (page 21, lines 444-446, 449-453).

      Fig. 5: it may be worth including a schematic of the whole S10 operon to highlight its length and its organization?

      As suggested, a schematic representation of the S10 operon was added to Figure 3—figure supplement 1 with a summary of the RIL-seq data for this operon.

      Probing data (Fig. 5, S7 and S9): in general, it is difficult to differentiate the thin and thick brackets, and what is indicated by the dashed brackets is not always clear. Maybe using a color-code instead could help? Highlighting the predicted pairing regions on the different gels could be useful as well.

      We thank the reviewer for this suggestion and color-coded the brackets (Figure 5, Figure 4—figure supplement 2, and Figure 5—figure supplement 2). The correspondences to regions of predicted pairing are described in the figure legends.

      Fig. S10: The experimental evidence used to support FliX-dependent degradation of the rpsS mRNA is indirect (primer extension to observe higher levels of cleavage intermediates). It would be nice to be able to observe a decrease in the mRNA levels as well, either by Northern, or primer extension from a region more distant to the FliX pairing site.

      The S10 operon is long (~5 kb). We have tried multiple probes for this mRNA and detect many bands with each, likely due to extensive regulation of this operon. We think teasing out the origin of the different bands to appropriately interpret changes in patterns will require a significant amount of work.

      legend of Fig. S10: from the gel, it seems that only the plasmids differ in the samples, and it is not clear where the data corresponding to the WT strain mentioned in the legend is shown

      The samples shown in this figure are all for the indicated plasmids in the WT strain. We corrected the figure legend.

      Table S1: please define the NOR (normalized odds ratio?)

      The definition of Normalized Odds Ratio was added to the legend of Supplementary file 1.

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      Figure 1B. Please add a negative control (which could be in the supplementary section) from a large section showing transcripts that are not directly influenced by Hfq.

      We think the flgKLO browser image in this figure serves as a negative control; flgK and flgL clearly are not enriched on Hfq, in contrast to FlgO. Figure 1B was generated using published datasets that are easily accessible to readers in a genome browser and show many other examples of transcripts that are not influenced by Hfq: https://genome.ucsc.edu/cgi-bin/hgTracks?hubUrl=https://hpc.nih.gov/~NICHD-core0/storz/trackhubs/ecoli_rilseq/hub.hub.txt&hgS_loadUrlName=https://hpc.nih.gov/~NICHD-core0/storz/trackhubs/ecoli_rilseq/session.txt&hgS_doLoadUrl=submit

      Line 158. MotR* is a more abundant version of [the constitutively overexpressed] MotR. Is there a Northern or qPCR to confirm this? While I understand the relevance of these mutated constructs, their high expression can lead to artefactual effects.

      This is a valuable point and therefore we provided a northern blot to document the relative levels of MotR and MotR* (Figure 2—figure supplement 1A).

      Figure 2. The overexpression of MotR/MotR* from a plasmid is increasing the number of flagella. However, when the MotR gene is deleted, is there a reduction of the number of flagella? Same question with FliX: what happens when the fliX gene is deleted? According to the model described in the manuscript, we should expect fewer flagella in ΔmotR background and an increased number of flagella in ΔfliX background. Both Figure 2 and Figure 8 would benefit from additional experiments with deleted motR and fliX genes.

      We agree that experiments regarding the effects of endogenous sRNAs are important. We provided such data in Figure 8 and Figure 8—figure supplement 1 for MotR and FliX in a variety of assays: flagella numbers by electron microscopy, motility and competition assays, and expression of flagellar genes by RT-qPCR and western analysis. The chromosomally expressed MotR-M1 and FliX-M1 base pairing mutants did show the expected phenotypes of reduced and increased numbers of flagella, respectively (Figure 8A-B). As suggested by reviewer 1, we added northern analysis that examined fliC mRNA levels across growth in motR-M1 and fliX-M1 chromosomal mutants. The results of these northern assays are consistent with our model, as we observed delayed expression of fliC mRNA in the motR-M1 background and premature expression in the fliX-M1 background. We went to the trouble of constructing strains carrying point mutations in the chromosomal copies of these genes rather than deletions to avoid interfering with the expression of motA and fliC, given that MotR and FliX encompass the 5’ and 3’ UTRs, respectively.

      Figure 3 is key to demonstrating the sRNAs pairing with their specific targets and potential effect on bacterial swimming. However, these results would be more relevant with endogenous expression of the sRNAs and demonstration of their effects on the same targets. A Northern blot showing the overproduced sRNA level compared to endogenous sRNA level could help us appreciate the expression ratio.

      The levels of UhpU, MotR and FliX expressed from the overexpression plasmids are at least 100-fold higher than the endogenous levels. Thus, we agree that assays of chromosomal deletion/point mutants are important experiments. We did construct chromosomal uhpU-M1 and uhpU∆seed sequence mutants. However, under the conditions assayed, the uhpU chromosomal mutations did not result in observable effects on motility or FlhD-SPA protein levels. It is possible we would be able to detect differences between the wild type and uhpU chromosomal mutant strains under different growth conditions or in different assays, but this would require a significant amount of work. For many other sRNAs, chromosomal mutations have no or only subtle effects, suggesting redundancy between sRNAs or sRNA roles in fine-tuning gene expression.

      Figure 4. In panel B, the empty plasmid pZE alone seems to positively affect the flagellin expression when compared to the WT background. This can also be seen in Figure 4C. There is no fliC signal with empty plasmid pBR* but a strong fliC signal with empty plasmid pZE. Maybe the authors can explain this in the manuscript.

      With respect to panel B and Figure 4—figure supplement 1A, we agree that there is some variation between the levels of flagellin in the WT and pZE control samples, possibly due to the addition of antibiotic to the pZE culture. We added quantification of the bands in Figure 4—figure supplement 1 to better document the changes in flagellin levels.

      With respect to panel C, the pBR samples were collected in crl+ background while the pZE samples were collected in crl- background, which explains the lack of fliC signal in the pBR control sample. This is now noted in the figure legend.

      In lines 154-157, the justification for using two plasmids is described. An IPTG-inducible Plac promoter, the pBR*, is used because the constitutive overexpression of UhpU is resulting in mutated UhpU clones. These observations suggest a toxic expression level of UhpU that the cell can only tolerate when the UhpU RNA is somewhat deactivated by mutations. This does not seem like a detail and could be discussed further.

      We agree with the reviewer that this observation is important and now mention that it suggests a critical role for UhpU (page 8, lines 160-163).

      Figure 5E and I. While the binding of MotR to rpsJ and of FliX-S to rpsS is clear, the resolution of both gels in the areas of binding (upper part of both gels) could be improved.

      We found it tricky to choose the mRNA fragments for the in vitro structure probing for the regions of predicted pairing internal to CDSs. Given that we hoped to retain native RNA folding, we chose long fragments; for rpsJ, we started with the +1 of S10 leader and for rpsS, we started 147 nt into the CDS, a region that overlaps the region that was cloned to the rpsS-rplV-gfp fusion. Consequently, the region of base pairing is in the upper part of both gels. The gels were already run for an unusually long time. Thus, we do not think the resolution could be improved further. Nevertheless, we think the region of protection is evident for both mRNAs.

      Minor comments:

      Fig 1B. The promoter symbols are extremely small, please increase the size.

      As suggested, we have enlarged the promoter symbols in Figure 1B as well as in Figure 3A.

      Line 211. "the lrhA mRNA has an unusually long 5´ UTR". How long exactly?

      The 5’ UTR of the lrhA mRNA is 371 nt long. This is now mentioned in the text (page 11, line 224).

      Line 320. Should "Fig 9C" be "Fig S9C" instead?

      We thank the reviewer for noticing this typo. Callouts to supplementary figures have now been renumbered per eLife format.

      Line 384. Something seems to be missing in the sentence "a representative combined class 2 and 3 promoter".

      The sentence has been modified to clarify the designation (page 19, lines 409-411).

      Reviewer #3 (Recommendations For The Authors):

      Recommendation to clarify/strengthen the presentation of science in the paper:

      Lines 102-103: Can the authors provide some more information on how the sRNAs were initially discovered to be potentially sigma-28 dependent and selected?

      As suggested, we expanded the section discussing the discovery and the selection of these sRNAs (page 6, lines 104-109).

      Lines 192-193: It would be helpful to provide a bit more information in the main text about what are the different RIL-seq data sets (18 in total).

      As suggested, we now provide more details about the different RIL-seq datasets we used in the analysis (page 10, lines 202-205).

      It would be helpful to specify the criteria for "top" interactions in targets retrieved from RIL-seq data (Table S1 and text, e.g., line 273): e.g. number of conditions, number of chimeras, etc.

      As suggested, we now more explicitly specify the criteria for selecting targets to characterize (page 10, lines 205-206).

      Fig. 4B/ S6 and line 242: The flagellin amount in the empty vector control (pZE) looks higher than in WT, and the stated effect of MotR/MotR* OE on flagellin is not very clear from the blot. The "cross-reacting band" above flagellin also seems to vary among strains. Could the authors include a quantification of flagellin protein amount and normalize relative to a housekeeping protein (e.g., GroEL), instead of Ponceau S as loading control?

      We agree that there is some variation between the levels of flagellin in the WT and pZE control sample, possibly due to the addition of antibiotic to the pZE culture. We added quantification of the bands in Figure 4—figure supplement 1 to better document the changes in flagellin levels.

      Figure legends: It would be helpful to have a bit more information about the method used/displayed image rather than stating results in the legends.

      As suggested, we now provide a bit more information about the methods used/displayed image in the figure legends to allow for easier comprehension of the data presented in the figures (while trying to balance this with the length of the legends).

      Fig. 2: Please include a scale for all electron microscopy images or, if it is the same for all panels, state it in the figure legend. Moreover, the same image is used for the pZE control in panel C, E and Figure S4A/C. It would be better to show different fields of bacteria for the pZE sample.

      As is now mentioned in the legends to Figure 2, Figure 2—figure supplement 2, and Figure 8, the same scale was used for all panels. We thought it was better to show the same image for the pZE control in the different panels to emphasize that these samples were all analyzed on the same day.

      Fig. 2: The sRNA OE strains seem to show some heterogeneity in cell length (pZE-MotR) or width (pZE-FliX). The authors could, e.g., check whether this is a phenotype correlated to sRNA OE by quantifying these parameters for different fields and comparing to WT or comment on this in the text if this is not consistently seen.

      We also were intrigued by the slightly different sizes and widths of cells in the EM images. However, our statistical analysis did not reveal significant differences between the different samples. We now comment on this (page 53, lines 1178-1179).

      As a follow-up to this study, it would be interesting to assess the impact of MotR and FliX regulation of ribosomal protein synthesis on overall ribosome activity (e.g., via Ribo-seq), also considering that antitermination regulates rRNA transcription. In the case of MotR, the authors suggest that MotR upregulation of S10 protein might not only impact antitermination, but also lead to the formation of more active ribosomes that would increase flagellar protein synthesis (lines 359-362). However, in the RNA-seq performed in OE MotR* several transcripts encoding rRNA and ribosomal proteins are significantly downregulated compared to EVC (Supplementary Table S2). Could the authors comment on this?

      We share the reviewer’s enthusiasm for follow-up work and thank them for the suggested experiments. We hope we will be able to decipher the full mechanism of MotR and FliX action on ribosomal protein synthesis in future experiments. The observation that the levels of some ribosomal protein-coding genes are reduced in the RNA-seq experiment with overexpression of MotR* is interesting, but we do not have an explanation other than the fact that the samples were collected early in exponential growth. We now mention the observation in the text (page 19, lines 404-407).

      Considering that OE of the WT MotR appears to increase fliC mRNA abundance but has no strong impact on flagellin protein levels, can the authors speculate what is the physiological relevance of MotR* for flagellin production?

      We agree that while we do see significant increases in the flagella number and fliC mRNA abundance with MotR and MotR* overexpression, the western analysis did not reveal a striking increase in flagellin levels. We also wonder how MotR strongly increases the flagella number, which requires flagellin subunits, but only has a weak effect on the intercellular levels of flagellin. One possible explanation is that it is more difficult to see significant increases for a protein whose levels are high to begin with. These points are now discussed (page 13, lines 264-269).

      Fig. 4C: The pZE samples seem to show variable expression of fliC mRNA although the samples are collected at the same timepoints. Try to clarify in the text.

      The northern membrane on the bottom was exposed for a longer time due to the lower fliC mRNA levels in the samples with FliX overexpression. We now note these differences in the legends to Figure 4 and Figure 4—figure supplement 1.

      Fig. 7/S13: While a volcano plot for MotR is shown in Fig. 7A, quantification of GFP reporter fusion regulation is shown for MotR. Quantifications of MotR are shown in Fig. S13. Maybe swap the figures.

      Given that the data for MotR are in the supplement figures for all other figures we would also like to retain this distribution for Figure 7 (aside from the volcano plot since this experiment was only carried out for MotR).

      Lines 135-136 (Fig. S1B): on the northern blots, only sRNA levels of MotR are comparable between rich and minimal media (excluding M63 G6P and M63 gal). Most other sRNA seem to be more abundantly expressed in minimal media conditions compared to LB. Maybe rephrase.

      As suggested, the text was revised to point out the differences in the sRNA levels for cells grown in different growth media (page 7, lines 140-144).

      Lines 229-234: this paragraph seems not directly connected to the aims of the study (i.e., no effect on motility tested of these other sRNAs) and could be removed (or moved to discussion).

      We appreciate the reviewer’s suggestion but, considering Reviewer 1’s comments, think that showing the regulation of lrhA by other sRNAs has value in highlighting the complexity of the regulatory circuit. We have revised the text to incorporate Reviewer 1’s suggestions and better explain why these results are intriguing (page 12, lines 247-250).

      Line 200 and Fig. S5: For FlgO sRNA only one target was identified in RIL-seq. This gene could be specified and labeled in Fig. S5 and the text. Does FlgO also bind ProQ?

      We now mention the single FlgO target (gatC) detected in four datasets (page 10, lines 213-215). In Figure 3—figure supplement 1, we labeled only targets that we followed up with in the current study. Therefore, to be consistent, we prefer not to label gatC in the FlgO plot. FlgO was found to co-immunoprecipitate with ProQ but at much lower levels than with Hfq, and to have very few RNA partners (Melamed et al., 2020).

      Lines 493-498: It is mentioned that the four sRNAs were also detected in recent RIL-seq experiments of Salmonella and EPEC. Are any of the here identified targets also found in other species or was none detected as analyses were carried out under conditions that do not favor flagella expression?

      The targets identified in this study were not detected in the Salmonella and EPEC RIL-seq datasets. However, the Salmonella and EPEC experiments were carried out under different growth conditions. Based on the sequence conservation of the Sigma 28-dependent sRNAs across several bacterial species (Figure 8—figure supplement 2), we do think overlapping targets will be found in other bacterial species under the appropriate growth conditions.

      The strongest evidence of MotR-dependent target regulation is the one on rpsJ, which does not necessarily require the additional experiments with MotR*. Since the authors were able to show upregulation of the rpsJ-gfp reporter upon OE of MotR WT, it would have strengthened the results if they had performed the experiments in Fig. S8C with MotR WT. Similarly, as an increase of flagella number was seen with OE of MotR WT in Fig. 2A, the effect of the OE S10∆loop could be compared to OE MotR instead of OE MotR* (Fig. 6A). At least it would be helpful to briefly comment on why MotR* was used instead of MotR WT for these experiments.

      As suggested, we state that MotR* was used in some assays given the stronger effects for some phenotypes (page 10, lines 196-197). We think that, given that we established that MotR and MotR* cause the same effects, with increased intensity for the latter, it is reasonable to use MotR* in some of the experiments.

      Lines 482-491 and 508-511: The authors discuss that both UhpU sRNAs and RsaG sRNA from S. aureus are derived from the 3'UTR of uhpT, but conclude there is no overlap regarding flagella regulation, suggesting independent evolution of these sRNAs. However, the authors also mention that UhpU sRNA has many additional targets beyond LrhA involved in carbon and nutrient metabolism. Thus, maybe regulation of metabolic traits could be a conserved theme and function for UhpU and RsaG? Maybe try to comment on or better connect these two parts in the discussion.

      As suggested, we now comment on the possibility of the regulation of metabolic traits being a conserved theme and function for UhpU and RsaG (page 24, lines 520-527).

      Check the text for consistency regarding the use of italics for gene names (e.g., legend of Figs. 7 and 8)

      The text was corrected.

      Please introduce abbreviations, e.g., G6P (line 139), REP (line 150), ARN (line 258), NOR/U (Table S1 legend)

      As suggested, we now introduce the abbreviations for G6P (page 7, line 142), REP (page 8, lines 155-156), and NOR (Supplementary file 1 legend). Regarding ARN, these sequences are already written in parentheses in the same sentence. However, we revised this to “ARN motif sequences” (page 13, line 278).

      Fig. S1A: Highlight REP sequence mentioned in text (line 150).

      REP sequences are now highlighted in gray in Figure 1—figure supplement 1A.

      Fig. S1C: It would be helpful to list number nt positions on the sRNAs based on full-length transcripts.

      The corresponding positions based on the full-length transcripts have also been added to this figure.

      Fig. S2: Adjust the position of UhpU-S label.

      UhpU-S label position was adjusted.

      Fig. S6: Include UhpU in the figure title.

      UhpU was added to the title.

      Fig. S10: It would be helpful to indicate on the figure (or state more clearly in the legend) which RNA was extracted from WT or ΔfliCX background.

      The samples shown in the Figure are all in a WT strain. We corrected the figure legend accordingly.

      Line 290: the effect is on flagella number, not motility.

      This typo is now corrected (page 15, line 312).

      Fig. S8: One-way ANOVA (panel A legend)

      This typo is now corrected (page 64, line 1433).

      Line 320: Fig. S9C instead of 9C

      We thank the reviewer for noticing the typo. The numbering of the supplementary figures has now been changed to the eLife format.

      It would be helpful to add reference for statement in line 57.

      A reference to (Fitzgerald et al., 2014) was added as suggested.

      Add PMID:32133913 as reference for post-transcriptional regulation of the flagellar regulon in the introduction (lines 87-91)

      The indicated reference was added as suggested (page 5, lines 87-91).

      Legend Fig. S6: expand view -> expanded view

      This typo is now corrected (page 63, line 1406).

      line 513: sRNA -> sRNAs

      This typo is now corrected (page 25, line 549).

      Fig. 8G: Maybe include lrhA as target of UhpU sRNA at top of the cascade.

      As suggested, lrhA has been added as a target of UhpU at the top of the cascade.

  2. Sep 2023
    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      • The improvement of the gene annotations of the ferret genome was an important part of this study, and so I would recommend that the authors have a results section and figure dedicated to documenting this.

      Thank you so much for appreciating our efforts on improving gene models, which was indeed a critical part in this study. According to the reviewer’s suggestion, we added a new section to the main text, “Improvement of the gene model for scRNA-seq of ferrets” with a figure (Fig.1 C, D, E).

      • Are the references to figure S8A, B alright (line 306)? In fact, that entire figure was not well described or out of place. In general, unlike the rest of the manuscript, the section dealing with the human-ferret comparison was a little bit confusing, and the figure legends were not extremely helpful. Could the authors please revisit the main text and figure legends of this section for clarity?

      We agree with the reviewer’s recommendation. We removed the references to Figure S8A, B. In their place, we explained our reasoning more carefully: “We chose a recently published human dataset (Bhaduri et al, 2021) for comparison, because this study contains a GW25 dataset, which included more tRG cells than previous studies that did not contain GW25 data. Furthermore, we used only data at GW25”

      We also revised several parts of this section, as well as the legends of Fig. 7 and Fig. S8, adding explanations to make them easier to understand.

      Reviewer #2 (Recommendations For The Authors):

      I have a few very minor comments on the manuscript.

      • I would caution the authors against claiming that they have demonstrated bona fide generation of ependymal cells from tRG cells. While the expression of FOXJ1 is a very good indication, they have not demonstrated the morphological transformation of a tRG cell into an ependymal cell.

      We agree with the reviewer’s opinion. We never thought that we had proved that tRG differentiate into ependymal cells, but we consider that this is highly likely the case (we use the term “suggest” in the abstract). To prove this genetically, we tried extensively to knock the EGFP gene into the CRYAB gene by the CRISPR/Cas9 method, in order to show the lineage relationship between tRG and ependymal cells. However, we have so far failed to do this despite a year of trials. We also tried simply labeling tRG with EGFP and following them in slice culture.

      However, we could not maintain the slice in culture long enough to observe the transition from the tRG shape to the ependymal shape. It seems to be a slow process. What we could do was to observe the transition from single cilia to multi-cilia, which is part of the morphological transition from epithelial neural stem cells such as radial glia to an ependymal-like sheet form. Proving this transition from tRG to ependymal cells (and also astrocytes) is one of the most important issues, and it needs a new idea, technique or strategy.

      • There are several typos throughout the manuscript that I would recommend fixing for example, page 5 line 123 says "OLIGO2" instead of "OLIG2"

      Thank you so much. We carefully read the manuscript and corrected typos. We hope we have corrected all of them.

      Besides these two points, the manuscript is already prepared to a high standard.

      I really appreciate reviewersʼ efforts to finish reviews in a short time, responding to our request related to the first authorʼs thesis application.

    2. Author Response

      Summary of reviewers recommendations.

      Reviewer 1

      Point# 1. Make a new section in the text with a figure about the improvement of the genomic information (gene modeling) of ferrets.

      Point# 2. Are the references to figure S8A, B alright (line 306)?

      Point# 3. Revise the main text and figure legends of the section dealing with the human-ferret comparison for clarity.

      Reviewer 2

      Point# 4. Weaken (change the text from “conclusive” to “suggestive”) the statement that tRG become ependymal cells, because we have not demonstrated the morphological transformation of a tRG cell into an ependymal cell, which is practically difficult, although we have shown a morphological change in terms of the transition from single cilia to multi-cilia (Fig. S6A).

      Point# 5. Correct the typos throughout the manuscript; for example, page 5 line 123 says "OLIGO2" instead of "OLIG2".

      Provisional revision plan and our responses.

      Point #1: The new section on the improvement of gene models will be made by transferring the relevant part of the Methods to the main text and moving Fig. S2B, C to a new Figure 1 with one schematic panel.

      Point #2: We cited (Bhaduri et al., 2020) as a reference in figure S8A, while “Bhaduri et al., 2021” was cited in the text. We will check which is correct and fix the other. The descriptions of Fig. S8A and S8B are indeed poor, both in the text and in the legends.

      Point #3: We will describe the methods used to compare ferrets and humans more thoroughly, by defining terms such as gene scores and subtype scores in the main text (the explanation of Figure S3C will be improved as well). The legends for Fig. 6 are too simple, so we will expand them. The analyses and figures that we made in response to the reviewer comments from “Review Commons” are generally hard to understand, because the explanations are too short for the complexity of the figures and contents, for example Figure S8A-D. We will give more explanation for each panel in Figure S8A-D, and for E and F.

      Point #4: Our response to this point is as follows. We totally agree that genetic labeling (knocking into the Cryab gene) is needed to prove that “tRG cells differentiate into ependymal cells”. We tried many times but eventually failed. We have partially shown the single-cilia to multi-cilia transition, which is characteristic of the epithelial-ependymal transition. This process appears to take a long time, so morphological tracing by time-lapse imaging in tissue culture is not realistic. Therefore, we weakened the conclusion: it is "highly likely" that tRG cells differentiate into ependymal cells.

      Point #5: All authors will carefully reread the manuscript to find and correct typos.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This is a valuable investigation of the chromatin dynamics throughout the cell cycle by using fluorescence signals and patterns of GFP-PCNA and CY3-dUTP, which labels newly synthesized DNA. The authors report reduced chromatin mobility in S relative to G1 phase. The technology and methods used are solid, but the significance of the work is reduced by the model system employed, the HeLa cell line, which has a greatly abnormal genome.

      We have obtained data from a diploid human cell that validates the reduction of S-phase chromatin mobility.

      Public Review:

      The manuscript presented by Pabba et al. studied chromatin dynamics throughout the cell cycle. The authors used fluorescence signals and patterns of GFP-PCNA (GFP tagged proliferating cell nuclear antigen) and CY3-dUTP (which labels newly synthesized DNA but not the DNA template) to determine cell cycle stages in asynchronized HeLa (Kyoto) cells and track movements of chromatin domains. PCNA binds to replication forks and forms replication foci during the S phase. The major conclusions are: (1) Labeled chromatin domains were more mobile in G1/G2 relative to the S-phase. (2) Restricted chromatin motion occurred at sites in proximity to DNA replication sites. (3) Chromatin motion was restricted by the loading of replisomes, independent of DNA synthesis. This work is based on previous work published in 2015, entitled "4D Visualization of replication foci in mammalian cells corresponding to individual replicons," in which the labeling method was demonstrated to be sound. Although interesting, reduced chromatin mobility in S relative to G1 phase is not new to the field.

      It was first shown in yeast (Heun et al. 2001; DOI:10.1126/science.1065366) that the S-phase mobility is reduced compared to the G1 phase. This was followed by other papers showing the same in yeast [(Gasser 2002; DOI: 10.1126/science.1067703), (Smith et al. 2019; DOI: 10.1091/mbc.E19-08-0469)]. The relation between chromatin motion and cell cycle progression in the mammalian genome is less studied. Over recent years there have been a few studies that addressed chromatin mobility and cell cycle progression but from a different perspective. In the publication Nozaki et al. (2017; DOI:10.1016/j.molcel.2017.06.018) chromatin motion analysis was performed on single histones. The study did not find a significant change of histone/nucleosome mobility measured during cell cycle progression. Using CRISPR/dCas9 to label random DNA loci, Ma et al. (2019; DOI:10.1083/jcb.201807162) found that chromatin motion in S-phase was significantly lower than in the G1 phase. However, most of the studies measure the chromatin motion using either insertion of ectopic loci or proteins marking the loci (dCas9) or histones. Using either ectopic loci addition or CRISPR/dCas9 might have an effect on the chromatin mobility itself and measuring single histone motion is not equivalent to measuring the motion of DNA segments. We, therefore, opted to label the DNA directly using the replication of the DNA. In this manner we preserve the native chromatin structure and, thus, motion.

      Importantly, in addition to measuring decreased DNA motion in S-phase, our study indicates that it is not the DNA synthesis per se but the loading of replisomes onto chromatin that slows down its motion. This allowed us to propose a mechanism on how chromatin motion is affected by DNA replication in S-phase.

      The genome in HeLa cells is greatly abnormal with heterogeneous aneuploidy, which makes quantification complicated and weakens the conclusions.

      We agree that the HeLa cells are aneuploid and we have addressed the heterogeneity of HeLa Kyoto within our detection methods (for clarification see point 3). To validate our conclusions in normal diploid human cells, we performed the chromatin mobility analysis using human fibroblasts (IMR90 cells in figures 2, 3 and S2) and plotted the MSD curves for different cell cycle stages. The outcome of this analysis showed that the mobility of chromatin in diploid fibroblasts in S-phase is lower than in G1 and G2. In fact, this effect is stronger in IMR90 cells than in HeLa Kyoto cells. Hence, this is not an aneuploid tumor cell phenomenon.

      The manuscript is difficult to follow in places due to insufficient clarity. The manuscript should be written in a way that can be understood without referencing previous articles. Overall, the work is moderately impactful to the field.

      Major recommendations:

      1) In Figure 1B, the illustration and images for S phase are confusing. The author should specify which is early S and which is late S. Do the yellow circles represent GFP-PCNA foci? How did the authors distinguish mid S from early S and late S (in Figure 2)? Are all images in Figure 1 scaled to the same contrast threshold?

      The yellow circles correspond to the colocalized signal of GFP-PCNA and Cy3-dUTP that overlap and represent the labeled chromatin sites that are replicated in the next cell cycle.

      We clarified all the points mentioned above and updated figure 1 and figure 2 accordingly.

      2) In Figure 2B, the y-axis is marked as "Frequency of cells" but the equation listed below is counting DNA (per focus). How to convert DNA (per focus) to DNA (per cell)? The x-axis is marked as "Genome size" without any unit (e.g., kb? Mb?) The x-axis seems to be the C factor, not the genome size.

      To determine the amount of DNA present in each labeled DNA focus, we first segmented the whole nucleus and measured the total DAPI intensity (DNA amount), which is called IDNA total. The labeled replication foci were then segmented, and the intensity of the label present in each segmented focus was measured (IRFi). Throughout S-phase progression, the amount of DNA increases twofold from early to late S-phase. The cell cycle stage of each cell was determined using the PCNA pattern. By plotting the frequency (number of cells) against the relative genome content normalized to the G1 stage, we calculated the relative genome size, otherwise called the cell cycle correction factor, for each stage from G1 to G2. The ratio of the DNA intensity in a labeled replication focus (IRFi) to the total DNA intensity of DAPI (IDNA total) gives the fraction of DNA present in each focus relative to the whole nucleus. This ratio was then multiplied by the genome size (Kbp) of HeLa Kyoto cells, which was measured and published in Chagin et al. (2016; DOI:10.1038/ncomms11231). This gives the approximate amount of DNA present in each labeled replication focus in Kbp. Since the genome duplicates over the cell cycle stages, the DNA content estimated from IRFi was corrected for the cell cycle stage (determined by PCNA) by multiplying it by the cell cycle correction factor.
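
      The calculation described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' analysis code: the function name and all numerical values are hypothetical placeholders, and in particular the genome size and correction factor below are not the measured HeLa Kyoto values.

```python
def dna_per_focus_kbp(focus_intensities, total_dna_intensity,
                      genome_size_kbp, cell_cycle_factor):
    """Estimate the DNA content (in Kbp) of each labeled focus in one nucleus.

    focus_intensities   : label intensity of each segmented focus (I_RFi)
    total_dna_intensity : total DAPI intensity of the nucleus (I_DNA total)
    genome_size_kbp     : genome size of the cell line in Kbp
    cell_cycle_factor   : relative genome content of the cell's stage,
                          normalized to G1
    """
    if total_dna_intensity <= 0:
        raise ValueError("total_dna_intensity must be positive")
    return [
        (i_rf / total_dna_intensity) * genome_size_kbp * cell_cycle_factor
        for i_rf in focus_intensities
    ]


# Placeholder numbers chosen only to make the arithmetic easy to follow.
foci = [120.0, 60.0, 30.0]   # hypothetical I_RFi values (arbitrary units)
total = 60000.0              # hypothetical I_DNA total (arbitrary units)
genome_kbp = 1.0e7           # placeholder genome size, NOT a measured value
factor = 1.5                 # placeholder correction factor for an S-phase cell

estimates = dna_per_focus_kbp(foci, total, genome_kbp, factor)
print(estimates)
```

      Because the estimate is a product of a ratio and two constants, any error in the assumed genome size or in the stage assignment scales every focus in that cell by the same proportion.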

      3) HeLa cells are known to be highly heterogeneous and heavily aneuploid. Cells in one sample have different numbers of chromosomes ranging from 50 - 80. Therefore, GS (genome size) for each cell should not be the same. Using one constant GS in the equation for every cell introduces errors. Has the cell-to-cell variation been considered and corrected in the data? If not, the authors should provide information regarding cell-to-cell variations, such as the intensity variation of nuclear DAPI signals in synchronized cells.

      It is true that the HeLa genome is aneuploid. However, the heterogeneity of the genome has been documented between different HeLa strains, as studied in Frattini et al. (2015; DOI:10.1038/srep15377), who show the variability of genome and RNA expression profiles and small genomic rearrangements among different HeLa strains. To our knowledge, whether this heterogeneity and aneuploidy also amounts to cell-to-cell variation has not been studied extensively or shown. Therefore, we performed a control experiment to assess the variability between HeLa Kyoto cells: cells were either synchronized or not, stained with DAPI, and the DNA content profiles of all cells were plotted as a histogram (supplementary figure 1B). After synchronization, the cell population in G1 has similar DNA content, showing that cell-to-cell variability is negligible within our detection methods. Nonetheless, we have obtained data using normal diploid human fibroblasts, which validated our outcome.

      STABLE:

      Macville, Merryn, et al. "Comprehensive and definitive molecular cytogenetic characterization of HeLa cells by spectral karyotyping." Cancer research 59.1 (1999): 141-150.

      UNSTABLE:

      Liu, Yansheng, et al. "Multi-omic measurements of heterogeneity in HeLa cells across laboratories." Nature biotechnology 37.3 (2019): 314-322.

      Landry, Jonathan JM, et al. "The genomic and transcriptomic landscape of a HeLa cell line." G3: Genes, Genomes, Genetics 3.8 (2013): 1213-1224.

      4) The chromatin foci are in a variety of sizes and intensities. How were boundaries of foci determined? Weak foci were picked up in one image but not in another. This is a concern because the size of the chromatin domain could influence mobility measurement. The authors should provide control experiments or better explanations for detecting and selecting chromatin foci.

      The method for detecting chromatin foci is described in “Materials and Methods” section “Automated tracking of chromatin structures in time-lapse videos”. “Chromatin structures are detected by the spot-enhancing filter (SEF) (Sage et al., 2005; doi:10.1109/TIP.2005.852787) which consists of a Laplacian-of-Gaussian (LoG) filter followed by thresholding the filtered image and determination of local maxima. The threshold is automatically determined by the mean of the absolute values of the filtered image plus a factor times the standard deviation.” For reasons of consistency, we used the same threshold factor for all images of an image sequence. Therefore, depending on the intensity distribution in an image, it can happen that weak foci are not detected in some images. Alternatively, one could manually adapt the threshold factor for all single images, which, however, would be subjective. We now added the information that we used the same threshold factor for all images of an image sequence.
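
      The detection scheme quoted above (LoG filtering, an automatic threshold, then local maxima) can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the function names, the kernel radius, the border handling, and the choice to take the standard deviation over the filtered image are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def log_kernel(sigma, radius):
    """Negated Laplacian-of-Gaussian kernel (bright spots give positive responses)."""
    kernel = []
    for y in range(-radius, radius + 1):
        row = []
        for x in range(-radius, radius + 1):
            r2 = x * x + y * y
            gauss = math.exp(-r2 / (2.0 * sigma ** 2))
            row.append(-(r2 - 2.0 * sigma ** 2) / sigma ** 4 * gauss)
        kernel.append(row)
    # Subtract the mean so that flat image regions give a zero response.
    offset = mean(v for row in kernel for v in row)
    return [[v - offset for v in row] for row in kernel]


def filter_image(image, kernel):
    """Correlate the image with the kernel, clamping coordinates at the border."""
    h, w = len(image), len(image[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                yy = min(max(y + dy, 0), h - 1)
                for dx in range(-r, r + 1):
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernel[dy + r][dx + r] * image[yy][xx]
            out[y][x] = acc
    return out


def detect_spots(image, sigma=1.0, factor=3.0):
    """Return (row, col) of local maxima of the filtered image above the threshold."""
    radius = max(1, math.ceil(3 * sigma))
    filtered = filter_image(image, log_kernel(sigma, radius))
    values = [v for row in filtered for v in row]
    # Threshold: mean of absolute filtered values plus factor times the
    # standard deviation (assumed here to be that of the filtered image).
    threshold = mean(abs(v) for v in values) + factor * pstdev(values)
    h, w = len(image), len(image[0])
    spots = []
    for y in range(h):
        for x in range(w):
            v = filtered[y][x]
            if v <= threshold:
                continue
            neighbours = [
                filtered[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
                if (j, i) != (y, x)
            ]
            if all(v > n for n in neighbours):
                spots.append((y, x))
    return spots


# Synthetic 15x15 image with one bright and one weak focus (made-up values).
image = [[0.0] * 15 for _ in range(15)]
image[7][7] = 100.0
image[3][11] = 10.0

strict = detect_spots(image, sigma=1.0, factor=3.0)
lenient = detect_spots(image, sigma=1.0, factor=0.5)
print(strict, lenient)
```

      In this toy image the weak focus is only reported with the smaller threshold factor, which mirrors the point made above: with a fixed factor, whether a weak focus is detected depends on the intensity distribution of the image.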

      5) In Figure 3, the authors combined MSD from G1 and G2 in one group. Has any published data suggested that chromatin dynamics are the same in G1 and G2?

      To clarify this we separated G1 and G2 mobility measurements in supplementary figure S2 and updated the figures and text accordingly.

      6) In Figure 3B, cytoplasmic CY3-dUTP foci are found in the G1/G2 and S images. Are these CY3-dUTP aggregates? If so, are they also found in the nucleus? What is the mobility of the cytoplasmic CY3-dUTP foci?

      These are aggregates and not found in the nucleus. These foci were excluded from the analysis by using a nuclear mask based on the PCNA signal. This information was added to the figure 3B legend.

      7) In Figure 4, how is colocalization defined? 1.8 um is approximately the size of a chromosome territory, which is much larger than 0.5 Mb. Two foci that are 1.8 um apart should not be considered in the same chromosome.

      We agree that “colocalized” would indeed mean that the signals are overlapping. Therefore, we updated the figures and text to refer instead to center-to-center distance or proximity analysis.

      Minor comments:

      1) Figure 3D should be presented by a box and whisker plot. The histogram does not show an actual distribution of the data.

      The histogram shown in figure 3D displays the average mean square displacement values for the different cell cycle stages. These are the same data shown in the table. Therefore, the histogram was removed and the table in figure 3C was retained.

      2) Please explain Figure 3C error bars in the figure legend. Are they SD?

      The error bars of the MSD curves (highlighted in bright color around the curves) in figure 3C show the standard error of the mean (SEM) representing the deviations between the MSD curves for an image sequence. We clarified this in the legend of Figure 3C.
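
      For readers unfamiliar with how such error bands are obtained, the sketch below computes one MSD curve per track and then the mean and SEM across curves at each time lag. It is a generic illustration, not the authors' pipeline: the time-averaged MSD definition, the function names, and the two toy tracks are assumptions.

```python
import math
from statistics import mean, stdev


def msd_curve(track, max_lag):
    """Time-averaged MSD of one track [(x, y), ...] for lags 1..max_lag (in frames)."""
    curve = []
    for lag in range(1, max_lag + 1):
        squared = [
            (x2 - x1) ** 2 + (y2 - y1) ** 2
            for (x1, y1), (x2, y2) in zip(track, track[lag:])
        ]
        curve.append(mean(squared))
    return curve


def mean_and_sem(curves):
    """Mean and standard error of the mean across MSD curves, per time lag."""
    summary = []
    for values in zip(*curves):
        sem = stdev(values) / math.sqrt(len(values)) if len(values) > 1 else 0.0
        summary.append((mean(values), sem))
    return summary


# Two made-up tracks: one moves 1 unit per frame, the other 2 units per frame.
slow = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
fast = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0), (6.0, 0.0)]

curves = [msd_curve(t, max_lag=2) for t in (slow, fast)]
summary = mean_and_sem(curves)
print(summary)
```

      The SEM shrinks with the square root of the number of curves, so it describes the precision of the mean curve and not the spread of individual foci.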

      3) In Figure 5C, some western blotting results seem to be assembled from replicate experiments. Comparing signals from one experiment with the same background is suggested.

      We made sure that the western blots from the same replicates are cropped and the information is also added to the respective figure legends.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are very grateful to the reviewers for their thorough assessment of our study, and their acknowledgment of its strengths and weaknesses. We did our best below to address the weaknesses raised in their public review, and to comply with their recommendations.

      Reviewer #1 (Public Review):

      Segas et al. present a novel solution to an upper-limb control problem which is often neglected by academia. The problem the authors are trying to solve is how to control the multiple degrees of freedom of the lower arm to enable grasp in people with transhumeral limb loss. The proposed solution is a neural network based approach which uses information from the position of the arm along with contextual information which defines the position and orientation of the target in space. Experimental work is presented, based on virtual simulations and a telerobotic proof of concept.

      The strength of this paper is that it proposes a method of control for people with transhumeral limb loss which does not rely upon additional surgical intervention to enable grasping objects in the local environment. A challenge the work faces is that it can be argued that a great many problems in upper limb prosthesis control can be solved given precise knowledge of the object to be grasped, its relative position in 3D space and its orientation. It is difficult to know how directly results obtained in a virtual environment will translate to real world impact. Some of the comparisons made in the paper are to physical systems which attempt to solve the same problem. It is important to note that real world prosthesis control introduces numerous challenges which do not exist in virtual spaces or in teleoperation robotics.

      We agree that the precise knowledge of the object to grasp is an issue for real world application, and that real world prosthesis control introduces many challenges not addressed in our experiments. Those were initially discussed in a dedicated section of the discussion (‘Perspectives for daily-life applications’), and we have amended this section to integrate comments by reviewers that relate to those issues (cf below).

      The authors claim that the movement times obtained using their virtual system, and a teleoperation proof of concept demonstration, are comparable to natural movement times. The speed of movements obtained and presented are easier to understand by viewing the supplementary materials prior to reading the paper. The position of the upper arm and a given target are used as input to a classifier, which determines the positions of the lower arm, wrist and the end effector. The state of the virtual shoulder in the pick and place task is quite dynamic and includes humeral rotations which would be challenging to engineer in a real physical prosthesis above the elbow. Another question related to the pick and place task used is whether or not there are cases where both the pick position and the place position can be reached via the same, or very similar, shoulder positions? i.e. with the shoulder flexion-extension and abduction-adduction remaining fixed, can the ANN use the remaining five joint angles to solve the movement problem with little to no participant input, simply based on the new target position? If this was the case, movements times in the virtual space would present a very different distribution to natural movements, while the mean values could be similar. The arguments made in the paper could be supported by including individual participant data showing distributions of movement times and the distances travelled by the end effector where real movements are compared to those made by an ANN.

      In the proposed approach users control where the hand is in space via the shoulder. The position of the upper arm and a given target are used as input to a classifier, which determines the positions of the lower arm, wrist and the effector. The supplementary materials suggest the output of the classifier occurs instantaneously, in that from the start of the trial the user can explore the 3D space associated with the shoulder in order to reach the object. When the object is reached a visual indicator appears. In a virtual space this feedback will allow rapid exploration of different end effector positions which may contribute to the movement times presented. In a real world application, movement of a distal end-effector via the shoulder is unlikely to be as graceful, and a speed-accuracy trade-off would be necessary to ensure objects are grasped, rather than knocked or moved.

      As correctly noted by the reviewer and easily visible on videos, the distal joints predicted by the ANN are realized instantaneously in the virtual arm avatar, and a discontinuity occurs at each target change whereby the distal part of the arm jumps to the novel prediction associated with the new target location. As also correctly noted by the reviewer, there are indeed some instances where minimal shoulder movements are required to reach a new target, which in practice implies that on those instances, the distal part of the arm avatar jumps instantaneously close to the new target as soon as this target appears. Please note that we originally used median rather than mean movement times per participant precisely to remain unaffected by potential outliers that might come from this or other situations. We nevertheless followed the reviewer’s advice and have now also included individual distributions of movement times for each condition and participant (cf Supplementary Fig. 2 to 4 for individual distributions of movement time for Exp1 to 3, respectively). Visual inspection of those indicates that despite slight differences between participants, no specific pattern emerges, with distributions of movement times that are quite similar between conditions when data from all participants are pooled together.

      Movement times analysis indicates therefore that the overall participants’ behavior has not been impacted by the instantaneous jump in the predicted arm positions at each of the target changes. Yet, those jumps indicate that our proposed solution does not satisfactorily reproduce movement trajectory, which has implications for application in the physical world. Although we introduced a 0.75 s period before the beginning of each trial for the robotic arm to smoothly reach the first prediction from the ANN in our POC experiment (cf Methods), this would not be practical for a real-life scenario with a sequence of movements toward different goals. Future developments are therefore needed to better account for movement trajectories. We are now addressing this explicitly in the manuscript, with the following paragraph added in the discussion (section ‘Perspectives of daily-life applications’):

      “Although our approach enabled participants to converge to the correct position and orientation to grasp simple objects with movement times similar to those of natural movements, it is important to note that further developments are needed to produce natural trajectories compatible with real-world applications. As easily visible on supplementary videos 2 to 4, the distal joints predicted by the ANN are realized instantaneously such that a discontinuity occurs at each target change, whereby the distal part of the arm jumps to the novel prediction associated with the new target location. We circumvented problems associated with this discontinuity on our physical proof of concept by introducing a period before the beginning of each trial for the robotic arm to smoothly reach the first prediction from the ANN. This issue, however, needs to be better handled for real-life scenarios where a user will perform sequences of movements toward different objects.”

      Another aspect of the movement times presented which is of note, although it is not necessarily incorrect, is that the virtual prosthesis performance is close to perfect, in that at the start of each trial period, either pick or place, the ANN appears to have already selected the position of the five joints it controls, leaving the user to position the upper arm such that the end effector reaches the target. This type of classification is achievable given a single object type to grasp and a limited number of orientations; however, scaling this approach to work robustly in a real world environment will necessitate solving a number of challenges in machine learning, and in particular computer vision, which are not trivial in nature. On this topic, it is also important to note that, while very elegant, the teleoperation proof of concept of movement based control does not seem to feature a similar range of object distance from the user as the virtual environment. This would have been interesting to see and I look forward to seeing further real world demonstrations in the authors' future work.

      According to this comment, the reviewer has the impression that the ANN had already selected a position of the five joints it controls at the start of each trial, and maintained those fixed while the user operates the upper arm so as to reach the target. Although the jumps at target changes discussed in the previous comment might give this impression, and although this would be the case had we used an ANN trained with contextual information only, it is important to stress that our control does take shoulder angles as inputs, and therefore produces changes in the predicted distal angles as the shoulder moves.

      To substantiate this, we provide in Author response image 1 the range of motion (angular difference at each joint between the beginning and the end of each trial) of the five distal arm angles, grouped for all angles and trials of Exp1 to 3 (one circle and line per participant, representing the median of all data obtained by that participant in the given experiment and condition, as in Fig. 3 of the manuscript). Please note that those ranges of motion were computed on each trial just after the target changes (i.e., after the jumps) for conditions with prosthesis control, and that the percentage noted on the figure below those conditions corresponds to the proportion of the range of motion obtained in the natural movement condition. As can be seen, distal angles were solicited in all prosthesis control conditions by more than half the amount they moved in the condition of natural movements (between 54 and 75% depending on conditions).

      Author response image 1.
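
      The range-of-motion summary described above can be sketched as follows. This is a hypothetical illustration only: the function names, the use of absolute differences without angle wrap-around, and all joint angles are assumptions, and the code does not reproduce how the percentages in the image were aggregated across participants.

```python
from statistics import median


def range_of_motion(start_angles, end_angles):
    """Absolute angular difference per joint between trial start and end (degrees)."""
    return [abs(end - start) for start, end in zip(start_angles, end_angles)]


def participant_median_rom(trials):
    """Median range of motion over all joints and trials of one participant.

    trials is a list of (start_angles, end_angles) pairs, each holding the
    five distal joint angles for one trial in one condition.
    """
    pooled = [rom for start, end in trials for rom in range_of_motion(start, end)]
    return median(pooled)


def percent_of_natural(prosthesis_rom, natural_rom):
    """Range of motion under prosthesis control as a percentage of natural movement."""
    return 100.0 * prosthesis_rom / natural_rom


# Made-up angles for a single trial per condition (five distal joints, degrees).
natural_trials = [([0.0, 0.0, 0.0, 0.0, 0.0], [20.0, 10.0, 30.0, 40.0, 50.0])]
prosthesis_trials = [([0.0, 0.0, 0.0, 0.0, 0.0], [10.0, 5.0, 18.0, 30.0, 40.0])]

natural_rom = participant_median_rom(natural_trials)
prosthesis_rom = participant_median_rom(prosthesis_trials)
share = percent_of_natural(prosthesis_rom, natural_rom)
print(natural_rom, prosthesis_rom, share)
```

      For the prosthesis conditions, the start angles would be taken just after the target change, as stated above, so that the instantaneous jump is not counted as motion.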

      With respect to the last part of this comment, we agree that scaling this approach to work robustly in a real world environment will necessitate solving a number of challenges in machine learning and in particular computer vision. We address those in a specific section of the discussion (‘Perspectives for daily-life application’) which has been further amended in response to the reviewers’ comments. As also mentioned earlier and in our replies to other reviewers’ comments, we also agree that our physical proof of concept is quite preliminary, and we are looking forward to conducting future work in order to solve some of the issues discussed and get closer to real world demonstrations.

      Reviewer #2 (Public Review):

      Segas et al motivate their work by indicating that none of the existing myoelectric solutions for people with transhumeral limb difference offer four active degrees of freedom, namely forearm flexion/extension, forearm supination/pronation, wrist flexion/extension, and wrist radial/ulnar deviation. These degrees of freedom are essential for positioning the prosthesis in the correct plane in space before a grasp can be selected. They offer a controller based on the movement of the stump.

      The proposed solution is elegant for what it is trying to achieve in a laboratory setting. Using a simple neural network to estimate the arm position is an interesting approach, despite the limitations/challenges that the approach suffers from, namely, the availability of prosthetic hardware that offers such functionality, information about the target and the noise in estimation if computer vision methods are used. Segas et al indicate these challenges in the manuscript, although they could also briefly discuss how they foresee the method could be expanded to enable a grasp command beyond the proximity between the end-point and the target. Indeed, it would be interesting to see how these methods can be generalised to more than one grasp.

      Indeed, we have already indicated those challenges in the manuscript, including the limitation that our control “is suitable to place the hand at a correct position and orientation to grasp objects in a wide workspace, but not for fine hand and grasp control ...” (cf 4th paragraph of the ‘Perspectives for daily-life applications’ section of the discussion). We have nevertheless added the following sentence at the end of this paragraph to stress that our control could be combined with recently documented solutions for multiple grasp functions: “Our movement-based approach could also be combined with semi-autonomous grasp control to accommodate for multiple grasp functions39,42,44.”

      One bit of the results that is missing in the paper is the results during the familiarisation block. If the method is "intuitive" I would have thought no familiarisation would be needed. Do participants show any sign of motor adaptation during the familiarisation block?

      Please note that the familiarization block indicated in Fig. 3a contains approximately half of the trials of the subsequent initial acquisition block (about 150 trials, which represents about 3 minutes of practice once the task is understood and proficiently executed), and that those were designed to familiarize participants with the VR setup and the task rather than with the prosthesis controls. Indeed, it is important that participants were made familiar with the setup and the task before they started the initial acquisition used to collect their natural movements. In Exp1 and 2, there was therefore no familiarization to the prosthesis controls whatsoever (and thus no possible adaptation associated with it) before participants used them for the very first time in the blocks dedicated to testing them. This is slightly different in Exp3, where participants with an amputated arm were first tested on their amputated side with our generic control. Although slight adaptation to the prosthesis control might indeed have occurred during those familiarization trials, this would be difficult in practice to separate from the intended familiarization to the task itself, which was deemed necessary for that experiment as well. In the end, we believe that this had little impact on our data since that experiment produced behavioral results comparable to those of Exp1 and 2, where no familiarization to the prosthesis controls could have occurred.

      In Supplementary Videos 3 and 4, how would the authors explain the jerky movement of the virtual arm while the stump is stationary? How would it be possible to distinguish the relative importance of the target information versus body posture in the estimation of the arm position? This does not seem to be easy/clear to address beyond looking at the weights in the neural network.

      As discussed in our response to Reviewer 1 and now explicitly addressed in the manuscript, there is a discontinuity in our control, whereby the distal joints of the arm avatar jump instantaneously to the new prediction at each target change at the beginning of a trial, before being updated online as a function of ongoing shoulder movements for the rest of that trial. In a sense, this discontinuity directly reflects the influence of the target information in the estimation of the distal arm posture. Yet, as also discussed in our reply to Reviewer 1, the influence of proximal body posture (i.e., shoulder movements) is made evident by substantial movements of the predicted distal joints after the initial jumps occurring at each target change. Although those features demonstrate that both target information and proximal body posture were involved in our control, they do not establish their relative importance. While offline computations could be devised to quantify their relative contributions to the estimation of the distal arm posture, we believe that further human-in-the-loop experiments with selective manipulation of these contributions would be necessary to establish how this might affect the system controllability.

      I am intrigued by how the Generic ANN model has been trained, i.e. with the use of the forward kinematics to remap the measurement. I would have thought an easier approach would have been to create an Own model with the native arm of the person with the limb loss, as all your participants are unilateral (as per Table 1). Alternatively, one would have assumed that your common model from all participants would just need to be 'recalibrated' to a few examples of the data from people with limb difference, i.e. few-shot calibration methods.

      Although we could indeed have created an Own model with the native arm of each participant with a limb loss, the intention was to design a control that would involve minimal to no data acquisition at all, and more importantly, that could also accommodate bilateral limb loss. Indeed, few-shot calibration methods would be a good alternative involving minimal data acquisition, but this would not work on participants with bilateral limb loss.

      Reviewer #3 (Public Review):

      This work provides a new approach to simultaneously control elbow and wrist degrees of freedom using movement-based inputs, and demonstrates performance in a virtual reality environment. The work is also demonstrated using a proof-of-concept physical system. This control algorithm is in contrast to prior approaches which use electrophysiological signals, such as EMG, which do have limitations as described by the authors. In this work, the movements of proximal joints (eg shoulder), which generally remain under voluntary control after limb amputation, are used as input to neural networks to predict limb orientation. The results are tested by several participants within a virtual environment, and preliminarily demonstrated using a physical device, albeit without it being physically attached to the user.

      Strengths:

      Overall, the work has several interesting aspects. Perhaps the most interesting aspect of the work is that the approach worked well without requiring user calibration, meaning that users could use pre-trained networks to complete the tasks as requested. This could provide important benefits, and if successfully incorporated into a physical prosthesis allow the user to focus on completing functional tasks immediately. The work was also tested with a reasonable number of subjects, including those with limb-loss. Even with the limitations (see below) the approach could be used to help complete meaningful functional activities of daily living that require semi-consistent movements, such as feeding and grooming.

      Weaknesses:

      While interesting, the work does have several limitations. In this reviewer's opinion, the main limitations are: the number of 'movements' or tasks that would be required to train a controller that generalized across more tasks and limb postures. The authors did a nice job spanning the workspace, but the unconstrained nature of reaches could make restoring additional activities problematic. This remains to be tested.

      We agree and have partly addressed this in the first paragraph of the ‘Perspective for daily life applications’ section of the discussion, where we expand on control options that might complement our approach in order to deal with an object after it has been reached. We have now amended this section to explicitly stress that generalization to multiple tasks including more constrained reaches will require future work: “It remains that generalizing our approach to multiple tasks including more constrained reaches will require future work. For instance, once an intended object has been successfully reached or grasped, what to do with it will still require more than computer vision and gaze information to be efficiently controlled. One approach is to complement the control scheme with subsidiary movements, such as shoulder elevation to bring the hand closer to the body or sternoclavicular protraction to control hand closing26, or even movement of a different limb (e.g., a foot45). Another approach is to control the prosthesis with body movements naturally occurring when compensating for an improperly controlled prosthesis configuration46.”

      The weight of a device attached to a user will impact the shoulder movements that can be reliably generated. Testing with a physical prosthesis will need to ensure that the full desired workspace can be obtained when the limb is attached, and if not, then a procedure to scale inputs will need to be refined.

      We agree and have now explicitly included this limitation and perspective in our discussion, by adding a sentence when discussing possible combination with osseointegration: “Combining those with osseointegration at humeral level3,4 would be particularly relevant as this would also restore amplitude and control over shoulder movements, which are essential for our control but greatly affected with conventional residual limb fitting harness and sockets. Yet, testing with a physical prosthesis will need to ensure that the full desired workspace can be obtained with the weight of the attached device, and if not, a procedure to scale inputs will need to be refined.”

      The reliance on target position is a complicating factor in deploying this technology. It would be interesting to see what performance may be achieved by simply using the target positions as input to the controller and excluding the joint angles from the tracking devices (eg train with the target positions as input to the network to predict the desired angles).

      Indeed, the reliance on precise pose estimation from computer vision is a complicating factor in deploying this technology, despite progress in this area which we now discuss in the first paragraph of the ‘Perspective for daily life applications’ section of the discussion. Although we are unsure what precise configuration of input/output the reviewer has in mind, part of our future work along this line is indeed explicitly dedicated to exploring various sets of input/output that could enable coping with availability and reliability issues associated with real-life settings.

      Treating the humeral rotation degree of freedom is tricky, but for some subjects, such as those with OI, this would not be as large of an issue. Otherwise, a device would need to be constructed that allows this movement.

      We partly address this when referring to osseointegration in the discussion: “Combining those with osseointegration at humeral level3,4 would be particularly relevant as this would also restore amplitude and control over shoulder movements, which are essential for our control but greatly affected with conventional residual limb fitting harness and sockets.” Yet, despite the fact that our approach proved efficient in reconstructing the required humeral angle, it is true that realizing it on a prosthesis without OI is an open issue.

      Overall, this is an interesting preliminary study with some interesting aspects. Care must be taken to systematically evaluate the method to ensure clinical impact.

      Reviewer #1 (Recommendations For The Authors):

      Page 2: Sentence beginning: "Here, we unleash this movement-based approach by ...". The approach presented utilises 3D information of object position. Please could the authors clarify whether or not the computer vision references listed are able to provide precise 3D localisation of objects?

      While the references initially cited in this sentence do support the view that movement goals could be made available in the context of prosthesis control through computer vision combined with gaze information, it is true that they do not provide the precise position and orientation (i.e., 6D pose estimation) necessary for our movement-based control approach. Six-dimensional object pose estimation is nevertheless a very active area of computer vision that has applications beyond prosthesis control, and we have now added to this sentence two references illustrating recent progress in this research area (cf. references 30 and 31).

      Page 6: Sentence beginning: "The volume spread by the shoulder's trajectory ...".

      • Page 7: Sentence beginning: "With respect to the volume spread by the shoulder during the Test phases ...".

      • Page 7: Sentence beginning: "Movement times with our movement-based control were also in the same range as in previous experiments, and were even smaller by the second block of intuitive control ...".

      On the shoulder volume presented in Figure 3d. My interpretation of the increased shoulder volume in Figure 3D Expt 2 shown in the Generic ANN was that slightly more exploration of the upper arm space was necessary (as related to the point in the public review). Is this what the authors mean by the action not being as intuitive? Does the reduction in movement time between TestGeneric1 and TestGeneric 2 not suggest that some degree of exploration and learning of the solution space is taking place?

      Indeed, the slightly increased shoulder volume with the Generic ANN in Exp2 could be interpreted as a sign that slightly more exploration of the upper arm space was necessary. At present, we do not relate this to intuitiveness in the manuscript. And yes, we agree that the reduction in movement time between TestGeneric1 and TestGeneric 2 could suggest some degree of exploration and learning.

      Page 7: Sentence beginning: "As we now dispose of an intuitive control ...". I think dispose may be a false friend in this context!

      This has been replaced by “As we now have an intuitive control…”.

      Page 8: Section beginning "Physical Proof of Concept on a tele-operated robotic platform". I assume this section has been added based on suggestions from a previous review. Although an elegant PoC, the task presented in the diagram appears to differ from the virtual task in that all the targets are at a relatively fixed distance from the robot. With respect to the computer vision ML requirements, this does not appear to require precise information about the distance between the user and an object. Please could this be clarified?

      Indeed, the Physical Proof of Concept has been added after the original submission in order to comply with requests formulated at the editorial stage for the paper to be sent for review. Although preliminary and suffering from several limitations (amongst which a reduced workspace and number of trials as compared to the VR experiments), this POC is a first step toward realizing this control in the physical world. Please note that, as indicated in the methods, the targets varied in depth by about 10 cm, and their position and orientation were set with sensors at the beginning of each block instead of being determined from computer vision (cf section ‘Physical Proof of Concept’ in the ‘Methods’: “The position and orientation of each sponge were set at the beginning of each block using a supplementary sensor. Targets could be vertical or tilted at 45 and -45° on the frontal plane, and varied in depth by about 10 cm.”).

      Page 10: Sentence beginning: "This is ahead of other control solutions that have been proposed ...". I am not sure what this sentence is supposed to convey and no references are provided. While the methods presented appear to be a viable solution for a group of upper-limb amputees who are often ignored by academic research, I am not sure it is appropriate for the authors to compare the results obtained in VR and via teleoperation to existing physical systems (without references it is difficult to understand what comparison is being made here).

      The primary purpose of this sentence is to convey that our approach is ahead of other control solutions proposed so far to solve the particular problem as defined earlier in this paragraph (“Yet, controlling the numerous joints of a prosthetic arm necessary to place the hand at a correct position and orientation to grasp objects remains challenging, and is essentially unresolved”), and as documented to the best we could in the introduction. We believe this to be true and to be the main justification for this publication. The reviewer’s comment is probably directed toward the second part of this sentence, which states that performances of previously proposed control solutions (whether physical or in VR) are rarely compared to that of natural movements, as this comparison would be quite unfavorable to them. We have softened that statement by removing the last reference to unfavorable comparison, but have maintained it as we believe it reflects a reality that is worth mentioning. Please note that after this initial paragraph, and an exposition of the critical features of our control, most of the discussion (about 2/3) is dedicated to limitations and perspectives for daily-life application.

      Page 10: Sentence: "Here, we overcame all those limitations." Again, the language here appears to directly compare success in a virtual environment with the current state of the art of physical systems. Although the limitations were realised in a virtual environment and a teleoperation PoC, a physical implementation of the proposed system would depend on advances in machine vision to include movement goal. It could be argued that limitations have been traded, rather than immediately overcome.

      In this sentence, “all those limitations” refers to all three limitations mentioned in the previous sentences in relation to our previous study which we cited in that sentence (Mick et al., JNER 2021), rather than to limitations of the current state of the art of physical systems. To make this more explicit, we have now changed this sentence to “Here, we overcome those three limitations”.

      Page 11: Sentence beginning: "Yet, impressive progresses in artificial intelligence and computer vision ...".

      • Page 11: Sentence beginning: "Prosthesis control strategies based on computer vision ..."

      The science behind self-driving cars is arguably of comparable computational complexity to real-world object detection with concurrent real-time grasp selection. The market for self-driving cars is huge and a great deal of R&D has been funded, yet they are not yet available. The market for advanced upper-limb prosthetics is very small, and it is difficult to understand who would deliver this work.

      We agree that the market for self-driving cars is much larger than that for advanced upper-limb prosthetics. Yet, as mentioned in our reply to a previous comment, 6D object pose estimation is a very active area of computer vision that has applications far beyond prosthesis control (cf. in robotics and augmented reality). We have added two references reflecting recent progress in this area in the introduction, and have amended the discussion accordingly: “Yet, impressive progress in artificial intelligence and computer vision is such that what would have been difficult to imagine a decade ago appears now well within grasp38. For instance, we showed recently that deep learning combined with gaze information enables identifying an object that is about to be grasped from an egocentric view on glasses33, and this even in complex cluttered natural environments34. Six-dimensional object pose estimation is also a very active area of computer vision30,31, and prosthesis control strategies based on computer vision combined with gaze and/or myoelectric control for movement intention detection are quickly developing39–44, illustrating the promises of this approach.”

      Page 15: Sentence beginning: "From this recording, 7 signals were extracted and fed to the ANN as inputs: ...".

      • Page 15: Sentence beginning: "Accordingly, the contextual information provided as input corresponded to the ...".

      The two sentences appear to contradict one another and it is difficult to understand what the Own ANN was trained on. If the position and the orientation of the object were not used due to overfitting, why claim that they were used as contextual information? Training on the position and orientation of the hand when solving the problem would not normally be considered contextual information, the hand is not part of the environment or setting, it is part of the user. Please could this section be made a little bit clearer?

      The Own ANN was trained using the position and the orientation of a hypothetical target located within the hand at any given time. This approach was implemented to increase the amount of available data. However, when the ANN is used to predict the distal part of the virtual arm, the position and orientation of the current target are provided. We acknowledge that the phrasing could be misleading, so we have added the following clarification to the first sentence: "… (3 Cartesian coordinates and 2 spherical angles that define the position and orientation of the hand as if a hypothetical cylindrical target was placed in it at any time, see an explanation for this choice in the next paragraph)".
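
      As a purely illustrative sketch of the input layout described above (and not the code used in the study), the same 7-value vector can be assembled from the hand pose during training and from the target pose at prediction time. The function names, the azimuth/elevation angle convention, and the assumption that the two remaining inputs are proximal (shoulder) angles are hypothetical choices made for this example only.

```python
import math


def direction_to_spherical(dx, dy, dz):
    """Convert an axis direction into two spherical angles (radians).

    Assumed convention for this sketch: azimuth in the x-y plane,
    elevation measured from that plane toward +z.
    """
    norm = math.sqrt(dx * dx + dy * dy + dz * dz)
    if norm == 0.0:
        raise ValueError("direction vector must be non-zero")
    azimuth = math.atan2(dy, dx)
    elevation = math.asin(dz / norm)
    return azimuth, elevation


def build_ann_input(proximal_angles, position, axis_direction):
    """Assemble 7 inputs: 2 proximal angles (assumed), 3 Cartesian
    coordinates, and 2 spherical angles describing orientation."""
    if len(proximal_angles) != 2 or len(position) != 3:
        raise ValueError("expected 2 proximal angles and a 3D position")
    azimuth, elevation = direction_to_spherical(*axis_direction)
    return [*proximal_angles, *position, azimuth, elevation]


# Training time: contextual inputs come from the hand itself, as if a
# hypothetical cylindrical target were held in it (placeholder values).
train_x = build_ann_input((0.10, 0.20), (0.30, 0.10, 0.40), (0.0, 0.0, 1.0))

# Prediction time: the same slots are filled with the current target pose.
test_x = build_ann_input((0.12, 0.18), (0.35, 0.05, 0.45), (1.0, 0.0, 0.0))
```

      Keeping one function for both cases makes the point of the clarification explicit: only the source of the five contextual values changes between training and use, not the structure of the input.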

      Page 16: Sentence beginning: "A trial refers to only one part of this process: either ...". Would it be possible to present these values separately?

      Although it would be possible to present our results separately for the pick phase and for the place phase, we believe that this would overload the manuscript for little to no gain. Indeed, nothing differentiates those two phases other than the fact that the bottle is on the platform (waiting to be picked) in the pick phase, and in the hand (waiting to be placed) in the place phase. We therefore expect to have very similar results for the pick phase and for the place phase, which we verified as follows on Movement Time: Author response image 2 shows movement time results separated for the pick phase (a) and for the place phase (b), together with the median (red dotted line) obtained when results from both phases are pooled together. As illustrated, results are very similar for both phases, and similar to those currently presented in the manuscript with both phases pooled (Fig. 3c).

      Author response image 2.
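
      For readers who wish to reproduce this kind of check on their own data, the comparison between per-phase and pooled medians can be sketched as follows. This is a minimal illustration, not the analysis code behind the image above; the function name and the movement times are placeholders invented for the example and are not results from the study.

```python
from statistics import median


def median_movement_times(trials):
    """Return the median movement time per phase and for all trials pooled.

    `trials` is a list of (phase, movement_time_in_seconds) pairs.
    """
    by_phase = {}
    for phase, movement_time in trials:
        by_phase.setdefault(phase, []).append(movement_time)
    medians = {phase: median(times) for phase, times in by_phase.items()}
    medians["pooled"] = median([mt for _, mt in trials])
    return medians


# Placeholder values for illustration only (not data from the study).
example_trials = [
    ("pick", 1.0), ("pick", 1.2), ("pick", 1.4),
    ("place", 1.1), ("place", 1.3), ("place", 1.5),
]
example_medians = median_movement_times(example_trials)
```

      If the per-phase medians sit close to the pooled median, as reported above, pooling both phases loses little information.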

      Page 19: Sentence beginning "The remaining targets spanned a roughly ...". Figure 2 is a very nice diagram but it could be enhanced with a simple visual representation of this hemispherical region on the vertical and horizontal planes.

      We made a few attempts at enhancing this figure as suggested. However, the resulting figures tended to be overloaded and were not conclusive, so we opted to keep the original.

      Page 19: Sentence beginning "The Movement Time (MT) ..."

      • Page 19: Sentence beginning "The shoulder position Spread Volume (SV) ..." Would it be possible to include a traditional timing protocol somewhere in the manuscript so that readers can see the periods over which these measures are calculated?

      We have now included Fig. 5 to illustrate the timing protocol and the periods over which MT and SV were computed.

      Reviewer #2 (Recommendations For The Authors):

      Minor comments

      Page 6: "Yet, this control is inapplicable "as is" to amputees, for which recording ..." -> "Yet, this control is inapplicable "as is" to amputees, for WHOM recording ... "

      This has been modified as indicated.

      Throughout: "amputee" -> "people with limb loss" also "individual with limb deficiency" -> "individual with limb difference"

      We have modified throughout as indicated.

      It would have been great to see a few videos from the tele-operation as well. Please could you supply these videos?

      Although we agree that videos of our Physical Proof of Concept would have been useful, we unfortunately did not collect videos that would be suitable for this purpose during those experimental phases. Please note that this Physical Proof of Concept was not meant to be published originally, but has been added after the original submission in order to comply with requests formulated at the editorial stage for the paper to be sent for review.

      Reviewer #3 (Recommendations For The Authors):

      Consider using the terms: intact-limb rather than able-bodied, residual limb rather than stump, congenital limb difference rather than congenital limb deficiency.

      We have modified throughout as indicated.

    1. Author Response

      The following is the authors’ response to the original reviews.

      REVIEWER #1:

      The authors present a carefully controlled set of experiments that demonstrate an additional complexity for GPCR signaling in that endosomal signaling may be different when b-arrestin is or isn't associated with a G protein-bound V2R vasopressin receptor. It uses state-of-the-art biosensor-based approaches and b-arrestin KO lines to assess this. It adds to a growing body of evidence that G proteins and b-arrestin can associate with GPCR complexes simultaneously. They also demonstrate the possibility that Gaq might also be activated by the V2R receptor. My sense is that one thing that may need to be considered is the possibility that such "megacomplexes" might actually involve receptor dimers or oligomers.

      1.1 Can the authors please review the data that describes the concept of "GPCR megacomplexes"? I feel this is missing from the introduction. The notion means different things to different people. As you will see from my other comments, you should especially focus on evidence at the level of the single receptor.

      We appreciate the reviewer’s comments and have now included a more thorough description of the GPCR megacomplex, or ‘megaplex’, concept in the introduction (page 2, 1st paragraph).

      1.2 The authors use mini-G proteins to conclude that V2R receptors interact with Gaq (in addition to Gas). I would prefer if there were a more direct measure of this. Can the authors show that the receptor interacts with full length Gaq (and not the other G proteins in Figure)? Is there a signaling phenotype associated with Gaq coupling? Is it sensitive to Gaq inhibition?

      Excellent point and we are happy to expand further on this. The ability of the V2R to activate Gq/11 has already been demonstrated before (Zhu, X. et al. Mol Pharmacol 46(3):460-9 (1994); Lykke, K. et al. Physiol Rep. 3(8):e12519 (2015); Avet, C. et al. eLife 11: e74101 (2022); Heydenreich, F.M. et al. Mol Pharmacol 102(3):139-49 (2022)). Therefore, we did not attempt to document this activation using more traditional assays. On the other hand, demonstrating an interaction between V2R and the Ga subunit in cells is challenging for several reasons. First, the full-length Ga subunit is already located at the plasma membrane at basal state, and thus generates high background signals in proximity assays. Second, upon receptor activation, the Ga subunit interaction with V2R is so transient that it is difficult, if not impossible, to catch this transient moment in a proximity assay. Although the miniG proteins are highly engineered, coupling specificity of the different subtypes (Gas, Gai/o, Gaq/11, and Ga12/13) to GPCRs is maintained. In addition, as they are homogeneously expressed in the cytosol under basal states rather than at the membrane, they generate low background noise. Upon agonist stimulation, miniG proteins are recruited from the cytosol to the V2R at the plasma membrane, resulting in a robust signal in proximity assays. Thus, miniG proteins are unique in that they can actually detect GPCR–G protein interactions in cellular proximity assays, which is very challenging using full-length Ga subunits.

      That being said, we fully understand the reviewer’s concern and greatly value the effort in enhancing the robustness of our study. Therefore, we have now monitored downstream signaling events of Gaq/11 in the absence or presence of the selective Gaq/11 inhibitor YM-254890 as a secondary method of documenting Gaq/11 activity. Specifically, we used a newly developed biosensor to measure diacylglycerol (DAG) production, a downstream second messenger of Gaq/11 activation, at both the plasma membrane and endosomes. Using a second biosensor, we detected general protein kinase C (PKC) activation, which is another downstream signaling event of Gaq/11 activation. Together, we demonstrated that AVP stimulation leads to DAG production at both the plasma membrane and endosomes (Fig. 1C-D) as well as PKC activation (Fig. 1E), all of which are sensitive to YM-254890 inhibition (Fig. 1C-D and E). Together, these results strongly suggest that the V2R interacts with and activates Gaq/11.

      1.3 I raise a similar concern with Gaq coupling in endosomes.

      For similar reasons that miniG proteins are excellent tools for demonstrating V2R interaction with G proteins at the plasma membrane, miniG proteins can also be used to detect V2R interaction with G proteins at endosomes by measuring proximity between miniG and an endosomal marker in response to agonist challenge. However, to ensure that the endosomal recruitment of miniGsq to the V2R demonstrated in our study corresponds to endosomal Gaq/11 activation, we monitored the production of DAG at the early endosomes in a similar way to which we detected DAG production at the plasma membrane. As shown in Fig. 1D, stimulation of V2R with AVP induces recruitment of the DAG-binding biosensor to the early endosomal marker Rab5. Pre-treatment of the cells with the selective Gaq/11 inhibitor YM-254890 abrogated this response, confirming that V2R activation leads to production of DAG at the early endosomes in a Gaq/11-dependent manner (Fig. 1D).

      1.4 Can the confocal data be shown for Gai and Ga12?

      Yes, we can certainly show these data as a negative control. We have now included the confocal data using Halo-mGsi as a negative control for confocal microscopy (Fig. 2). As seen in this figure, mGsi does not colocalize with Lck (plasma membrane), nor with EEA1 (early endosomes) upon stimulation of cells with AVP, in line with a receptor that does not couple to Gai/o.

      We did not include data using Halo-mG12, as this G protein subtype, similar to Gi/o, does not couple functionally to V2R. Therefore, it is highly unlikely we would obtain different results from the experiments using Halo-mGsi.

      1.5 The authors want us to believe that there is simultaneous binding of G proteins and b-arrestin. This is never demonstrated and is at odds with the structural basis of G protein and b-arrestin binding. Have the authors considered that "simultaneous" occupancy might simply reflect binding at distinct GPCR monomers in the context of dimeric or oligomeric receptors? They could I suppose provide data at the level of a single receptor rather than using the bulk BRET approaches used.

      We appreciate the comment and opportunity to highlight some of our previous work, which addresses the megacomplexes at the level of a single receptor. First, we have characterized the megacomplex biochemically and structurally at a low resolution (Thomsen ARB et al. 2016, Cell 166(4):907-19). The results unequivocally demonstrate that a single GPCR interacts simultaneously with heterotrimeric G protein, at the receptor core, and with b-arrestin via the phosphorylated receptor carboxy-terminal tail. We also documented functionality of the megacomplex as the receptor can interact with and activate the G protein, which was shown by 3 different biochemical approaches (Thomsen ARB et al. 2016, Cell 166(4):907-19). In addition, we solved a high-resolution cryo-EM structure of a megacomplex further highlighting the architecture of this complex (Nguyen AH et al. 2019, Nat Struct Mol Biol 26:1123-31). As both biochemical and structural analyses were done in vitro in which the receptor was embedded in a detergent micelle, we also confirmed that the megacomplex structural architecture fits naturally within the context of a membrane in molecular dynamics simulation experiments (Nguyen AH et al. 2019, Nat Struct Mol Biol 26:1123-31).

      In cells, we and others have also showed that GPCRs such as the V2R can bind b-arrestins exclusively via the phosphorylated carboxy-terminal tail as it does in the megacomplex (Kumari P et al. 2016, Nat Commun 7:13416; Cahill III TJ et al. 2017, PNAS 114(10):2562-67; Kumari P et al. 2017, Mol Biol Cell 28(8):1003-10; Chen K et al. 2023, Nature (online doi: https://doi.org/10.1038/s41586-023-06420-x). In addition, we and others have used BRET and confocal microscopy to show that the V2R and other GPCRs recruit G protein and b-arrestin simultaneously and that the three components colocalize in endosomes upon prolonged agonist exposure (Thomsen ARB et al. 2016, Cell 166(4):907-19; Chen K et al. 2023, Nature (online doi: https://doi.org/10.1038/s41586-023-06420-x). As the reviewer correctly points out, in these cellular experiments (as well as in single molecule microscopy), the working resolution is not high enough to rule out that the receptors that co-recruit G protein and b-arrestin in endosomes could be dimeric instead of monomeric. Thus, we conducted a series of experiments with GPCR–b-arrestin fusions where the two proteins are covalently attached at the receptor carboxy-terminal tail. We showed that despite the GPCR–b-arrestin coupling being fully functional (in respect to b-arrestin promoting a highaffinity state of the receptor for agonist binding and constitutively internalizing the receptor) the receptor could still activate G proteins (Thomsen ARB et al. 2016, Cell 166(4):907-19; Nguyen AH et al. 2019, Nat Struct Mol Biol 26:1123-31), which demonstrates that the single receptor megaplex can physically form in cells.

      We have now included an extra paragraph in the discussion to go over these megaplex-related considerations (5th paragraph in the discussion), and we thank the reviewer for raising this point.

      1.6 Please introduce abbreviations when you first use them; this was not done consistently.

      Thank you for noticing these errors, which we have now corrected.

      REVIEWER #2:

      This manuscript by Daly et al. probes the emerging paradigm of GPCR signaling from endosomes using the V2R as a model system with an emphasis on Gaq/11 and b-arrestins. The study employs cellular imaging, enzyme complementation assays and energy transfer-based sensors to probe the potential formation of GPCR-G-protein-b-arrestin megaplexes. While the study is certainly very interesting, it appears to be very preliminary at many levels, and clearly requires further development in order to make robust conclusions. The authors should consider expanding on this work further to make the points more convincingly and to make the work solid and impactful. The two corresponding authors are among the leaders in the field, having demonstrated the existence of megaplexes, and building on the work in a systematic fashion should certainly move the paradigm forward. As the work presented in the current manuscript is already pre-printed, the authors should take this opportunity to present a more complete and comprehensive story to the field.

      We are grateful for the time and effort the reviewer has put into reviewing our work. We are certainly excited to learn that the reviewer finds our work “very interesting”. Regarding the robustness, we have added extra control experiments to increase the completeness of the study. These experiments include:

      • Measurements of AVP-stimulated diacylglycerol (DAG) production, a signaling event downstream of Gaq/11 activation. These measurements were conducted both at the plasma membrane (Fig. 1C) and at early endosomes (Fig. 1D) using a newly developed DAG-binding biosensor, and demonstrate that the V2R activates Gaq/11 at both of these subcellular locations.

      • Monitoring of AVP-promoted protein kinase C (PKC) activation, another signaling effect downstream of Gaq/11 activation (Fig. 1E). The result of this approach shows in another way that the V2R activates Gaq/11.

      • Inhibition of signaling events downstream of Gaq/11 activation using the selective Gaq/11 inhibitor YM-254890. YM-254890 inhibits AVP-stimulated DAG production at the plasma membrane and endosomes as well as PKC activation (Fig. 1C-E), which strongly confirms that these signaling outputs result from Gaq/11 activation.

      • We have also included the confocal data using Halo-mGsi as a negative control for confocal microscopy (Fig. 2). As seen in this figure, mGsi does not translocate to the plasma membrane or early endosomes upon stimulation with AVP, which validates that V2R activation does not couple to and activate Gai/o.

      Finally, we would like to kindly remind the reviewer that the production of the pre-print manuscript is part of the peer-review process in eLife.

      2.1 The use of miniG proteins in these experiments is a major concern as these are highly engineered and may not represent the true features of G proteins. While these have been used as a readout in other publications, their use in demonstrating megaplex formation is sub-optimal, and native, full-length G proteins should be used.

      We are a bit unsure as to what the reviewer means by using native full-length G proteins. If the reviewer is suggesting to co-immunoprecipitate V2R with native unlabeled G protein and b-arrestin, it should be considered that the G protein interaction with the receptor is extremely transient and unlikely to survive the pull-down procedure unless stabilized by a nanobody or crosslinking. Although the b-arrestin interaction with the receptor is more stable in nature, co-immunoprecipitation with the receptor still requires crosslinking or stabilization with a Fab/nanobody. Therefore, we do not think this approach can be used as a more accurate way of detecting native megaplexes.

      If the reviewer is suggesting the use of full-length G proteins in our cell-based proximity assays instead of miniG proteins, we would like to highlight that this approach is somewhat prone to false-positive responses. The major reason behind this is that G proteins are located at regions in membranes close to the receptor whereas b-arrestins are distributed throughout the cytosol. Upon activation of the V2R, b-arrestins translocate to the receptor at the plasma membrane, which results in enhanced BRET between V2R-coupled G protein subtypes and b-arrestins (see Author response image 1 below of preliminary data). This translocation also results in non-specific BRET signals between b-arrestins and G protein subtypes at the plasma membrane that do not couple to V2R but are located in close proximity to the receptor. As these non-specific BRET signals do not report on the formation of functional V2R megaplexes (see Author response image 1), we have purposely not used this approach.

      Author response image 1.

      To overcome this technical hurdle in detection of functional megaplexes, we have replaced full-length G proteins by miniG proteins as the latter are located in the cytosol at resting states and only translocate to the membrane area if a receptor adopts an active conformation. This replacement is advantageous since activation of megaplex-forming receptors such as the V2R results in simultaneous translocation of miniG proteins and b-arrestins from the cytosol to the receptor at the plasma membrane, which produces a highly specific proximity signal (see Author response image 2 below of preliminary data). When stimulating the V2R, we only observe increases in proximity between b-arrestin1 and miniG proteins that are activated by the V2R (miniGs and miniGsq) but not the miniG proteins that are not activated by this receptor (miniGsi and miniG12) (see Author response image 2). Therefore, usage of miniG proteins offers a more accurate experimental approach to detect functional megaplexes as compared to the usage of full-length G proteins.

      Author response image 2.

      2.2 The interpretation of complementation (NanoLuc) or proximity (BRET) as evidence of signaling is not appropriate, especially when an overexpression system and engineered constructs are being used.

      We thank the reviewer for raising this concern. We have previously demonstrated global Gas activation and Gas signaling in the form of cAMP stimulated by internalized V2R (Thomsen ARB et al. 2016, Cell 166(4):907-19). As mentioned previously, in the current updated manuscript we have now included experiments to document downstream signaling events in response to Gaq/11 activation. These experiments include measurement of DAG production at the plasma membrane (Fig. 1C) and early endosomes (Fig. 1D), as well as phosphorylation/activation of PKC (Fig. 1E). Pre-incubation with the selective Gaq/11 inhibitor YM-254890 abrogated all these downstream signals, which confirms that the V2R stimulates Gaq/11 protein signaling at both the plasma membrane and endosomes (Fig. 1C-E).

      2.3 After the original work from the same corresponding authors on megaplex formation, the major challenge in the field is to demonstrate the existence and relevance of megaplex formation at endogenous levels of components, and the current study focuses solely on showing the proximity of Gaq and b-arrestins.

      We completely agree with the reviewer that it will be important to demonstrate the functionality of endogenous megaplexes, and we are currently working on this in other studies using different receptor systems. However, doing this is not trivial, and we will have to overcome major technical barriers that we feel are somewhat out of the scope of the current study. The goal of our V2R study is to demonstrate that V2R megaplexes form with Gaq/11, resulting in Gaq/11 activation at endosomes, and that endosomal G protein activation by the V2R can occur independently of b-arrestin, which in our humble opinion we accomplish.

      2.4 The study lacks a coherent approach, and the assays are often shifted back and forth between the two b-arrestin isoforms (1 and 2), for example, confocal vs. complementation etc.

      We understand the reviewer’s concern. However, as opposed to the β2-adrenergic receptor that binds β-arrestin2 with higher affinity than β-arrestin1, V2R has a strong affinity for both β-arrestin1 and β-arrestin2 (Oakley et al. 2000, JBC 275(22):17201-10). The V2R’s almost identical affinity for β-arrestin1 and β-arrestin2 is well illustrated in Fig. 3B. Thus, although different β-arrestin isoforms were used in some experiments, it is very unlikely that the overall results and conclusions from this study will change by adding extra experiments to ensure that both β-arrestin isoforms are used in every experiment.

      2.5 In every assay, only the G proteins and b-arrestins are monitored without a direct assessment of the presence of receptor, and absent that data, it is difficult to justify calling these entities megaplexes.

      MiniG proteins and b-arrestin come into close proximity upon agonist stimulation of the V2R. Using confocal microscopy, we observed this co-recruitment of miniGs/miniGsq and b-arrestin specifically at endosomes in response to prolonged V2R stimulation (Fig. 3D-F). In the absence of GPCR stimulation, both miniG and b-arrestin would be homogeneously distributed throughout the cytosol, and thus, the only reason why both proteins have been recruited to endosomes in response to AVP challenge is that they are recruited to internalized and active V2R. This point was obviously not adequately described in the original manuscript, and thus, we have now clarified this further in the updated manuscript at the 8th sentence of the last paragraph of the "The V2R recruits Gas/Gaq and barrs simultaneously" section.

      REVIEWER #3:

      The manuscript by Daly et al. examines endosomal signaling of the vasopressin type 2 receptor using engineered mini G proteins (mG proteins) and a number of novel techniques to address whether sustained G protein signaling in the endosomal compartment is enhanced by b-arrestin. Employing these interesting techniques, they have shown how V2R could activate Gas and Gaq in the endosomal compartments and how this modulation could occur in an arrestin-dependent and -independent manner. Although the phenomenon of endosomal signaling is complex to address, the authors have tried their best to examine it using a number of well-controlled sets of experiments. Though this is an interesting and well carried out study of endosomal signaling of G proteins, my concerns are:

      3.1 The study is done in overexpressed HEK 293 cells with these engineered constructs making me wonder if the kinetics would be the same in primary cells?

      The reviewer raises an interesting and valid point. It is possible that in the context of primary cells the kinetics would differ slightly, and it would definitely be interesting to address this in a subsequent study. However, despite being an interesting aspect of our study, the kinetics themselves are not our major take-home message; rather, it is the subcellular localization of the G protein activation and the role of β-arrestin in these events. We have now highlighted this aspect in our updated manuscript (1st paragraph of the discussion) and we thank the reviewer for addressing this.

      3.2 The use of the phrase "G protein activation independent of b-arrestins to a minor degree" would make me question its physiological relevance. The authors should discuss the relevance of their findings in physiological or pathological context.

      We are glad that the reviewer focuses on this point, and we would like to highlight that other GPCRs, including the glucagon-like peptide-1 receptor (GLP1R), internalize in a β-arrestin-independent manner (Claing A et al. 2000 PNAS 97(3):1119-24) while signaling through Gas from endosomes. In the case of the GLP1R, this endosomal Gas signaling promotes glucose-stimulated insulin secretion in pancreatic β-cells (Kuna RS et al. 2013 Am J Physiol Endocrinol Metab 305:E161-70). Consequently, β-arrestin-independent endosomal G protein signaling appears to have some physiological relevance. Similarly, in a very recent pre-print from the von Zastrow group (Blythe EE and von Zastrow M 2023 BioRxiv https://doi.org/10.1101/2022.09.07.506997), it was reported that endogenously expressed vasoactive intestinal peptide receptor 1 (VIPR1), which regulates gastro-intestinal functions, promotes robust G protein signaling from endosomes in a completely β-arrestin-independent fashion. This again suggests that endogenously expressed GPCRs can internalize and activate G proteins from endosomes independently of β-arrestin to produce physiological responses. We have now discussed these studies in the 6th paragraph of the discussion.

      3.3 The confocal colocalization studies shown in Figure 2 and their conclusion "suggesting a certain level of endosomal Gas/Gaq signaling despite the absence of barr2" seems rather inconclusive.

      As opposed to V2R, a receptor that retains β-arrestin in endosomes upon internalization, V2b2AR quickly releases β-arrestin after internalization due to the low affinity of the carboxy-terminal tail of β2AR for β-arrestin. In the previous Fig. 2 (now Fig. 3), after 45 minutes of AVP stimulation, no β-arrestin is visible at endosomes in cells expressing V2b2AR, as β-arrestin has already dissociated from the receptor and translocated back to the cytosol. However, clear green clusters of mGs and mGsq are still visible at endosomes, indicating the presence of active receptor interacting with Gas or Gaq despite the fact that β-arrestin is back in the cytosol. We quantified the percentage of the green mGs or mGsq clusters that do not colocalize with β-arrestin and have added this information to the updated version of the manuscript (Fig. 3G). In V2R-expressing cells, almost all active receptors that interact with Gas or Gaq/11 also associate with β-arrestin (Fig. 3G). In contrast, in V2b2AR-expressing cells, approximately 75% of the active receptors do not interact with β-arrestin (Fig. 3G). This suggests that β-arrestin binding to V2R is not an absolute requirement for endosomal Gas and Gaq activation by V2R. This point was obviously not addressed adequately in the original manuscript, and thus, we have now elaborated further on this in the updated version in the last paragraph of the "The V2R recruits Gas/Gaq and βarrs simultaneously" section.
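
      For readers who want to see what a per-cluster colocalization count of this kind involves, the following is a minimal sketch and not the analysis pipeline used for Fig. 3G. It assumes the mGs/mGsq clusters have already been segmented into a labeled image and the β-arrestin channel has been thresholded into a binary mask; the function name, the input layout, and the `min_overlap` cutoff are all hypothetical choices made for illustration.

```python
def percent_clusters_without_partner(cluster_labels, partner_mask, min_overlap=0.5):
    """Percentage of labeled clusters that do NOT colocalize with a second channel.

    cluster_labels: 2D list of ints; 0 is background, k > 0 is a cluster id
                    (e.g. segmented mGs or mGsq puncta). Hypothetical input format.
    partner_mask:   2D list of bools; True where the second channel
                    (e.g. thresholded beta-arrestin signal) is present.
    min_overlap:    assumed cutoff; a cluster counts as colocalized when at least
                    this fraction of its pixels is positive in partner_mask.
    """
    total_pixels = {}
    overlap_pixels = {}
    for label_row, mask_row in zip(cluster_labels, partner_mask):
        for label, positive in zip(label_row, mask_row):
            if label == 0:
                continue  # skip background pixels
            total_pixels[label] = total_pixels.get(label, 0) + 1
            if positive:
                overlap_pixels[label] = overlap_pixels.get(label, 0) + 1
    if not total_pixels:
        raise ValueError("no clusters found in cluster_labels")
    # Count clusters whose overlapping fraction falls below the cutoff.
    non_colocalized = sum(
        1
        for label, n in total_pixels.items()
        if overlap_pixels.get(label, 0) / n < min_overlap
    )
    return 100.0 * non_colocalized / len(total_pixels)


# Toy example: four clusters, only cluster 1 overlaps the second channel.
toy_labels = [
    [1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0],
    [3, 3, 0, 4, 4],
]
toy_mask = [
    [True, True, False, False, False],
    [False, False, False, False, False],
    [False, False, False, False, False],
]
print(percent_clusters_without_partner(toy_labels, toy_mask))
```

      In this toy image three of the four clusters have no overlap with the second channel. In real data the outcome would depend on how clusters are segmented and on the overlap cutoff, so both would need to be reported alongside the percentage.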

      3.4 Though a novel observation it is not clear to me how V2R would internalize after activation without arrestin. Is it some sort of generalized microcytosis occurring in these overexpressed cells? Should discuss.

      This is certainly a very interesting observation and something other research laboratories have also seen recently, in particular in the context of endosomal G protein signaling (Blythe EE and von Zastrow M 2023 BioRxiv https://doi.org/10.1101/2022.09.07.506997). The main and best characterized pathway for GPCR internalization is clathrin-dependent, where receptors most commonly are associated with β-arrestins. However, for some GPCRs, the β-arrestin association is not required for clathrin-mediated internalization. One example is the apelin receptor, which can internalize via clathrin-coated pits but in a β-arrestin-independent manner (Pope GR et al. 2016 Mol Cell Endocrinol. 437:108-19). Alternatively, GPCRs can also internalize independently of any clathrin and β-arrestin associations via caveolae or fast endophilin-mediated endocytosis (FEME). We have now expanded our discussion of possible mechanisms for β-arrestin-independent receptor internalization in the updated manuscript in the 6th paragraph of the discussion, and we thank the reviewer for the suggestion.

      3.5 Is use of mini G protein a good representation? The authors should justify.

      Excellent point and something we have comprehensively discussed in our responses to reviewers 1 and 2 (points 1.2 and 2.1).

    1. Author Response

      Reviewer #1 (Public Review):

      Like the "preceding" co-submitted paper, this is again a very strong and interesting paper in which the authors address a question that is raised by the finding in their co-submitted paper - how does one factor induce two different fates. The authors provide an extremely satisfying answer - only one subset of the cells neighbors a source of signaling cells that trigger that subset to adopt a specific fate. The signal here is Delta and the read-out is Notch, whose intracellular domain, in conjunction with, presumably, SuH cooperates with Bsh to distinguish L4 from L5 fate (L5 is not neighbored by signal-providing cells). Like the back-to-back paper, the data is rigorous, well-presented and presents important conclusions. There's a wealth of data on the different functions of Notch (with and without Bsh). All very satisfying.

      Thanks!

      I have again one suggestion that the authors may want to consider discussing. I'm wondering whether the open chromatin that the authors convincingly measure is the CAUSE or the CONSEQUENCE of Bsh being able to activate L4 target genes. What I mean by this is that currently the authors seem to be focused on a somewhat sequential model where Notch signaling opens chromatin and this then enables Bsh to activate a specific set of target genes. But isn't it equally possible that the combined activity of Bsh/Notch(intra)/SuH opens chromatin? That's not a semantic/minor difference, it's a fundamentally different mechanism, I would think. This mechanism also solves the conundrum of specificity - how does Notch know which genes to "open" up? It would seem more intuitive to me to think that it's working together with Bsh to open up chromatin, with chromatin accessibility then being a "mere" secondary consequence. If I'm not overlooking something fundamental here, there is actually also a way to distinguish between these models - test chromatin accessibility in a Bsh mutant. If the authors' model is true, chromatin accessibility should be unchanged.

      I again finish by commending the authors for this terrific piece of work.

      Thanks! It is a crucial question whether Notch signaling regulates chromatin landscape independently of a primary HDTF. We will include this discussion in the text and pursue it in our next project. We think Notch signaling may regulate chromatin accessibility independently of a primary HDTF based on our observation: in larval ventral nerve cord, all motor neurons are NotchON neurons while all sensory neurons are NotchOFF neurons; NotchON neurons share similar functional properties, despite expressing distinct HDTFs, possibly due to the common chromatin landscape regulated by Notch signaling.

      Reviewer #2 (Public Review):

      Summary:

      In this work, the authors explore how Notch activity acts together with Bsh homeodomain transcription factors to establish L4 and L5 fates in the lamina of the visual system of Drosophila. They propose a model in which differential Notch activity generates different chromatin landscapes in presumptive L4 and L5, allowing the differential binding of the primary homeodomain TF Bsh (as described in the co-submitted paper), which in turn activates downstream genes specific to either neuronal type. The requirement of Notch for L4 vs. L5 fate is well supported, and complete transformation from one cell type into the other is observed when altering Notch activity. However, the role of Notch in creating differential chromatin landscapes is not directly demonstrated. It is only based on correlation, but it remains a plausible and intriguing hypothesis.

      Thanks for the positive feedback!

      Strengths:

      The authors are successful in characterizing the role of Notch to distinguish between L4 and L5 cell fates. They show that the Notch pathway is active in L4 but not in L5. They identify L1, the neuron adjacent to L4 as expressing the Delta ligand, therefore being the potential source for Notch activation in L4. Moreover, the manuscript shows molecular and morphological/connectivity transformations from one cell type into the other when Notch activity is manipulated.

      Thanks!

      Using DamID, the authors characterize the chromatin landscape of L4 and L5 neurons. They show that Bsh occupies distinct loci in each cell type. This supports their model that Bsh acts as a primary selector gene in L4/L5 that activates different target genes in L4 vs L5 based on the differential availability of open chromatin loci.

      Thanks!

      Overall, the manuscript presents an interesting example of how Notch activity cooperates with TF expression to generate diverging cell fates. Together with the accompanying paper, it helps thoroughly describe how lamina cell types L4 and L5 are specified and provides an interesting hypothesis for the role of Notch and Bsh in increasing neuronal diversity in the lamina during evolution.

      Thanks for the positive feedback on both manuscripts.

      Weaknesses:

      Differential Notch activity in L4 and L5:

      ● The manuscript focuses its attention on describing Notch activity in L4 vs L5 neurons. However, from the data presented, it is very likely that the pool of progenitors (LPCs) is already subdivided into at least two types of progenitors that will give rise to L4 and L5, respectively. Evidence to support this is the activity of E(spl)-mɣ-GFP and the Dl puncta observed in the LPC region. Discussion should naturally follow that Notch-induced differences in L4/L5 might preexist the L1-expressed Dl that affects newborn L4/L5. Therefore, the differences between L4 and L5 fates might be established earlier than discussed in the paper. The authors should acknowledge this possibility and discuss it in their model.

      We agree. Historically, LPCs are thought to be homogeneous; our data suggest otherwise. We now emphasize this in the Discussion as requested. We are also investigating this question using single cell RNAseq on LPCs to look for molecular heterogeneities. Thanks for the great comment!

      ● The authors claim that Notch activation is caused by L1-expressed Delta. However, they use an LPC driver to knock down Dl. Dl-KD should be performed exclusively in L1, and the fate of L4 should be assessed.

      Dl is transiently expressed in newborn L1 neurons. To knock down Dl in L1, we need to express Dl-RNAi before Dl protein is expressed in newborn L1; the only known Gal4 line expressed that early is the LPC-Gal4 that we used. There is no L1-gal4 line expressed early enough to eliminate L1 expression of Dl.

      ● To test whether L4 neurons are derived from NotchON LPCs, I suggest performing MARCM clones in early pupa with an E(spl)-mɣ-GFP reporter.

      We agree! Whether L4 neurons are derived from NotchON LPCs is a great question. However, MARCM clones in early pupa with an E(spl)-mɣ-GFP reporter will not work because the E(spl)-mɣ-GFP reporter is only expressed in LPCs but not in lamina neurons. We now mention this in the Discussion.

      ● The expression of different Notch targets in LPCs and L4 neurons may be further explored. I suggest using different Notch-activity reporters (i.e., E(spl)-GFP reporters) to further characterize these differences. What causes the switch in Notch target expression from LPCs to L4 neurons should be a topic of discussion.

      Thanks! It is a great question why Notch induces E(spl)-mɣ in LPCs but Hey in newborn neurons. However, it is not the question we are tackling in this paper, and it will be a great direction to pursue in the future. We will add this to our Discussion.

      Notch role in establishing L4 vs L5 fates:

      ● The authors describe that 27G05-Gal4 causes a partial Notch Gain of Function caused by its genomic location between Notch target genes. However, this is not further elaborated. The use of this driver is especially problematic when performing Notch KD, as many of the resulting neurons express Ap, and therefore have some features of L4 neurons. Therefore, Pdm3+/Ap+ cells should always be counted as intermediate L4/L5 fate (i.e., Fig3 E-J, Fig3-Sup2), irrespective of what the mechanistic explanation for Ap activation might be. It's not accurate to assume their L5 identity. In Fig4 intermediate-fate cells are correctly counted as such.

      Thanks for the comment! We will annotate Pdm3+/Ap+ cells as intermediate L4/L5 fate in the corresponding figures.

      ● Lines 170-173: The temporal requirement for Notch activity in L5-to-L4 transformation is not clearly delineated. In Fig4-figure supplement 1D-E, it is not stated if the shift to 29°C is performed as in Fig4-figure supplement 1A-C.

      Thank you for catching this. We will correct it in the text.

      ● Additionally, using the same approach, it would be interesting to explore the window of competence for Notch-induced L5-to-L4 transformation: at which point in L5 maturation can fate no longer be changed by Notch GoF?

      Our data show that Bsh with Notch signaling in newborn neurons specifies L4 fate while Bsh without Notch signaling in newborn neurons specifies L5 fate. Therefore, we think the window of fate competence is the newborn neuron stage. We will include the data to support this.

      L4-to-L3 conversion in the absence of Bsh

      ● Although interesting, the L4-to-L3 conversion in the absence of Bsh is never shown to be dependent on Notch activity. Importantly, L3 NotchON status is assumed based on their position next to Dl-expressing L1, but it is not empirically tested. Perhaps screening Notch target reporter expression in the lamina, as suggested above, could inform this issue.

      Our data show that the L4-to-L3 conversion occurs in the absence of Bsh and in the presence of Notch activity, while the L5-to-L1 conversion occurs in the absence of Bsh and in the absence of Notch activity. Therefore, Notch activity is necessary for the L4-to-L3 conversion. Unfortunately, we currently only have Hey as an available Notch target reporter in newborn neurons. To tackle this challenge in the future, we will profile the genome-binding targets of endogenous Notch in newborn neurons. This will identify novel genes as Notch signaling reporters in neurons for the field.

      ● Otherwise, the analysis of Bsh Loss of Function in L4 might be better suited to be included in the accompanying manuscript that specifically deals with the role of Bsh as a selector gene for L4 and L5.

      That is an interesting suggestion, but without knowing that Bsh + Notch = L4 identity the experiment would be hard to interpret. Note that we took advantage of Notch signaling to trace the cell fate in the absence of Bsh and found the L4-to-L3 conversion (see Figure 5G-K).

      Different chromatin landscape in L4 and L5 neurons

      ● A major concern is that, although L4 and L5 neurons are shown to present different chromatin landscapes (as expected for different neuronal types), it is not demonstrated that this is caused by Notch activity. The paper proves unambiguously that Notch activity, in concert with Bsh, causes the fate choice between L4 and L5. However, that this is caused by Notch creating a differential chromatin landscape is based only on correlation (NotchON cells having a different profile than NotchOFF). Although the authors are careful not to claim that differential chromatin opening is caused directly by Notch, this is heavily suggested throughout the text and must be toned down, e.g., Line 294: "With Notch signaling, L4 neurons generate distinct open chromatin landscape" and Line 298: "Our findings propose a model that the unique combination of HDTF and open chromatin landscape (e.g. by Notch signaling)". These claims are not supported well enough, and alternative hypotheses should be provided in the discussion. An alternative hypothesis could be that LPCs are already specified towards L4 and L5 fates. In this context, different early Bsh targets in each cell type could play a pioneer role generating a differential chromatin landscape.

      We agree and appreciate the comment; it is well justified. We have toned down our comments and clearly state that this is a correlation that needs to be tested for a causal relationship. Thank you for requesting it!

      ● The correlation between open chromatin and Bsh loci with Differentially Expressed genes is much higher for L4 than L5. It is not clear why this is the case, and should be discussed further by the authors.

      We agree, and we think that in L5 neurons the secondary HDTF Pdm3, in addition to Bsh, also contributes to L5-specific gene transcription during the synaptogenesis window. We will include this in the text.

    1. Author Response

      Reviewer #1 (Public Review):

      In this very strong and interesting paper the authors present a convincing series of experiments that reveal molecular mechanism of neuronal cell type diversification in the nervous system of Drosophila. The authors show that a homeodomain transcription factor, Bsh, fulfills several critical functions - repressing an alternative fate and inducing downstream homeodomain transcription factors with whom Bsh may collaborate to induce L4 and L5 fates (the author's accompanying paper reveals how Bsh can induce two distinct fates). The authors make elegant use of powerful genetic tools and an arsenal of satisfying cell identity markers.

      Thanks!

      I believe that this is an important study because it provides some fundamental insights into the conservation of neuronal diversification programs. It is very satisfying to see that similar organizational principles apply in different organisms to generate cell type diversity. The authors should also be commended for contextualizing their work very well, giving a broad, scholarly background to the problem of neuronal cell type diversification.

      Thanks!

      My one suggestion for the authors is to perhaps address in the Discussion (or experimentally address if they wish) how they reconcile that Bsh is on the one hand: (a) continuously expressed in L4/L5, (b) binding directly to a cohort of terminal effectors that are also continuously expressed but then, on the other hand, is not required for maintaining L4 fate? A few questions: Is Bsh only NOT required for maintaining Ap expression or is it also NOT required for maintaining other terminal markers of L4? The former could be easily explained - Bsh simply kicks off Ap, Ap then autoregulates, but Bsh and Ap then continuously activate terminal effector genes. The second scenario would require a little more complex mechanism: Bsh binding of targets (with Notch) may open chromatin, but then once that's done, Bsh is no longer needed and Ap alone can continue to express genes. I feel that the authors should be at least discussing this. The postmitotic Bsh removal experiment in which they only checked Ap and derepression of other markers is a little unsatisfying without further discussion (or experiments, such as testing terminal L4 markers). I hasten to add that this comment does not take away from my overall appreciation for the depth and quality of the data and the importance of their conclusions.

      Great suggestions, we will discuss these two hypotheses as requested.

      Bsh initiates Ap expression in L4 neurons, which then maintain Ap expression independently of Bsh expression, likely through Ap autoregulation. During the synaptogenesis window, Ap expression becomes independent of Bsh expression, but Bsh and Ap are both still required to activate the synapse recognition molecule DIP-beta. Additionally, Bsh shows putative binding to other L4 identity genes, e.g., those required for neurotransmitter choice and electrophysiological properties, suggesting Bsh may initiate L4 identity genes as a suite of genes. The mechanism of maintaining identity features (e.g., morphology, synaptic connectivity and functional properties) in the adult remains poorly understood. It is a great question whether the primary HDTF Bsh maintains the expression of L4 identity genes in the adult. To test this, in our next project, we will specifically knock out Bsh in L4 neurons of the adult fly and examine the effect on L4 morphology, connectivity and functional properties.

      Reviewer #2 (Public Review):

      Summary:

      In this paper, the authors explore the role of the Homeodomain Transcription Factor Bsh in the specification of Lamina neuronal types in the optic lobe of Drosophila. Using the framework of terminal selector genes and compelling data, they investigate whether the same factor that establishes early cell identity is responsible for the acquisition of terminal features of the neuron (i.e., cell connectivity and synaptogenesis).

      Thanks for the positive words!

      The authors convincingly describe the sequential expression and activity of Bsh, termed here as 'primary HDTF', and of Ap in L4 or Pdm3 in L5 as 'secondary HDTFs' during the specification of these two neurons. The study demonstrates the requirement of Bsh to activate either Ap or Pdm3, and therefore to generate the L4 and L5 fates. Moreover, the authors show that in the absence of Bsh, L4 and L5 fates are transformed into L1- or L3-like fates.

      Thanks!

      Finally, the authors used DamID and Bsh:DamID to profile the open chromatin signature and the Bsh binding sites in L4 neurons at the synaptogenesis stage. This allows the identification of putative Bsh target genes in L4, many of which were also found to be upregulated in L4 in a previous single-cell transcriptomic analysis. Among these genes, the paper focuses on Dip-β, a known regulator of L4 connectivity. They demonstrate that both Bsh and Ap are required for Dip-β expression, forming a feed-forward loop. Indeed, the loss of Bsh causes abnormal L4 synaptogenesis and therefore defects in several visual behaviors. The authors also propose the intriguing hypothesis that the expression of Bsh expanded the diversity of Lamina neurons from a 3 cell-type state to the current 5 cell-type state in the optic lobe.

      Thanks for the excellent summary of our findings!

      Strengths:

      Overall, this work presents a beautiful practical example of the framework of terminal selectors: Bsh acts hierarchically with Ap or Pdm3 to establish the L4 or L5 cell fates and, at least in L4, participates in the expression of terminal features of the neuron (i.e., synaptogenesis through Dip-β regulation).

      Thanks!

      The hierarchical interactions among Bsh and the activation of Ap and Pdm3 expression in L4 and L5, respectively, are well established experimentally. Using different genetic drivers, the authors show a window of competence during L4 neuron specification during which Bsh activates Ap expression. Later, as the neuron matures, Ap becomes independent of Bsh. This allows the authors to propose a coherent and well-supported model in which Bsh acts as a 'primary' selector that activates the expression of L4-specific (Ap) and L5-specific (Pdm3) 'secondary' selector genes, that together establish neuronal fate.

      Thanks again!

      Importantly, the authors describe a striking cell fate change when Bsh is knocked down from L4/L5 progenitor cells. In such cases, L1 and L3 neurons are generated at the expense of L4 and L5. The paper demonstrates that Bsh in L4/L5 represses Zfh1, which in turn acts as the primary selector for L1/L3 fates. These results point to a model where the acquisition of Bsh during evolution might have provided the grounds for the generation of new cell types, L4 and L5, expanding lamina neuronal diversity for more refined visual behaviors in flies. This is an intriguing and novel hypothesis that should be tested from an evo-devo standpoint, for instance by identifying a species in which L4 and L5 do not exist and/or Bsh is not expressed in L neurons.

      Thanks for the appreciation of our findings!

      To gain insight into how Bsh regulates neuronal fate and terminal features, the authors have profiled the open chromatin landscape and Bsh binding sites in L4 neurons at mid-pupation using the DamID technique. The paper describes a number of genes that have Bsh binding peaks in their regulatory regions and that are differentially expressed in L4 neurons, based on available scRNAseq data. Although the manuscript does not explore this candidate list in depth, many of these genes belong to classes that might explain terminal features of L4 neurons, such as neurotransmitter identity, neuropeptides or cytoskeletal regulators. Interestingly, one of these upregulated genes with a Bsh peak is Dip-β, an immunoglobulin superfamily protein that has been described by previous work from the authors' lab to be relevant to establishing proper L4 connectivity. This work proves that Bsh and Ap work in a feed-forward loop to regulate Dip-β expression, and therefore to establish normal L4 synapses. Furthermore, Bsh loss of function in L4 impairs visual behaviors.

      Thanks for the excellent summary of our findings.

      Weaknesses:

      ● The last paragraph of the introduction is written using rhetorical questions and does not read well. I suggest rewriting it in a more conventional direct style to improve readability.

      We agree, and will update the text as suggested.

      ● A significant concern is the way in which information is conveyed in the Figures. Throughout the paper, understanding of the experimental results is hindered by the lack of information in the Figure headers. Specifically, the genetic driver used for each panel should be adequately noted, together with the age of the brain and the experimental condition. For example, R27G05-Gal4 drives early expression in LPCs and L4/L5, while the 31C06-AD, 34G07-DBD Split-Gal4 combination drives expression in older L4 neurons, and the use of one or the other to drive Bsh-KD has dramatic differences in Ap expression. The indication of the driver used in each panel will facilitate the reader's grasp of the experimental results.

      We agree, and will update the figure annotation.

      ● Bsh role in L4/L5 cell fate:

      o It is not clear whether Tll+/Bsh+ LPCs are the precursors of L4/L5. Morphologically, these cells sit very close to L5, but are much more distant from L4.

      Our current data show that L4 and L5 neurons are generated by different LPCs. However, we currently don’t have tools to demonstrate which subset of LPCs generates which lamina neuron type. We are working on a follow-up manuscript on LPC heterogeneity, but those experiments have only just been started.

      o Somatic CRISPR knockout of Bsh seems to have a weaker phenotype than the knockdown using RNAi. However, in several experiments down the line, the authors use CRISPR-KO rather than RNAi to knock down Bsh activity: it should be explained why the authors made this decision. Alternatively, a null mutant could be used to consolidate the loss of function phenotype, although this is not strictly necessary given that the RNAi is highly efficient and almost completely abolishes Bsh protein.

      The reason we chose CRISPR-KO (L4-specific Gal4, uas-Cas9, and uas-Bsh-sgRNAs) is that it effectively removed Bsh expression from the majority of L4 neurons. In contrast, Bsh-RNAi driven by L4-split Gal4 failed to knock down Bsh in L4 neurons because L4-split Gal4 expression depends on Bsh. We will include this explanation in the text.

      o Line 102: Rephrase "R27G05-Gal4 is expressed in all LPCs and turned off in lamina neurons" to "is turned off as lamina neurons mature", as it is kept on for a significant amount of time after the neurons have already been specified.

      Thanks; we will make that change.

      o Line 121: "(a) that all known lamina neuron markers become independent of Bsh regulation in neurons" is not an accurate statement, as the markers tested were not shown to be dependent on Bsh in the first place.

      Good point. We will rephrase it as “that all known lamina neuron markers are independent of Bsh regulation in neurons”.

      o Lines 129-134: Make explicit that the LPC-Gal4 was used in this experiment. This is especially important here, as these results are opposite to the Bsh Loss of Function in L4 neurons described in the previous section. This will help clarify the window of competence in which Bsh establishes L4/L5 neuronal identities through ap/pdm3 expression.

      Thanks! We will include Gal4 information in the text for every manipulation.

      ● DamID and Bsh binding profile:

      ○ Figure 5 - figure supplement 1C-E: The genotype of the Control in (C) has to be described within the panel. As it is, it can be confused with a wild type brain, when it is in fact a Bsh-KO mutant.

      Great point! Thank you for catching this and we will update it.

      ○ It is not clear how L4-specific Differentially Expressed Genes were found. Are these genes DEG between Lamina neuron types, or are they upregulated genes with respect to all neuronal clusters? If the latter is the case, it could explain the discrepancy between scRNAseq DEGs and Bsh peaks in L4 neurons.

      We did not use “L4-specific Differentially Expressed Genes”. Instead, we used all genes that are significantly transcribed in L4 neurons (lines 209-210).

      ● Dip-β regulation:

      ○ Line 234: It is not clear why CRISPR KO is used in this case, when Bsh-RNAi presents a stronger phenotype.

      As explained above, the reason we chose CRISPR-KO (L4-specific Gal4, uas-Cas9, and uas-Bsh-sgRNAs) is that it effectively removed Bsh expression from the majority of L4 neurons. In contrast, Bsh-RNAi driven by L4-split Gal4 failed to knock down Bsh in L4 neurons because L4-split Gal4 expression depends on Bsh. We’ll include this explanation in the text.

      ○ Figure 6N-R shows results using LPC-Gal4. It is not clear why this driver was used, as it makes a less accurate comparison with the other panels in the figure, which use L4-Split-Gal4. This discrepancy should be acknowledged and explained, or the experiment repeated with L4-Split-Gal4>Ap-RNAi.

      I think you mean 6J-M shows results using LPC-Gal4. We first tried L4-Split-Gal4>Ap-RNAi but it failed to knock down Ap because L4-Split-Gal4 expression depends on Ap. We will add this to the text.

      ○ Line 271: It is also possible that L4 activity is dispensable for motion detection and only L5 is required.

      Thanks! Work from Tuthill et al, 2013 showed that L5 is not required for any motion detection. We will include this citation in the text.

      ● Discussion: It is necessary to de-emphasize the relevance of HDTFs, or at least acknowledge that other, non-homeodomain TFs, can act as selector genes to determine neuronal identity. By restricting the discussion to HDTFs, it is not mentioned that other classes of TFs could follow the same Primary-Secondary selector activation logic.

      That is a great point, thank you! We will include this in the discussion.

    1. Author Response

      The following is the authors’ response to the original reviews.

      This important study shows that two methods of sleep induction in the fly, optogenetic activation of the dorsal fan-shaped body (which is rapidly reversible and maintains a neuronal activity signature similar to wakefulness), and Gaboxadol-induced sleep (which shuts down neuronal activity), produce distinct forms of sleep and have different effects on brain-wide neural activity. The majority of the conclusions of the paper are supported by compelling data, but the evidence supporting the claim that the two interventions trigger distinct transcriptional responses is incomplete.

      Thank you for the helpful and detailed reviews. We feel that these have improved the manuscript considerably, and hopefully the additional figures in this Reply letter will help further convince our readers.

      Public Review

      In this study, Anthoney and coworkers continue an important, unique, and technologically innovative line of inquiry from the van Swinderen lab aimed at furthering our understanding of the different sleep stages that may exist in Drosophila. Here, they compare the physiological and transcriptional hallmarks of sleep that have been induced by two distinct means, a pharmacological block of GABA signaling and optogenetic activation of dorsal fan-shaped-body neurons. They first employ an incredibly impressive fly-on-the-ball 2-photon functional imaging setup to monitor neural activity during these interventions, and then perform bulk RNA sequencing of fly brains at different stages. These transcriptomic analyses lead them to (a) knocking out nicotinic acetylcholine receptor subunits and (b) knocking down AkhR throughout the fly brain, and testing the impact of these genetic interventions on sleep behaviors in flies. Based on this work, the authors present evidence that optogenetically and pharmacologically induced sleep produces highly distinct brain-wide effects on physiology and transcription. The study is of significant interest, is easy to read, and the figures are mostly informative. However, there are features of the experimental design and the interpretation of results that diminish enthusiasm.

      a- Conditions under which sleep is induced for behavioral vs neural and transcriptional studies

      1- There is a major conceptual concern regarding the relationships between the physiological and transcriptomic effects of optogenetic and pharmacological sleep promotion, and the effects that these manipulations have on sleep behavior. The authors show that these two means of sleep-induction produce remarkably distinct physiological and transcriptional responses, however, they also show that they produce highly similar effects on sleep behavior, causing an increase in sleep through increases in the duration of sleep bouts. If dFB neurons were promoting active sleep, the sleep it produces should be more fragmented than the sleep induced by the drug, because the latter is supposed to produce quiet sleep. Yet both manipulations seem to be biasing behavior toward quiet sleep.

      This is a correct observation, which is already evident in our sleep architecture data (Figure 2E-H): chronic optogenetic sleep induction promotes longer sleep bouts that are similar in structure (bout number vs bout duration) to those produced by THIP feeding. Since our plots in Figure 2E-H follow the 5min sleep criterion cutoff, upon the Reviewer’s advice we re-analyzed our optogenetic experiments for short (1-5min) sleep. These are graphed below in Author response image 1. As can be seen, and as suspected by the Reviewer, the optogenetic manipulation does not increase the total amount of short sleep; indeed, it decreases it compared to baseline (these are for the exact same data as in Figure 2). Optogenetic sleep induction does not create a bunch of short sleep bouts.

      Author response image 1.

      Short sleep in optogenetic experiments. A. Average baseline (±SEM) 1-5min sleep across a day and night. B. Average (±SEM) 1-5min sleep in optogenetically activated flies, across a day and night.

      We agree with the reviewer that this observation might seem inconsistent with the idea that optogenetic activation promotes active sleep, and that short sleep is active sleep. However, it does not necessarily follow that optogenetic activation has to produce short sleep. Indeed, we know from our brain imaging data (and the associated behavioral analysis) that active sleep will persist for as long as we induce it with red light. While we have not induced it for longer than 15 minutes (Tainton-Heap et al, Current Biology, 2021; Troup et al, J. of Neuroscience, 2023), this is already clearly longer than a <5min sleep bout. So our interpretation is that the longer sleep bouts induced by optogenetic activation are prolonged active sleep, rather than quiet sleep. In other words, this artificial sleep manipulation induces prolonged active sleep, rather than many short sleep bouts. This is of course different from what happens during spontaneous sleep. We have tried to be clearer about sleep bout durations in the revised manuscript (e.g., the new Figure 3), and we now admit early in the results (lines 376-380) that we don’t know what optogenetic activation looks like in the fly brain beyond 15 minutes.
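
      The distinction drawn above between short (1-5min) sleep and sleep that meets the 5min criterion can be illustrated with a minimal sketch. This is not the analysis code used in the study: it assumes one binary activity value per minute, and the function names, the per-minute binning, and the choice to count a bout of exactly 5 minutes as meeting the criterion are all hypothetical.

```python
from itertools import groupby

def inactivity_bouts(active_per_min):
    """Return the length in minutes of each run of consecutive inactive minutes."""
    return [
        sum(1 for _ in run)
        for is_active, run in groupby(active_per_min)
        if not is_active
    ]

def split_sleep(active_per_min, short_min=1, criterion=5):
    """Separate inactivity bouts into short sleep and criterion-length sleep.

    Assumption: bouts of `criterion` minutes or more count as standard sleep;
    bouts from `short_min` up to (but not including) `criterion` count as short sleep.
    """
    bouts = inactivity_bouts(active_per_min)
    short = [b for b in bouts if short_min <= b < criterion]
    long_ = [b for b in bouts if b >= criterion]
    return {
        "short_sleep_min": sum(short),
        "short_bouts": len(short),
        "long_sleep_min": sum(long_),
        "long_bouts": len(long_),
    }

# Toy trace (1 = active minute, 0 = inactive minute), for illustration only.
toy_trace = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
summary = split_sleep(toy_trace)
```

      With this kind of split, a manipulation that lengthens bouts moves minutes out of the short-sleep total and into the criterion-length total without necessarily changing the overall amount of inactivity.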

      2- The authors show that the pharmacological block of GABA signaling and the optogenetic activation of dorsal fan-shaped-body neurons cause different responses on brain activity. Based on these recordings and the behavioral and brain transcriptomic data they then claim that these responses correspond to different sleep states and are associated with the expression and repression of a different constellation of genes. Nevertheless, neural activity in animals was recorded following short stimulations whereas behavioral and transcriptomic data were obtained following chronic stimulation. In this regard, it would be interesting to determine how the 12-hour pharmacological intervention they employed for their transcriptomic analysis changes neural activity throughout the brain - 12 hours will likely be too long for the open-cuticle preps, but an in-between time-point (e.g. 1h) would probably be equally informative.

      The longest we’ve imaged brain activity for optogenetic sleep induction is 15 minutes, as discussed above. We see no changes in activity across this time, which would normally have led to a quiet sleep stage in spontaneous sleep recordings. Whole-brain imaging after 10 hours of optogenetic sleep induction (our RNA collection timepoint) is not realistic, and even 1 hour is difficult. We have however conducted overnight electrophysiological recordings (with multichannel silicon probes), where we activated the same R23E10 neurons for successive 20-minute bouts (alternating with 20min of no red light). We are preparing this work for publication (Van De Poll, et al). We see no evidence of optogenetic activation of this circuit ever producing anything resembling quiet sleep. Since we are not in a position to provide this new electrophysiological data in the current study, we are careful to clarify that we have not investigated what brain imaging looks like after chronic optogenetic activation (lines 376-380). We are showing through diverse lines of evidence that what is called sleep can look different in flies.

      b- Efficiency of THIP treatment under different conditions

      1- There are no data to quantify how THIP alters food consumption. It is evident that flies consume it otherwise they would not show increased sleep. However, they may consume different amounts of food overall than the minus THIP controls. This might have an influence on the animal's metabolism, which could at least explain the fact that metabolism-related genes are regulated (Figure 5). Therefore, in the current state, it is not possible to be certain that gene regulation events measured in this experiment are solely due to THIP effects on sleep.

      We have two arguments against this reasonable criticism. First, as discussed above, the optogenetic flies are sleeping at least as much as the THIP-fed flies, so in principle they also might be feeding less. But we see no metabolic gene downregulation in the optogenetic dataset. We include this counterargument in the discussion (lines 752-756). Second, together with our co-author Paul Shaw, we have shown by tracking dye consumption that THIP-fed flies are not eating less than controls (Dissel et al, Current Biology, 2015). We show those results again below in Author response image 2 to support our reasoning that feeding is not an issue.

      Author response image 2.

      Flies were fed blue dye in their food while being sleep deprived (SD), or while being induced to sleep with 0.1mg/ml THIP in their food, or both. Dye consumption was measured in triplicate for pooled groups of 16 flies. Average absorbance at 625nm (±standard deviation) is shown. Experiments were not significantly different (ANOVA of means).

      2- A similar problem exists in the sleep deprivation experiments. If flies are snapped every 20 seconds, they may not have the freedom to consume appropriate amounts of food, and therefore their consumption of THIP or ATR may be smaller than in non-sleep deprived controls. Thus, it would be crucial to know whether the flies that are sleep-deprived (i.e. shaken every 20 seconds for 12 hours) actually consume comparable amounts of food (and therefore THIP) as those that are undisturbed. If not, then perhaps the transcriptional differences between the two groups are not sleep-specific, but instead reflect varying degrees of exposure to THIP.

      Please see our response to the similar critique above, and how Author response image 2 addresses this concern.

      3- The authors should further discuss the slow action of THIP perfusion vs dFB activation, especially as flies only seem to fall asleep several minutes after THIP is being washed away. Is it a technical artifact? If not, it may not be unreasonable to hypothesize that THIP, at the concentration used, could prevent flies from falling asleep, and that its removal may lower the concentration to a point that allows its sleep-promoting action. The authors could easily test this by extending THIP treatment for another 4-5 minutes.

      The reviewer is partially correct in suggesting a technical artifact: THIP does not get washed away immediately after 5min of perfusion. The drip system we employ means that THIP concentration will slowly increase to the maximum concentration of 0.2mg/ml, and then slowly get diluted away at a rate of 1.25ml/minute (this is all in the Methods). In a previous study (Yap et al, Nature Communications, 2017) we used this exact same perfusion procedure to test a range of THIP concentrations, and settled on 0.2mg/ml as the lowest that reliably induced quiet sleep within 5 minutes. Higher concentrations induced quiet sleep faster, so the alternate explanation proposed by the Reviewer is not supported. We feel that our previous electrophysiological study provided the necessary groundwork for using the same approach and dosage here for our whole-brain imaging readout.

      c- Comments regarding the behavioral assays

      1- L319-322: the authors conclude that dFB stimulation and THIP consumption have similar behavioral effects on sleep. However, this is inaccurate as in Figure S1 they explain that one increases bout number in both day and night and the other one only during the day.

      We have now added a caveat about night bout architecture being different (lines 353-356). Figure S1 is now Figure 3.

      2- The behavioral definitions used for active and quiet sleep do not fit well with strong evidence that deep sleep (defined by lowered metabolic rates) is probably most closely associated with bouts of inactivity that are much longer than the >5min duration used here, i.e., probably 30min and longer (Stahl et al. 2017 Sleep 40: zsx084). Given that the authors are providing evidence that quiet sleep is correlated with changes in the expression of metabolism related genes, they should at least discuss the fact that reductions in metabolism have been shown to occur after relatively long bouts of inactivity and might reconsider their behavioral sleep analysis (i.e., their criteria for sleep state) with this in mind.

      Interestingly, induced sleep bout durations are on average longer for the optogenetic manipulation (40min vs 25min); this was evident in Figure S1C vs S1F (now Figure 3). So as discussed above, this provides a counterargument for sleep bout duration alone being indicative of metabolic processes associated with quiet sleep: the optogenetic dataset did not uncover metabolic-related pathways as relevant to that sleep manipulation. We refer to Stahl et al, Sleep, 2017, in our discussion (lines 748-750), making exactly this point about metabolic rates being decreased in longer sleep bouts, and following up with our observation that optogenetic flies sleep just as much, and their bouts are actually longer. So clearly different processes must be involved.

      d- Comments regarding the recordings of neuronal activity

      1- There is an additional concern regarding the proposed active and quiet sleep states that rest at the heart of this study. Here these two states in the fly are compared to the REM and NREM sleep states observed in mammals and the parallels between active fly sleep and REM and quiet fly sleep and NREM provide the framework for the study. The establishment of such parallel sleep states in the fly is highly significant and identifying the physiological and molecular correlates of distinct sleep stages in the fly is of critical importance to the field. However, the proposal that the dorsal fan shaped body (dFB) neurons promote active sleep runs counter to the prevailing model that these neurons act as a major site of sleep homeostasis. If quiet sleep were akin to NREM, wouldn't we expect the major site of sleep homeostasis in the brain to promote it? Furthermore, the authors state that the effects of dFB neuron excitation on transcription have "almost no overlap" (line 500) with the transcriptomic effects of sleep deprivation (Supplementary Table 3), which is not what would be expected if dFB neurons are tracking sleep pressure and promoting sleep, as suggested by a growing body of convergent work summarized on page four of the manuscript. Wouldn't the 10h excitation of the dFB neurons be predicted to mimic the effects of sleep deprivation if these neurons "...serve as the discharge circuit for the insect's sleep homeostat..." (line 60)? Shouldn't their prolonged excitation produce an artificial increase in sleep drive (even during sleep) that would favor deep, restorative sleep? How do the authors interpret their results with regard to the current prevailing model that dFB neurons act as a major site of sleep homeostasis? This study could be seen as evidence against it, but the authors do not discuss this in their Discussion.

      These are all excellent and thoughtful points, which have made us re-think parts of our discussion. First off, the potential comparison with REM and NREM is entirely speculative, and we have tried to make that more obvious in the introduction and the discussion (e.g., see lines 43, 708, 818). The evidence that the FB neurons (and maybe others) are involved in the homeostatic regulation of sleep is well-supported in the literature, so that part of the discussion holds. However, we concede that the timing of our sleep manipulations could benefit from more explanation. We conducted these during the flies’ subjective day, after the animals had presumably had a good night’s sleep. This means that we induced either kind of sleep for 10 daytime hours, which presumably replaced whatever behavioural states would ‘naturally’ be happening during the day. Female flies sleep less during the day than at night, and we have shown in previous work that daytime sleep quality is different from night-time sleep (van Alphen et al, Journal of Neuroscience, 2013), leading us to suggest that most ‘deep’ or quiet sleep happens at night, for flies. Following this reasoning, daytime optogenetic activation might not be depriving flies of much quiet sleep, or accumulating a deep sleep drive as the Reviewer proposes. Rather, both induced sleep manipulations could be providing 10 hours of either kind of sleep that the flies don’t really ‘need’. Why did we design it this way? Firstly, we were interested in simply asking what these chronic sleep manipulations do to gene expression in rested flies, and how they might be similar or different. We focussed on daytime manipulations to avoid precisely the confound of sleep pressure, and also because we observed red-light artifacts at night for our optogenetic experiments (which we reported). Our sleep deprivation strategy was designed specifically as a control for the THIP (Gaboxadol) experiments, to control for non-sleep related effects of the drug (see below our rationale for why this was less crucial for the optogenetic experiments). In conclusion, we had a logical rationale for how the experiments were done, centred on the straightforward question of whether these two different approaches to sleep induction were having similar effects in well-rested flies. In retrospect, we were not anticipating the Reviewer’s thoughtful logic regarding the dFB’s potential role in also regulating deep sleep homeostasis. We now provide some discussion along these lines to make readers aware of this line of reasoning, as well as our rationale for why prolonged optogenetic sleep induction was not sleep-depriving (lines 768-777).

      2- Regarding the physiological effects of Gaboxadol, to what extent is the quieting induced by this drug reminiscent of physiology of the brains of flies spontaneously meeting the behavioral criterion for quiet sleep? Given the relatively high dose of the drug being delivered to the de-sheathed brain in the imaging experiments (at least when compared to the dose used in the fly food), one worries that the authors may be inducing a highly abnormal brain state that might bear very little resemblance to the deeply sleeping brain under normal conditions. As the authors acknowledge, it is difficult to compare these two situations. Comparing the physiological state of brains put to sleep by Gaboxadol and brains that have spontaneously entered a deep sleep state therefore seems critical.

      As discussed above, our Gaboxadol (THIP) perfusion concentration (0.2mg/ml) was the minimal dosage that effectively induced sleep within 5 minutes, based upon previously published work (Yap et al, Nature Communications, 2017). Lower concentrations were unreliable, with some never inducing sleep at all. Comparisons with feeding THIP are tenuous, and we make that clear in our discussion (lines 731-735). Nevertheless, the Reviewer makes an excellent point about comparisons with spontaneous ‘quiet’ sleep. Here, we feel well supported (please see Author response image 3 below, comparing THIP-induced sleep (this work, B) and spontaneous sleep (A) from our previous study). In our previous study (Tainton-Heap et al, 2021) we showed that neural activity and connectivity decrease during spontaneous quiet sleep. This is what we also see with THIP perfusion. In contrast, in Troup et al, J. of Neuroscience (2023) we confirm that neither neural activity nor connectivity changes during optogenetic R23E10 activation, and general anesthesia – unlike THIP – does NOT produce a quiet brain state. Our finding that THIP effects are nothing like general anesthesia (at the level of brain activity levels) suggests a physiological sleep state closer to spontaneous quiet sleep. We elaborate on this important observation in our results, also pointing to crucial differences with general anesthesia (lines 411-415).

      Author response image 3.

      THIP-induced sleep resembles quiet spontaneous sleep. A. Calcium imaging data from spontaneously sleeping flies, taken from Tainton-Heap et al, 2021. Left, percent neurons active; right, mean degree, a measure of connectivity among active neurons. Both measures decrease during later stages of sleep. B. Calcium imaging data from flies induced to sleep with 5min of 0.2mg/ml THIP perfusion (this study). Left, percent neurons active; right, mean degree. Both measures are significantly decreased, resembling the later stages of spontaneous sleep, which we have termed ‘quiet sleep’. Hence THIP-induced sleep resembles quiet sleep. Note that the genetic background is different in A and B, hence the different baseline activity levels.

      3- There are some issues with Figure 3, in particular 3C-D. It is not clear whether these panels show representative traces or an average, however both the baseline activity and fluorescence are different between C and D, in particular in their amplitude. Therefore, it is difficult to attribute the differences between C and D to the stimulation itself or to the previously different baseline. In addition, the fact that flies with dFB activation seem to keep a basal level of locomotor activity whereas THIP-treated ones don't is quite striking, however it is not being discussed. Finally, the authors claim that the flies eventually wake up from THIP-induced sleep (L360-361), however there are no data to support this statement.

      These are representative traces, which is a way of showing the raw calcium data (Cell ID) so readers can see for themselves that one manipulation silences whereas the other does not – even though flies become inactive for both. The Y-axis scale is standard deviation of the experiment mean. Since THIP decreases neural activity, then the baseline is comparatively higher. Since optogenetic activation does not change average neural activity levels, the baseline is centered on zero. This is an outcome of our analysis method and does not reflect any ‘true’ baseline. We have now clarified this in our figure legend. We now also confess that flies rendered asleep optogenetically can be ‘twitchy’ (line 374). Finally, we show data for 3 flies that were recorded until they woke up. The rest were verified behaviorally, after the experiment. This is now explained in the Methods.

      4- In Figure 4C, it is strange that the SEM is always exactly the same across the whole experiment. Readers should be aware that there might have been an issue when plotting the figure.

      This is not a mistake, the standard errors are just all quite close (between 0.17 and 0.22). This is because of the way we did the analysis, asking how many flies responded to each stimulus event, with incremental levels of responsiveness. This is explained in the Methods. The figure makes the important point of sleep and recovery.

      e- Comments regarding the transcript analyses

      1- General comment: the title of this manuscript is inaccurate - the "transcriptome" commonly refers to the entirety of all transcripts in a cell/tissue/organ/animal (including genes that are not differentially expressed following their interventions), and it is therefore impossible to "engage two non-overlapping transcriptomes" in the same tissue. Perhaps the words "transcriptional programs" or "transcriptional profiles" would be more accurate here?

      We thank the Reviewer for this advice and have changed the title as proposed.

      2- Given the sensitivity of transcriptomic methods, there is a significant concern that the optogenetic experiments are not as well controlled as they could be. Given the need for supplemental all-trans retinal (ATR) for functional light gating of channelrhodopsins in the fly, it is convenient to use flies with Gal4-driven opsin that have not been given supplemental ATR as a negative control, particularly as a control for the effects of light. However, there is another critical control to do here. Flies bearing the UAS-opsin responder element but lacking the GAL4 driver and that have been fed ATR are critical for confirming that the observed effects of optogenetic stimulation are indeed caused by the specific excitation of the targeted neurons and not due to leaky opsin expression, or the effect of ATR feeding under light stimulation or some combination of these factors. Given the sensitivity of transcriptomic methods, it would be good to see that the candidate transcripts identified by comparing ATR+ and ATR- R23E10GAL4/UAS-Chrimson flies are also apparent when comparing R23E10GAL4/UAS-Chrimson (ATR+) with UAS-Chrimson (ATR+) alone.

      We have not done these experiments on UAS-Chrimson/+ controls. Like many others in our field, we viewed non-ATR flies as the best controls, because this involves identical genotypes. Since we were however aware that ATR feeding itself could affect gene expression, we specifically checked for this with our early (1-hour) collection timepoint. We only found 26 gene expression differences between +ATR and -ATR flies at this early timepoint, compared with 277 for the 10-hour timepoint. We detail this rationale in our results, explaining why this is a convincing control for ATR feeding. If there was leaky opsin expression / activity, this would have been evident in our design. Regarding the cumulative effect of light, this would also have been accounted for in our design, as only 1 hour would have elapsed in our first timepoint compared to 10 hours in our second. While the Reviewer is correct in saying that parental controls are called for in many Drosophila experiments, this becomes quickly unmanageable in transcriptomic studies, which is exactly why well-designed +ATR vs -ATR comparisons in the exact same strain are most appropriate. We feel that our 1-hr timepoint mostly addresses this concern.

      3- Figures about qPCR experiments (5G and 6G) are problematic. First, whereas the authors seem satisfied with the 'good correspondence' between their RNA-seq and qPCR results, this is true for only ~9/19 genes in 5G and 2/6 genes in 6G. Whereas discrepancies are not rare between RNA-seq and qPCR, the text in L460-461 and 540-541 is misleading. In addition, it is unclear whether the n=19 in L458 refers to the number of genes tested or the number of replicates. If the qPCR includes replicates, this should be more clearly mentioned, and error bars should be added to the corresponding figures.

      We consider that our qPCR validations were convincing, as they were all mostly changed in the ‘right’ direction. We agree that there are some discrepancies, so have modified our language to reflect this. We have also clarified that 19 refers to the number of genes validated by qPCR in that THIP dataset. All qPCRs involved three technical replicates. We prefer to keep these histograms the way they are to convey these simple trends. For complete transparency, we now provide a supplemental Excel worksheet with all of the qPCR data, alongside corresponding RNAseq data and stats for the selected genes (Supplementary Table 9).

      4- There is a lack of error bars for all their RNAseq and qPCR comparisons, which is particularly surprising because the authors went to great lengths and analyzed an applaudably large amount of independent biological replicates, yet the variability observed in the corresponding molecular data is not reported.

      The genes reported in each of our datasets and associated supplemental figures and tables were all significant, as determined by criteria outlined in the Methods. However, we appreciate that readers might want to get a sense of the values and variances involved, as well as access to the entire gene datasets. We now provide all of these as additional ‘sheets’ in our existing supplemental tables (S2-S7), so this should be very easy to navigate and evaluate. In addition to the previously provided lists for significant genes, in the second Excel sheet (‘All genes’) readers will be able to see the data for all 5 replicates, for the significant genes as well as all other ~15,000 genes (listed in alphabetical order). We feel that this will be a helpful resource, because admittedly significance thresholds can still be a little arbitrary and some readers might want to look up ‘their’ genes of interest.

      Comments to authors

      Other comments

      1- Text in L441 & 606 is misleading. According to ref 52, AkhR is involved specifically in starvation-induced sleep loss, and not in general sleep regulation.

      Corrected.

      2- The language used in L568-570 and 573-574 is confusing. The authors should specify that the knock down of cholinergic subunits, rather than the subunits themselves is what causes sleep to increase or decrease.

      Corrected.

      3- The authors' investigation of cholinergic receptor subunits function is very preliminary, and it is difficult to draw any conclusion from what is presented here. In particular, their behavioral data is difficult to reconcile with the RNA-seq data showing overexpression of both short sleep increasing and short sleep decreasing subunits. Without knowing where in the brain these subunits are required for controlling sleep, the data in Figure 7 is difficult to appreciate.

      We have now conducted additional experiments where we specifically knocked down these alpha receptor subunits (all 7 of them) in the R23E10 neurons. This seemed an obvious knockdown location, to determine if any of these subunits regulated activity in the same sleep promoting neurons that were the focus of this study. We found that alpha1 knockdown in these neurons had similar sleep phenotypes, which we believe is an important result. Since this functional localisation is a logical ending for the paper, we have now made it the final figure.

      Suggestions & comments

      1- It would be interesting if the authors could discuss their findings that metabolism genes are downregulated in THIP flies in the context of recent work that showed upregulation of mitochondrial ROS after sleep deprivation (Kempf et al, 2019).

      We now add the Kempf 2019 reference and allude to how those findings could be consistent with ours.

      2- The fact that THIP-induced sleep persists long after THIP removal (Fig 3D) is very intriguing and interesting. This suggests that the drug might trigger a sleep-inducing pathway that can continue on its own without the drug, once activated.

      This is correct, and in stark contrast to the optogenetic manipulation we employ, which does not appear to show such sleep inertia. We have now added a sentence highlighting this interesting difference (lines 394-396).

      3- The authors identify many new genes regulated in response to specific methods for sleep induction. These are all potentially interesting candidates for further studies investigating the molecular basis of sleep. It would be interesting to know which of these genes are already known to display circadian expression patterns.

      By providing all of the gene lists, these are now available to ask questions such as these. We hesitate however to delve into this domain for this work, as our main goal was to compare these two kinds of sleep in flies.

      4- The brain-wide monitoring of neural activity invites a number of very exciting follow-up experiments - most importantly, it would be fascinating to establish, which neurons are active in the different phases the authors describe! Are these neurons that are involved in transmitting external visual stimuli to the central brain? Do they also project into the central complex? They could make use of the large collection of existing driver lines in the fly and they could also exploit the extraordinary knowledge of the connectome and transcriptome of the fly brain.

      Thank you for sharing our enthusiasm for these likely future directions.

      5- The Dalpha2,3,4,6 and 7 Knock-out strains they generate will be a useful reagent for the Drosophila neuroscience community once the efficiency/success of the knock-out has been confirmed by qPCR.

      These knockout strains have all been confirmed by our co-authors Hang Luong, Trent Perry, and Philip Batterham. These knockout confirmations are outlined in publications that we reference (Perry et al, 2021).

      Materials and methods:

      1- This study has employed custom-built apparatus and custom-written code/scripts, but these do not appear to be available to the reader. For the sake of replicability, the authors should make these available.

      The code/scripts are available via the University of Queensland research data management system as described in the Methods, and can be sent by the Lead Contact. The imaging hardware and analysis code are identical to what was described in a previous publication, and available as directed therein (Tainton-Heap et al, 2021).

      2- Also, the authors should give details on the food used to rear their flies. Fly media comes in several common forms and sleep is sensitive to diet.

      This has now been elaborated in the beginning of the Methods.

      3- The light regime used for optogenetic excitation of dFB neurons consists of 12h of uninterrupted bright red LED light. Most optogenetic stimulations consist of pulsed high frequency flashes interlaced with pauses in illumination. Can dFB neurons be driven constitutively with 12 hours of bright light?

      We showed in Tainton-Heap (2021) that 7Hz pulsed red light had exactly the same effect on R23E10/Chrimson readouts as continuous red light, which is why we opted here to provide continuous red light. That optogenetic sleep induction can be driven continuously for 12 hours is evident by our 24-hour sleep profiles. However, we agree that one could question whether sleep quality is similar after 12 hours. To address this, we did an additional experiment where we stimulated the flies hourly, to determine if their behavioural responsiveness to mechanical stimuli changed over the course of continued sleep induction, for both optogenetic and THIP-induced sleep. We present the data below in Author response image 4. As can be seen in these new analyses, while optogenetic sleep induction persists across 12 daytime hours (speed is close to zero throughout), flies do indeed become more responsive later in the day. This could have two different interpretations: either some sleep functions are being satisfied over time, or the activation regime is becoming less effective over time. Either way, these data show that at our 10-hour daytime timepoint, unstimulated flies are still largely inactive, even though their arousal thresholds might have gradually changed; so the uninterrupted red-light regime is still effective. The comparison with THIP is interesting: here there does not seem to be a change in responsiveness over time; the drug just decreases behavioral responsiveness throughout. Together, these experiments support our view that both approaches are sleep-promoting throughout the 12-hour day, although we appreciate that sleep quality is not identical.

      Author response image 4.

      A) The average speed of baseline (grey) and optogenetically-activated flies (green) across 24 hours. Red dots indicate vibration stimulus times. B) The average speed of control (grey) and THIP-fed flies (blue) across 24 hours. Flies are all R23E10/Chrimson. N= 87 for optogenetic, n=88 for -THIP, n=85 for +THIP.

      4- The authors use the SNAP apparatus to prevent THIP-treated flies from sleeping to tease out possible sleep-independent effects. This is an excellent control. Why have the authors not done the same with the optogenetic treatment? It's surprising not to see this control given the concern the authors express (lines 501 - 502) that the dFB manipulation might be paralyzing awake flies, which certainly seems possible given the light regimes used. Why not test this directly with SNAP?

      We appreciate that this may have been a valuable additional control. However, we designed this control for the THIP experiments specifically because of concerns about THIP’s (yet unknown) mechanism of action in flies. THIP is a GABAergic drug with most likely many off-target effects that have little to do with sleep, hence the need for a control where we compare to flies that ingested THIP but have been prevented from sleeping. In contrast, R23E10-driven sleep induction is exactly that: a circuit that, when activated, induces sleep. Whatever specific neurons might really be involved, the Gal4 circuit is sleep-inducing. This is well supported by multiple publications. The most appropriate control for assessing transcriptomic effects during optogenetic sleep here is not preventing sleep, but rather no increased sleep in flies that have not ingested ATR, and comparing that to effects of ATR alone, which is what we have done. Adding a sleep-deprivation layer onto both of these analyses may have been interesting, but would have required many more analyses and is not strictly required to identify relevant sleep-related genes. We have rephrased the misleading sentence about paralyzing flies, to instead clarify that lack of overlap with the SD dataset suggests that optogenetic activation is not preventing sleep functions from being engaged.

      5- A pairwise comparison of ZT01 and ZT10 does not address circadian expression cycles in a meaningful way. There will be strong effects of the LD cycle here. I suggest toning this down. (Though it is gratifying to see the expected changes in the core clock genes.)

      We have changed the language from ‘circadian’ to ‘light-dark’ to address this, although have kept the word ‘circadian’ when referring specifically to genes such as per, clock, timeless, etc.

      6- Line 109: There is a reference missing.

      We now provide the relevant reference.

      Results

      1- General comment regarding the figures: a general effort could be made to improve the design and quality of the figures and make them more readable. There are a lot of issues such as stretched or misaligned text, badly drawn frames, etc.

      We think we know which figures this might relate to (e.g., Figures 3,4B), so we have adjusted where appropriate.

      2- Instead of 'dFB-induced' (e.g., L77) it would be more accurate to use 'optogenetically-induced'

      Thank you for this helpful advice. We have changed our language throughout to say ‘optogenetically-induced’.

      3- Figure S1 should be integrated in the main figure to make the quantification more easily accessible.

      We have integrated Figure S1 into the main figures. It is now Figure 3.

      5- It would be good to include red light controls in Figure 2C, E, G.

      Making Figure S1 a main figure has better highlighted the fact that we have done red light controls (‘baseline’).

      6- line 313: Fig2E-H - these graphs would benefit if the authors made it more obvious where the maximum sleep amount would fall - i.e. the combination of bouts and minutes that add up to 12 hours (and therefore the entire day/night)

      If a fly were to sleep uninterrupted for all 12 hours of a day or night, that would amount to a sleep bout 720 minutes long. We do not feel that identifying this maximum on these graphs would be helpful. It should be clear from the data that a floor is reached with very few sleep bouts exceeding 60 minutes in our paradigm. To help orient the reader though, we now clarify in the figure legend that the maximum is 720 minutes or 12 hours.

      7- Fig. 2B, D: It was not clear why the authors took the 3-day average here. Doesn't that lead to a whole range of very different behaviors? I could, perhaps naively, imagine that a fly's behavior changes after 2 days of almost-permanent sleep?

      We took the 3-day average because the effect of THIP on each successive day was not significantly different (see Author response image 5, below). Flies wake up enough to have a good feed (see Author response image 2) and then go back to sleep. Since this is however an important point raised by the reviewer, we now mention in the Methods that sleep duration was not different among the 3 averaged days and nights (lines 193-195).

      Author response image 5.

      Data from THIP feeding experiment (Figure 2B) in manuscript, separated into 3 successive days and nights, with THIP-fed flies (blue) compared to controls (white). Averages ± SD are shown; sample sizes are the same as in Figure 2D. No THIP data was significantly different across days and nights (ANOVA of means).

      8- In Figure 2C the authors compare optogenetically induced to "spontaneous sleep," which I think refers to baseline sleep before stimulation, according to the figure. I think the proper comparison would be to the red light control (ATR-); though see the comment above regarding optogenetic controls).

      This information was provided in Figure S1. We now provide it as a main Figure 3, as requested above.

      We also made a point about red light having an effect at night, which is why we focussed on daytime effects for our transcriptomic comparisons. We feel that the ATR-fed flies (minus red light) are an appropriate control here for optogenetically-induced sleep: same exact genotype and ATR feeding, just no optogenetic activation. We therefore would prefer to keep these graphs as they are, especially since we show -ATR data subsequently.

      9- Figures 3A and 4A are redundant; Figure 3B has some active ROIs that are outside of the brain. I am not sure how this is possible?

      We have removed the redundant 4A and replaced it with the THIP molecule to clearly signal what this figure is focussed on. In Figure 3B (now 4B), the brain mask is a visual estimate made from the middle of the image stack. Some neurons in other layers are outside this single-layer estimate. All neurons were accounted for.

      10- Figure 4B is confusing. It took me a while to understand and so it can do with re-drawing in a more accessible way.

      We agree that this was confusing, e.g. there were too many arrows. We have redrawn and simplified (Now 5A).

      11- The authors state that flies wake up from THIP-induced sleep on the ball, but in Figure 4D there appears to be fewer samples for flies who have woken up from THIP (3) compared to those observed before THIP administration. Are flies dying?

      None of the flies died. Most flies were removed from imaging to confirm recovery, while 3 were left in our imaging setup to measure brain activity upon recovery. These results are in Figure 5C and now clarified in the Methods.

      12- Fig5C,D: I'm surprised that by far the most significant changes (in terms of log2-FC and p-val) occur in the sleep-deprived flies? It is not clear to me what the authors mean by effects that "relate waking process"? Perhaps they could elaborate on this?

      We have removed the phrase ‘relates to waking processes’. We now also remark on the high level of fold-change in many of these genes but refrain from discussing this further in the results. It is interesting though.

      13- The sentence in L425-428 is unclear - it would be good to rephrase this.

      We have rephrased this sentence, hopefully it’s clearer now.

      14- Text in L544-545 is confusing. What do you mean by 'less clear'?

      We have replaced ‘less clear’ with ‘not dominated by a single category’.

      15- It is unclear what is the control in Fig 7A. It would be good to mention what strain was used.

      Different knockout strains had different controls. These are identified in the figure legend and Methods.

      16- L579-581: it would be helpful to include this data in a supplementary figure.

      We now provide this as a supplementary figure as requested (Supplementary Figure 6).

      17- There is no information about R57C10 in the methods - it would be good to explain which neurons this line labels, and why you chose it.

      We now clarify in the methods that R57C10-Gal4 is a pan-neural driver, and provide a reference.

      18- Table S5 - If I'm not mistaken then the first line should say 1h, not 10h.

      Corrected

    1. Author Response

      We are grateful for the constructive comments of the reviewers and for the succinct assessment of our work by the editors. Here we provide a brief summary of our response to answer the major criticism of our reviewers. We will give a detailed point-to-point response soon when we upload a revision of our paper.

      1) The MATLAB code for the spatial autocorrelation analysis is now freely available at the following site: https://github.com/dcsabaCD225/Moran_Matlab/blob/main/moran_local.m If any questions arise during its implementation, please contact Csaba Dávid (david.csaba@koki.hu).

      2) Concerning the computer resources and times required to perform Moran’s I image analysis, here we provide a brief description of the hardware and the calculations for images with different sizes.

      Hardware used for performing the analysis:

      Intel(R) Xeon(R) Silver 4112 CPU @ 2.60GHz, 2594 MHz, 4-core CPU, 64GB RAM, NVIDIA GeForce GTX 1080 graphics card.

      MATLAB R2021b software was used for implementation.

      Computation times are shown in Author response table 1.

      Author response table 1.

      3) In response to the comment:

      “While the method's avoidance of AI training appeals to those lacking computational know-how and shows improved accuracy over basic threshold-based techniques, there are valid concerns regarding its performance in comparison to advanced methodologies”.

      Comparison of Moran’s I image analysis with AI-based segmentation raises conceptual problems, which will be addressed in detail in the revised version. Briefly, AI-based analyses assume that the ground truth is known, so that the AI can learn from a large training set to extract the relevant information for image segmentation. In several cases, however (such as protein distribution in the membrane), the ground truth is not known and cannot be easily determined by any single observer. Defining spatial inhomogeneities in protein distribution, and differentiating proteins that are involved in clusters from those that are not, is highly subjective. Indeed, our analysis showed that 23 expert human observers varied hugely in establishing the boundaries of a protein cluster. As a consequence, establishing and using a training set would be highly contentious in these cases. In an average laboratory setting, generating a training set from hundreds of images examined by two dozen people would not be impossible, but it is not really practical. The beauty of Moran’s I analysis is that it is able to extract the relevant signals from images generated in different, often noisy conditions using a simple algorithm that allows quantitative characterization and identification of changes in many biological and non-biological samples.
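
      To illustrate why no training set is needed, the following is a minimal sketch of a local Moran’s I computation on a small intensity grid. It is not the MATLAB implementation linked above: the function name local_morans_i, the binary 4-neighbour weighting, and the toy images are assumptions made only for this example.

```python
def local_morans_i(image):
    """Local Moran's I for every pixel of a 2D list of intensities.

    Assumption for this sketch: binary weights over the 4 directly
    adjacent pixels (rook neighbourhood), with no row standardisation.
    """
    rows, cols = len(image), len(image[0])
    n = rows * cols
    mean = sum(sum(row) for row in image) / n
    # Deviation of each pixel from the image mean.
    z = [[value - mean for value in row] for row in image]
    # Second moment, used to normalise the statistic.
    m2 = sum(value * value for row in z for value in row) / n
    if m2 == 0:
        raise ValueError("image is constant; Moran's I is undefined")
    result = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Spatial lag: sum of the deviations of the in-bounds neighbours.
            lag = 0.0
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                a, b = i + di, j + dj
                if 0 <= a < rows and 0 <= b < cols:
                    lag += z[a][b]
            result[i][j] = z[i][j] * lag / m2
    return result


# Toy inputs: one bright 2x2 cluster, and a checkerboard with no clustering.
clustered = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
checkerboard = [[(i + j) % 2 for j in range(4)] for i in range(4)]

cluster_scores = local_morans_i(clustered)
checker_scores = local_morans_i(checkerboard)
```

      Under these assumptions, pixels inside the bright cluster receive positive values (they resemble their neighbours), whereas every pixel of the checkerboard receives a negative value (each differs from its neighbours), so the statistic separates clustered from dispersed signal using only the image itself.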

    1. Author Response

      Reviewer #2 (Public Review):

      The authors describe the synthesis and testing of the anti-cancer activity of a new molecule CK21 against pancreatic cancer mouse models. This part of the study is very strong showing regression of pancreatic tumors at non-toxic concentrations, which is very hard to achieve for practically incurable pancreatic cancer. Authors synthesized CK21 as an analog of a known inhibitor of RNA synthesis which is very toxic. The authors made very little attempt to understand whether the mechanism of anti-cancer efficacy of CK21 is similar to this known inhibitor of transcription or not. One cannot compare gene expression profiles between untreated and CK21-treated cells, taking into account that CK21 may inhibit the expression of all genes. The effect of CK21 on general transcription needs to be tested first, and then based on this data absolute changes in the expression of genes may be considered for the revealing of the mechanism of activity of CK21.

      We also appreciated the toxicity concerns; thus, we designed the transcriptomic analysis on the human organoid cultured cells for early time points of 3, 6, 9 and 12 h, and with a CK21 concentration of 50nM, to ensure that at the time of harvest, the cells were ~100% viable. At these time points, many genes were upregulated but defined by IPA as enriched for cell death (apoptosis and necrosis), senescence and cell cycle arrest (Fig 5). This led us to hypothesize that the direct effect of CK21 on the tumor cells is the induction of apoptosis, but via multiple pathways.

      Reviewer #3 (Public Review):

      This manuscript describes CK21, a modified version of Triptolide, a natural compound with anticancer activities, to improve its bioavailability. The authors tested the compound in two human pancreatic cancer cell lines, in vitro and in vivo. The authors also use two human organoid lines derived from pancreatic cancer, and mouse KC and KPC cell lines. In all models, CK21 treatment induces dose-dependent cytotoxicity. In vivo, CK21 causes tumor regression. The authors perform gene expression analysis and show that treated organoids have generally lower transcription, consistent with cytotoxicity, and a reduction in NFkB pathway activation.

      Key experiments that would strengthen the current manuscript are: the inclusion of normal cell lines and organoids which, presumably, would show no cytotoxic effect. If that is the case, the authors would have the opportunity to compare responses and determine whether a tumor-specific mechanism can be defined.

      Our in vivo studies suggest that CK21 is more specific to tumors, as mice treated with CK21 at ≤3 mg/kg were 100% viable and gained weight comparably to the no-treatment group (Fig. 2d). Furthermore, in vitro studies with primary fibroblast cells indicate that comparable significant toxicity to CK21 after 72h culture was observed at 500 nM (Fig. S2). In contrast, CK21 induced significant toxicity in AsPC1 and Panc-1 cells at 50 nM (Fig. 1f).

      The authors observe that few gene changes, aside from an overall lowering in transcription, occur upon treatment with CK21. They suggest that the drug acts through inhibition of the NFkB pathway and an increase in reactive oxygen species (ROS). However, there are no experiments to test whether either or both of these findings explain the cytotoxic effect (rescue experiments would be particularly valuable).

      We performed a rescue study using an ROS inhibitor (acetylcysteine) but observed no significant effect (data not shown). We speculate that ROS and/or NF-κB might function synergistically; additionally, it is possible that other mechanisms might be involved in the anti-tumor effects of CK21.

      In the last figure, the authors test whether CK21 is immunosuppressive by testing immunity against a mis-matched tumor cell line (using KPC tumors, mixed strain, in mixed strain mice). The immunity against HLA mis-matched cells is a very strong immune reaction, and mild immune suppression might be missed, which diminishes the value of these findings.

      KPC-960 tumor cells were derived from KPC (C57BL/6 background); therefore, KPC-960 tumors were HLA matched with host C57BL/6 mice. We were surprised to observe spontaneous rejection of the KPC-960 tumor line, since this contrasts with Torres et al. 2013. We speculate that this could be due to the increased number of passages resulting in antigenic drift, which may result in the accumulation of mutations that induce spontaneous rejection.

      We agree that there might be mild immunosuppression that we did not detect; we have included this caveat in the discussion. KC-6141 tumor cells used as CTL targets were from KC mice (mixed background – B6.129).

    1. Author Response

      Reviewer #1:

      This is a very timely paper that addresses an important and difficult-to-address question in the decision-making field - the degree to which information leakage can be strategically adapted to optimise decisions in a task-dependent fashion. The authors apply a sophisticated suite of analyses that are appropriate and yield a range of very interesting observations. The paper centres on analyses of one possible model that hinges on certain assumptions about the nature of the decision process for this task which raises questions about whether leak adjustments are the only possible explanation for the current data. I think the conclusions would be greatly strengthened if they were supported by the application and/or simulation of alternative model structures.

      We thank the reviewer for this positive appraisal of our study. We now entirely agree with their central comment about whether leak adjustments are the only (or even the best) explanation for the current data. We hope that the additional modelling sections that we have discussed in response to main comment 1 above have strengthened the paper. We have responded point-by-point to their public review, as this contained their main recommendations for revision.

      The behavioural trends when comparing blocks with frequent versus rare response periods seem difficult to tally with a change in the leak. […] Are there other models that could reproduce such effects? For example, could a model in which the drift rate varies between Rare and Frequent trials do a similar or better job of explaining the data?

      We can see why the reviewer has advocated for a possible change of drift rate (or ‘gain’ applied to sensory evidence) between conditions to explain our behavioural findings. We found, however, that changes in drift rate could elicit qualitatively similar changes in integration kernels to changes in decision threshold:

      Author response image 1.

      Changes in gain applied to incoming sensory evidence (A parameter in model) have similar effects on recovered integration kernels from Ornstein-Uhlenbeck simulation as changes in decision threshold.

      The likely reason for this is that the overall probability of emitting a response at any point in the continuous decision process is determined by the ratio of accumulated evidence to decision threshold. A similar logic applies to effects on reaction times and detection probability (main figure 2): increasing sensory gain/decreasing decision threshold will lead to faster reaction times and increased detection probability during response periods.
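      The gain/threshold equivalence described above can be illustrated with a minimal sketch. It assumes an Euler-discretised leaky accumulator with no internal noise; the function name, the stimulus and all parameter values are hypothetical and are not taken from the model used in the paper. Under these assumptions the accumulated evidence scales linearly with gain, so doubling the gain while doubling the threshold leaves the response time unchanged, and doubling the gain alone is equivalent to halving the threshold.

```python
import numpy as np

def first_crossing(stimulus, gain, leak, threshold, dt=0.01):
    """Euler-discretised leaky accumulator: dx = (-leak * x + gain * s) * dt.

    Returns the index of the first frame at which |x| reaches the threshold,
    or None if it never does. Internal noise is deliberately omitted
    (an assumption of this sketch, not of the authors' model).
    """
    x = 0.0
    for i, s in enumerate(stimulus):
        x += (-leak * x + gain * s) * dt
        if abs(x) >= threshold:
            return i
    return None

rng = np.random.default_rng(0)
# Hypothetical stimulus: a noisy baseline of step changes in coherence, each
# held for 20 frames, followed by a sustained period of coherent motion.
baseline = np.repeat(rng.normal(0.0, 0.5, size=50), 20)
response_period = np.full(300, 0.5)
stimulus = np.concatenate([baseline, response_period])

theta = 0.1
t_base = first_crossing(stimulus, gain=1.0, leak=2.0, threshold=theta)
# Doubling gain and threshold together leaves their ratio unchanged.
t_scaled = first_crossing(stimulus, gain=2.0, leak=2.0, threshold=2 * theta)
# Doubling gain alone matches halving the threshold alone.
t_high_gain = first_crossing(stimulus, gain=2.0, leak=2.0, threshold=theta)
t_low_threshold = first_crossing(stimulus, gain=1.0, leak=2.0, threshold=theta / 2)
```

      With internal noise added to the accumulator the equivalence would no longer be exact, which is one reason the two parameters could in principle be dissociated.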

      Both parameters may even have a similar effect on ‘false alarms’, because (as the reviewer notes below) false alarms in our paradigm are primarily being driven by the occurrence of stimulus changes as well as internal noise. In fact, the false alarm findings mean it is difficult to fully reconcile all of our behavioural findings in terms of changes in a single set of model parameters in the O-U process. It is possible that other changes not considered within our model (such as expectations of hazard rates of inter-response intervals leading to dynamic thresholds etc.) may have had a strong impact upon the resulting false alarm rates. A full exploration of different variations in O-U model (with varying urgency signals, hazard rates, etc.) is beyond the scope of this paper.

      For this reason, we have decided in our new modelling section to focus primarily on a single, well-established model (the O-U process) and explore how changes in leak and threshold affect task performance and the resulting integration kernels. We note that this is in line with the suggestion of reviewer #2, who focussed on similar behavioural findings to reviewer #1 but suggested that we look at decision threshold rather than drift rate as our primary focus.

      This ties in to a related query about the nature of the task employed by the authors. Due to the very significant volatility of the stimulus, it seems likely that the participants are not solely making judgments about the presence/absence of coherent motion but also making judgments about its duration (because strong coherent motion frequently occurs in the inter-target intervals). If that is so, then could the Rare condition equate to less evidence because there is an increased probability that an extended period of coherent motion could be an outlier generated from the noise distribution? Note that a drift rate reduction would also be expected to result in fewer hits and slower reaction times, as observed.

      As mentioned above, the rare and frequent targets are indeed matched in terms of the ease with which they can be distinguished from the intervening noise intervals. To confirm this, we directly calculated the variance (across frames) of the motion coherence presented during baseline periods and response periods (until response) in all four conditions:

      Author response image 2.

      The average empirical standard deviation of the stimulus stream presented during each baseline period (‘baseline’) and response period (‘trial’), separated by each of the four conditions (F = frequent response periods, R = rare, L = long response periods, S = short). Data were averaged across all response/baseline periods within the stimuli presented to each participant (each dot = 1 participant). Note that the standard deviation shown here is the standard deviation of motion coherence across frames of sensory evidence. This is smaller than the standard deviation of the generative distribution of ‘step’-changes in the motion coherence (std = 0.5 for baseline and 0.3 for response periods), because motion coherence remains constant for a period after each ‘step’ occurs.

      Some adjustment of the language used when discussing FAs seems merited. If I have understood correctly, the sensory samples encountered by the participants during the inter-response intervals can at times favour a particular alternative just as strongly (or more strongly) than that encountered during the response interval itself. In that sense, the responses are not necessarily real false alarms because the physical evidence itself does not distinguish the target from the non-target. I don't think this invalidates the authors' approach but I think it should be acknowledged and considered in light of the comment above regarding the nature of the decision process employed on this task.

      This is a good point. We hope that the reviewer will allow us to keep the term ‘false alarms’ in the paper, as it does conveniently distinguish responses during baseline periods from those during response periods, but we have sought to clarify the point that the reviewer makes when we first introduce the term.

      “Indeed, participants would occasionally make ‘false alarms’ during baseline periods in which the structure of the preceding noise stream mistakenly convinced them they were in a response period (see Figure 4, below). Indeed, this means that a ‘false alarm’ in our paradigm has a slightly different meaning than in most psychophysics experiments; rather than it referring to participants responding when a stimulus was not present, we use the term to refer to participants responding when there was no shift in the mean signal from baseline.”

      And:

      “The fact that evidence integration kernels naturally arise from false alarms, in the same manner as from correct responses, demonstrates that false alarms were not due to motor noise or other spurious causes. Instead, false alarms were driven by participants treating noise fluctuations during baseline periods as sensory evidence to be integrated across time, and the physical evidence preceding ‘false alarms’ need not even distinguish targets from non-targets.”

      The authors report that preparatory motor activity over central electrodes reached a larger decision threshold for RARE vs. FREQUENT response periods. It is not clear what identifies this signal as reflecting motor preparation. Did the authors consider using other effector-selective EEG signatures of motor preparation such as beta-band activity which has been used elsewhere to make inferences about decision bounds? Assuming that this central ERP signal does reflect the decision bounds, the observation that it has a larger amplitude at the response on Rare trials appears to directly contradict the kernel analyses which suggest no difference in the cumulative evidence required to trigger commitment.

      Thanks for this comment. First, we should note that this finding emerged from an agnostic time-domain analysis of the data time-locked to button presses, in which we observed that the negative-going potential was greater (more negative) in RARE vs. FREQUENT trials. So it is simply the fact that it precedes each button press that leads us to relate it to motor preparation; nonetheless, we note that (Kelly and O’Connell, 2013) found similar negative-going potentials at central sensors without applying CSD transform (as in this study). Like them, we would relate this potential to either the well-established Bereitschaftspotential or the contingent negative variation (CNV).

      We agree that many other studies have focussed on beta-band activity as another measure of motor preparation, and to make inferences about decision bounds. To investigate this, we used a Morlet wavelet transform to examine the time-varying power estimate at a central frequency of 20Hz (wavelet factor 7). We repeated the convolutional GLM analysis on this time-varying power estimate.
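      As an illustration of the kind of time-frequency estimate described here, the following is a minimal sketch of time-varying power at 20 Hz obtained by convolution with a complex Morlet wavelet. It assumes that the "wavelet factor" is the number of cycles, so that the temporal standard deviation of the Gaussian envelope is n_cycles / (2πf); the function name, sampling rate and test signals are hypothetical, and this is not the authors' analysis pipeline.

```python
import numpy as np

def morlet_power(signal, sfreq, freq=20.0, n_cycles=7.0):
    """Time-varying power at `freq` via convolution with a complex Morlet wavelet.

    Assumes n_cycles sets the envelope width as sigma_t = n_cycles / (2*pi*freq).
    """
    sigma_t = n_cycles / (2.0 * np.pi * freq)
    t = np.arange(-5.0 * sigma_t, 5.0 * sigma_t, 1.0 / sfreq)
    envelope = np.exp(-t**2 / (2.0 * sigma_t**2))
    wavelet = np.exp(2j * np.pi * freq * t) * envelope
    wavelet /= envelope.sum()  # unit gain for a complex exponential at `freq`
    analytic = np.convolve(signal, wavelet, mode="same")
    return np.abs(analytic) ** 2

sfreq = 250.0                        # hypothetical sampling rate (Hz)
time = np.arange(0.0, 2.0, 1.0 / sfreq)
beta = np.sin(2 * np.pi * 20.0 * time)   # unit-amplitude 20 Hz oscillation
alpha = np.sin(2 * np.pi * 10.0 * time)  # unit-amplitude 10 Hz oscillation

mid = len(time) // 2                 # sample far from the edges
power_beta = morlet_power(beta, sfreq)[mid]
power_alpha = morlet_power(alpha, sfreq)[mid]
```

      For a unit-amplitude sinusoid at the wavelet's centre frequency this normalisation gives a power close to 0.25, whereas a 10 Hz sinusoid is strongly attenuated. A drop in this power estimate before a button press is what is referred to as beta desynchronisation.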

      We first examined average beta desynchronisation at a central cluster of electrodes (CPz, CP1, CP2, C1, Cz, C2) in the run-up to correct button presses during response periods. We found that a reliable beta desynchronisation occurred, and, just as in the time-domain signal, this reached a greater threshold in the RARE trials than in the FREQUENT trials:

      Author response image 3.

      Beta desynchronisation prior to a correct response is greater over central electrodes in the RARE condition than in the FREQUENT condition.

      We agree with the reviewer that this is likely indicative of a change in decision threshold between rare and frequent trials. We also note that our new computational modelling of the O-U process suggests that this in fact reconciles well with the behavioural findings (changes in integration kernels). We now mention this at the relevant point in the results section:

      “As large changes in mean evidence are less frequent in the RARE condition, the increased neural response to |Δevidence| may reflect the increased statistical surprise associated with the same magnitude of change in evidence in this condition. In addition, when making a correct response, preparatory motor activity over central electrodes reached a larger decision threshold for RARE vs. FREQUENT response periods (Figure 7b; p=0.041, cluster-based permutation test). We found similar effects in beta-band desynchronisation prior, averaged over the same electrodes; beta desynchronisation was greater in RARE than FREQUENT response periods. As discussed in the computational modelling section above, this is consistent with the changes in integration kernels between these conditions as it may reflect a change in decision threshold (figure 2d, 3c/d). It is also consistent with the lower detection rates and slower reaction times when response periods are RARE (figure 2 b/c).”

      We did also investigate the lateralised response (left minus right beta-desynchronisation, contrasted on left minus right responses). We found, however, that we were simply unable to detect a reliable lateralised signal in either condition using these lateralised responses. We suspect that this is because we have far fewer response periods than conventional trial-based EEG experiments of decision making, and so we did not have sufficient SNR to reliably detect this signal. This is consistent with standard findings in the literature, which report that the magnitude of the lateralised signal is far smaller than the magnitude of the overall beta desynchronisation (e.g. (Doyle et al., 2005)).

      P11, the "absolute sensory evidence" regressor elicited a triphasic potential over centroparietal electrodes. The first two phases of this component look to have an occipital focus. The third phase has a more centroparietal focus but appears markedly more posterior than the change in evidence component. This raises the question of whether it is safe to assume that they reflect the same process.

      We agree. We have now referred to this as a ‘triphasic component over occipito-parietal cortex’ rather than centroparietal electrodes.

      Reviewer #2:

      Overall, the authors use a clever experimental design and approach to tackle an important set of questions in the field of decision-making. The manuscript is easy to follow with clear writing. The analyses are well thought-out and generally appropriate for the questions at hand. From these analyses, the authors have a number of intriguing results. So, there is considerable potential and merit in this work. That said, I have a number of important questions and concerns that largely revolve around putting all the pieces together. I describe these below.

      Thanks to the reviewer for their positive appraisal of the manuscript; we are obviously pleased that they found our work to have considerable potential and merit. We seek to address the main comments from their public review and recommendations below.

      1) It is unclear to what extent the decision threshold is changing between subjects and conditions, how that might affect the empirical integration kernel, and how well these two factors can together explain the overall changes in behavior.

      I would expect that less decay in RARE would have led to more false alarms, higher detection rates, and faster RTs unless the decision threshold also increased (or there was some other additional change to the decision process). The CPP for motor preparatory activity reported in Fig. 5 is also potentially consistent with a change in the decision threshold between RARE and FREQUENT. If the decision threshold is changing, how would that affect the empirical integration kernel? These are important questions on their own and also for interpreting the EEG changes.

      This important comment, alongside the comments of reviewer 1 above, made us carefully consider the effects of changes in decision threshold on the evidence integration kernel via simulation. As discussed above (in response to ‘essential revisions for the authors’), we now include an entirely new section on how changes in decision threshold and leak may affect the evidence integration kernel, and be used to optimise performance across the different sensory environments. In particular, we agree with the reviewer that the motor preparatory activity that differs between RARE and FREQUENT is consistent with a change in decision threshold, and our simulations have suggested that our behavioural findings on evidence integration are also consistent with this change as well. These are detailed on pp.1-4 of the rebuttal, above.

      2) The authors find an interesting difference in the CPP for the FREQUENT vs RARE conditions where they also show differences in the decay time constant from the empirical integration kernel. As mentioned above, I'm wondering what else may be different between these conditions. Do the authors have any leverage in addressing whether the decision threshold differs? What about other factors that could be important for explaining the CPP difference between conditions? Big picture, the change in CPP becomes increasingly interesting the more tightly it can be tied to a particular change in the decision process.

      We fully agree with the spirit of this comment, and we’ve tried much more carefully to consider what the influences of decision threshold and leak would be on our behavioural analyses. As discussed in the response to reviewer 1, we think that the negative-going potential at the time of responses (which is greater in RARE vs. FREQUENT, main figure 7b) and the equivalent changes in beta desynchronisation (see Reviewer Response Figure 5 above) are both reflective of a change in decision threshold between RARE and FREQUENT conditions. We have tried to make this link explicit in the revised results section:

      “As large changes in mean evidence are less frequent in the RARE condition, the increased neural response to |Δevidence| may reflect the increased statistical surprise associated with the same magnitude of change in evidence in this condition. In addition, when making a correct response, preparatory motor activity over central electrodes reached a larger decision threshold for RARE vs. FREQUENT response periods (Figure 7b; p=0.041, cluster-based permutation test). We found similar effects in beta-band desynchronisation prior, averaged over the same electrodes; beta desynchronisation was greater in RARE than FREQUENT response periods. As discussed in the computational modelling section above, this is consistent with the changes in integration kernels between these conditions as it may reflect a change in decision threshold (figure 2d, 3c/d). It is also consistent with the lower detection rates and slower reaction times when response periods are RARE (figure 2 b/c).”

      I'll note that I'm also somewhat skeptical of the statements by the authors that large shifts in evidence are less frequent in the RARE compared to FREQUENT conditions (despite the names) - a central part of their interpretation of the associated CPP change. The FREQUENT condition obviously has more frequent deviations from the baseline, but this is countered to some extent by the experimental design that has reduced the standard deviation of the coherence for these response periods. I think a calculation of overall across-time standard deviation of motion coherence between the RARE and FREQUENT conditions is needed to support these statements, and I couldn't find that calculation reported. The authors could easily do this, so I encourage them to check and report it.

      See Author response image 2.

      3) The wide range of decay time constants between subjects and the correlation of this with another component of the CPP is also interesting. However, in trying to interpret this change in CPP, I'm wondering what else might be changing in the inter-subject behavior. For instance, it looks like there could be up to 4 fold changes in false alarm rates. Are there other changes as well? Do these correlate with the CPP? Similar to my point above, the changes in CPP across subjects become increasingly interesting the more tightly it can be tied to a particular difference in subject behavior. So, I would encourage the authors to examine this in more depth.

      Thanks for the interesting suggestion. We explored whether there might be any interindividual correlation in this measure with the false alarm rate across participants, but found that there was no such correlation. (See Author response image 4; plotting conventions are as in main figure 9).

      Author response image 4.

      No evidence of between-subject correlations in CPP responses and false alarm rates, in any of the four conditions.

      We hope instead that the extended discussion of how the integration kernel should be interpreted (in light of computational modelling) provides at least some increased interpretability of the between-subject effects that we report in figure 9.

      Reviewer #3 (Public Review):

      The main strength is in the task design which is novel and provides an interesting approach to studying continuous evidence accumulation. Because of the continuous nature of the task, the authors design new ways to look at behavioral and neural traces of evidence. The reverse-correlation method looking at the average of past coherence signals enables us to characterize the changes in signal leading to a decision bound and its neural correlate. By varying the frequency and length of the so-called response period, that the participants have to identify, the method potentially offers rich opportunities to the wider community to look at various aspects of decision-making under sensory uncertainty.

      We are pleased that the reviewer agrees with our general approach as a novel way of characterising various aspects of decision-making under uncertainty.

      The main weaknesses that I see lie within the description and rigor of the method. The authors refer multiple times to the time constant of the exponential fit to the signal before the decision but do not provide a rigorous method for its calculation and neither a description of the goodness of the fit. The variable names seem to change throughout the text which makes the argumentation confusing to the reader. The figure captions are incomplete and lack clarity.

      We apologise that some of our original submission was difficult to follow in places, and we are very grateful to the reviewer for their thorough suggestions for how this could be improved. We address these in turn below, and we hope that this answers their questions, and has also led to a significant improvement in the description and rigour of the methodology.
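      To make the reviewer's request concrete, the following is a minimal sketch of one way a decay time constant and a goodness-of-fit measure could be obtained from an integration kernel. It assumes a strictly positive kernel of the form A·exp(−t/τ) and uses a log-linear least-squares fit; the function name and the synthetic kernel are hypothetical, and this is not necessarily the fitting procedure used in the paper.

```python
import numpy as np

def fit_exponential(lags, kernel):
    """Fit kernel ~ A * exp(-lags / tau) by least squares on log(kernel).

    Requires kernel > 0 at every lag. Returns (tau, A, r_squared), where
    r_squared is computed on the original (not log-transformed) scale.
    """
    slope, intercept = np.polyfit(lags, np.log(kernel), 1)
    tau = -1.0 / slope
    amplitude = np.exp(intercept)
    fitted = amplitude * np.exp(-lags / tau)
    ss_res = np.sum((kernel - fitted) ** 2)
    ss_tot = np.sum((kernel - kernel.mean()) ** 2)
    return tau, amplitude, 1.0 - ss_res / ss_tot

# Synthetic, noise-free kernel with a known time constant (hypothetical values).
lags = np.linspace(0.0, 3.0, 31)           # seconds before the response
synthetic_kernel = 0.4 * np.exp(-lags / 0.8)
tau, amplitude, r_squared = fit_exponential(lags, synthetic_kernel)
```

      On real kernels, which are noisy and may cross zero at long lags, a nonlinear least-squares fit on the untransformed data would usually be preferable to the log-linear shortcut shown here.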

    1. Author Response

      Reviewer #3 (Public Review):

      Dysbiosis has a substantial impact on host physiology. Using the nematode C. elegans and E. coli as a model of host-microbe interactions, Yang et al. defined a mechanism by which the host deals with gut dysbiosis to maintain fitness. They found that E. coli accumulating in the intestine secreted indole, a tryptophan metabolite, which activated the transcription factor DAF-16. DAF-16 induced the expression of lys-7 and lys-8, which in turn limited E. coli proliferation in the gut of worms and maintained the longevity of worms. Finally, these authors demonstrated that indole activated DAF-16 via TRPA-1 in neurons of worms.

      This study revealed a new mechanism of host-microbe interaction. The concept of their work is of broad interest and the results they present are convincing. However, there are some issues that need to be addressed to support the conclusions.

      Major issues

      1) The authors isolated the crude extract from a high-performance liquid chromatograph (HPLC). A candidate compound was detected by activity-guided isolation and further identified as indole with mass spectrometry and NMR data. The HPLC fractionations and activity-guided isolation experiments should be described in more detail with a schematic figure to reveal how these experiments were performed and how indole was identified. Showing a chemical characterization of indole in Figure 2A is not sufficient for the evaluation of the results. Rather, a figure comparing the fraction 26th with standard indole by MS and NMR is more appealing.

      We appreciate the concerns of the reviewer. Activity-guided isolation was performed as follows: The crude extract of E. coli supernatant metabolites was divided into 45 fractions according to polarity using Ultimate 3000 HPLC (Thermofisher, Waltham, MA) coupled with automated fraction collector. After freeze-drying each fraction, 1 mg of metabolites were dissolved in DMSO for DAF-16 nuclear localization assay in worms (Please see new Supplementary Table S2). The 26th fraction with DAF-16 nuclear translocation-inducing activity was then separated on silica gel column (200-300 mesh) with a continuous gradient of decreasing polarity (100%, 70%, 50%, 30%, petroleum ether/acetone) to yield four fractions (26a-d). Only the fraction of 26b could induce DAF-16 nuclear translocation. Then the fraction was further separated using a Sephadex LH-20 column to yield 32 fractions. The 26b-11th fraction with DAF-16 nuclear translocation-inducing activity contained a single compound identified by thin layer chromatography, mass spectrometry and nuclear magnetic resonance (NMR). The compound exhibited a quasimolecular ion peak at m/z 181.0782 [M+H]+ in the positive APCI-MS, and was assigned to a molecular formula of C8H7N. A comparison of these 1H NMR and 13C NMR spectra with the data reported in the literature revealed that the compound was indole (Yagudaev, 1986). The figure shows the comparison of the 26b-11 fraction with the standard indole by MS (Author response image 1).

      Author response image 1.

      High resolution mass spectrum of the candidate compound and indole.

      2) DAF-16::GFP was mainly located in the cytoplasm of the intestine in worms expressing daf-16p::daf-16::gfp fed live E. coli OP50 on Day 1 (Figure 1A and 1B). The nuclear translocation of DAF-16 in the intestine was increased in worms fed live E. coli OP50 on Days 4 and 7, but not in age-matched WT worms fed heat-killed (HK) E. coli OP50 (Figure 1A and 1B). Since DAF-16 functions downstream of DAF-2, have the levels of DAF-2 been tested during aging on OP50 and (HK) OP50, or with and without indole supplementation?

      In response to the reviewer’s suggestion, we tested the expression of daf-2 by RT-PCR in 4-day-old and 7-day-old worms fed OP50 and (HK) OP50. It has been shown that DAF-2 initiates a kinase cascade that leads to the phosphorylation and cytoplasmic retention of DAF-16. By contrast, a reduction in DAF-2 signaling leads to the dephosphorylation of DAF-16, allowing its nuclear translocation. We found that the mRNA levels of daf-2 were significantly increased in worms on days 4 and 7 in the presence of either live or dead E. coli OP50, compared with those in worms on day 1 (Author response image 2A). In addition, supplementation with indole did not alter the mRNA levels of daf-2 in young adult worms (Author response image 2B). To conclude, the activation of DAF-16 is independent of DAF-2.

      Author response image 2.

      DAF-16 nuclear translocation is independent of DAF-2. (A) The mRNA levels of daf-2 were gradually increased in worms with age. P < 0.01; *P < 0.001; ns, not significant. (B) The mRNA levels of daf-2 were not altered after treatment with indole for 24 hours. ns, not significant.

      3) In lines 155-157, the author argued that the increase in the levels of indole in worms results from the intestinal accumulation of live E. coli OP50, rather than exogenous indole produced by E. coli OP50 on the NGM plates. However, the work also showed that supplementation with indole (50-200 μM) could significantly increase the indole levels in young adult worms on Day 1 (Figure 2-figure supplement 3B), which could induce nuclear translocation of DAF-16 in worms (Figure 2B). This result suggested that worms could take in indole from outside culturing environment. The concentration of indole in OP50 and (HK) OP50 could be measured.

      We appreciate the concerns of the reviewer. Reviewer #2 also pointed out this problem. In this study, our data showed that the levels of indole were 30.9, 71.9, and 105.9 nmol/g dry weight in worms fed live E. coli OP50 on days 1, 4, and 7, respectively (Figure 2C). This increase in the levels of indole in worms was accompanied by an increase in CFU of live E. coli OP50 in the intestine of worms with age (Figure 2C). In addition, we determined the levels of indole in worms fed HK E. coli OP50, and found that the levels of indole were 28.2, 31.6, and 36.1 nmol/g dry weight in worms fed HK E. coli OP50 on days 1, 4, and 7, respectively (Figure 2-figure supplement 3A). It should be noted that the levels of indole in worms fed dead E. coli OP50 on day 1 were comparable to those in worms fed live E. coli OP50 on day 1 (30.9 vs 28.2 nmol/g dry weight). However, the levels of indole were not increased in worms fed HK E. coli OP50 on days 4 and 7. Furthermore, the observation that DAF-16 was retained in the cytoplasm of the intestine in worms fed live E. coli OP50 on day 1 (Figure 1A and 1B) also indicated that indole produced by E. coli OP50 on the NGM plates is not enough to induce DAF-16 nuclear translocation. By contrast, supplementation with indole (50-200 μM) significantly increased the indole levels in worms on day 1 (Figure 2-figure supplement 3B), which could induce nuclear translocation of DAF-16 in worms (Figure 2B). Thus, the increase in the levels of indole in worms with age results from intestinal accumulation of live E. coli OP50, rather than indole produced by E. coli OP50 on the NGM plates.

      4) Recent work showed that the multicopy DAF-16 transgene acts differently from the single copy GFP knock in DAF-16 transgene. Which DAF-16 transgene was used in this work?

      The strain we used is TJ356. Its genotype has been described as zIs356 [daf-16p::daf-16a/b::GFP+rol-6(su1006)] (Lee, Hench, & Ruvkun, 2001; Lin, Hsin, Libina, & Kenyon, 2001), from the Caenorhabditis Genetics Center (CGC).

      5) In lines 190-193, the author argued that the supplementation with indole (100 μM) inhibited the CFU of E. coli K-12 in WT worms, but not daf-16(mu86) mutants, on Days 4 and 7 (Figure 3H and 3I). These results suggest that endogenous indole is involved in maintaining a normal lifespan in worms. This is overstating. The data here more likely suggest that indole could inhibit the proliferation of E. coli through DAF-16.

      We really appreciate this reviewer’s preciseness. In response to the reviewer’s suggestion, we have changed "...indole is involved in maintaining a normal lifespan in worms" to "...indole produced by bacteria in the gut could inhibit the proliferation of E. coli via DAF-16 in worms".

      6) Sonowal (2017) reported that AHR mediates indole-promoted lifespan extension at 16 °C. Yet this work argued that RNAi knockdown of ahr-1 did not affect the nuclear translocation of DAF-16 in worms fed E. coli K12 strain on Day 7 (Figure 4-figure supplement 1A) or young adult worms treated with indole (100 μM) for 24 h. The difference between these two works should be discussed.

      We really appreciate this reviewer’s preciseness. It has been shown that AHR-1 mediates indole-promoted lifespan extension in worms at 16 °C (Sonowal et al., 2017). However, our data show that AHR-1 is not involved in the indole-induced nuclear translocation of DAF-16 at 20 °C. This suggests that AHR-1-mediated and TRPA-1-mediated lifespan extension by indole are essentially different. In our study, indole is added to NGM plates when worms reach the young adult stage. In the study by Sonowal et al., indole is supplemented at the L1 larval stage. In addition, the lifespan of C. elegans varies at different temperatures (Xiao et al., 2013). Thus, indole may promote lifespan extension via different mechanisms, depending on exposure time and temperature.

      7) Sonowal (2017) conducted mRNA profiling for worms growing on K12 and K12△tnaA. Is TRPA1 in their de-regulated gene list? Have other de-regulated genes been tested in this work?

      We appreciate the concerns of the reviewer. We found that TRPA-1 is not included in the de-regulated gene list. Sonowal et al. focus on the gene expression profiles in worms from L1 larvae to young adults, whereas we pay attention to gene expression profiles in worms from young adults to aged worms. Thus, we did not test the de-regulated genes in their work.

      8) How does indole activate TRPA1? In the absence of trpa1, what is the concentration of indole in worms? Since TRPA1 is a channel, is there any possibility that TRPA1 is involved in the transport of indole? It is really interesting and surprising that neuronal TRPA-1, but not intestinal TRPA-1, mediates the beneficial effect of indole. How does indole specifically activate TRPA-1 in neurons to preserve the longevity of worms?

      We appreciate the concerns of the reviewer. TRPA1 is a nonselective cation channel permeable to Ca2+, Na+, and K+ (Zygmunt & Hogestatt, 2014). It is unlikely that TRPA1 is capable of transporting heterocyclic organic compounds, such as indole.

      In response to the reviewer’s suggestion, we detected the content of indole in trpa-1(ok999) worms. We found that the levels of indole in trpa-1(ok999) worms were slightly increased in worms on days 4 and 7, compared to those in WT worms on days 4 and 7 (Author response image 3).

      Recently, Ye et al. have demonstrated that indole and indole-3-carboxaldehyde (IAld) are agonists of TRPA1, which is conserved in vertebrates (Ye et al., 2021). Thus, it is most likely that indole acts as an agonist of TRPA-1 in C. elegans by directly binding to TRPA-1. One possibility is that activation of TRPA-1 in neurons by indole could induce a pathway that releases a neurotransmitter, which in turn triggers a signaling pathway to extend the lifespan of worms via activating DAF-16 in a non-cell autonomous manner. In contrast, the activation of TRPA-1 in the intestine by indole is unable to release such a neurotransmitter. Indeed, TRPA1 induces the release of calcitonin gene-related peptide in perivascular sensory nerves, leading to membrane hyperpolarization and arterial dilation on smooth muscle cells (Talavera et al., 2020). Moreover, the activation of TRPA1 by indole and IAld induces the secretion of the neurotransmitter serotonin in zebrafish (Ye et al., 2021).

      Author response image 3.

      The indole levels in trpa-1 mutants are increased on days 4 and 7, compared with those in WT worms. *P < 0.05.

      9) How was neuronal- and intestinal-specific knockdown of trpa-1 by RNAi conducted? And what is the tissue-specific expression pattern of trpa-1? Speculating how indole was transported to neuron cells is pretty appealing.

      We appreciate the concerns of the reviewer. SID-1 is required cell-autonomously for systemic RNAi (Winston, Molodowitch, & Hunter, 2002). Thus, sid-1 mutants are resistant to RNAi. In the neuronal- and intestinal-specific RNAi strains, sid-1 was expressed under control of the neuronal-specific unc-119 and the intestinal-specific vha-6 promoters, respectively. Although it has been reported that TRPA-1 is expressed in neurons, muscles, hypodermal cells, and the intestine, Xiao et al. showed that only TRPA-1 expressed in the intestine and neurons contributes to life extension at low temperature (Xiao et al., 2013). The transporter of indole has not been identified. In Arabidopsis, ATP-binding cassette (ABC) transporter G family 37 (ABCG37) has been reported to transport a range of indole derivatives (Ruzicka et al., 2010). However, all fifteen C. elegans ABC transporters share less than 30% sequence identity with ABCG37. Thus, it is not possible to determine from sequence similarity which one transports indole and indole derivatives in C. elegans.

      10) Supplementation with indole only up-regulated the expression of lys-7 and lys-8 in worms subjected to intestinal-specific (Figure 7-figure supplement 2C), but not neuronal-specific, RNAi of trpa-1 (Figure 7-figure supplement 2D). If this is the case, should the addition of indole specifically induce the expression of lys-7p::gfp or lys-8p::gfp in neurons?

      We really appreciate this reviewer’s preciseness. Indeed, lys-7 and lys-8 are expressed in both neurons and the intestine (Author response image 4A and 7B). However, the expression of lys-8p::gfp and lys-7p::gfp in neurons was not altered in worms after treatment with indole or knockdown of trpa-1 by RNAi (Author response image 4C and 4D).

      Author response image 4.

      The expression of LYS-7 and LYS-8 in neurons is not altered after treatment with indole or knockdown of trpa-1 by RNAi. (A and C) Representative images of lys-7p::gfp (A) and lys-8p::gfp (C). Both lys-7 and lys-8 could be expressed in neurons and the intestine. (B and D) Quantification of fluorescent intensity of lys-7p::gfp (B) and lys-8p::gfp (D) in neurons. These results are means ± SD of three independent experiments. ns, not significant.

      11) The authors demonstrated that K-12△tnaA strain had undetectable tnaA mRNA or indole levels. Furthermore, the deletion of tnaA significantly inhibited the nuclear translocation of DAF-16 in worms. However, mutations in E. coli still have non-specific effects as there are several transposon insertions or polar mutations influencing downstream genes. The authors should demonstrate that only disruption of TnaA causes the failure of nuclear translocation of DAF-16.

      In response to the reviewer’s suggestion, we rescued the expression of tnaA in the K-12 △tnaA strain. As expected, the indole level in the supernatant of K12 △tnaA::tnaA strain cultures was 34.1 μmol/L, which was comparable to that in K12 strain cultures (42.5 μmol/L) (new Figure 2-figure supplement 4D). In addition, DAF-16 nuclear accumulation was increased in worms grown on the K12 △tnaA::tnaA strain on days 4 and 7 (new Figure 2-figure supplement 4E).

    1. Author Response

      Reviewer #1 (Public Review):

      The study by Akter et al demonstrates that astrocyte-derived L-lactate plays a key role in schema memory formation and promotes mitochondrial biogenesis in the Anterior Cingulate Cortex (ACC).

      The main tool used by the authors is the DREADD technology that allows to pharmacologically activate receptors in a cell-specific manner. In the study, the authors used the DREADD technique to activate appropriately transfected astrocytes, a subtype of muscarinic receptor that is not normally present in cells. This receptor being coupled to a Gi-mediated signal transduction pathway inhibiting cAMP formation, the authors could demonstrate cell-(astrocyte) specific decreases in cAMP levels that result in decreased L-lactate production by astrocytes.

      Behaviorally this pharmacological manipulation results in impairments of schema memory formation and retrieval in the ACC in flavor-place paired associate paradigms. Such impairments are prevented by co-administration of L-lactate.

      The authors also show that activation of Gi signaling resulting in L-lactate decreased release by astrocytes impairs mitochondrial biogenesis in neurons in an L-lactate reversible manner.

      By using MCT 2 inhibitors and an NMDAR antagonist the authors conclude that the molecular mechanisms underlying the observed effects are mediated by L-lactate entering neurons through MCT2 transporters and involve NMDAR.

      Overall, the article's conclusions are warranted by the experimental evidence, but some weak points could be addressed which would make the conclusions even stronger.

      The number of animals in some of the experiments is on the low side (4 to 6).

      In the revised manuscript, we have increased the animal numbers in two key experimental groups (hM4Di-CNO and Control groups) of behavioral experiments. Now the animal numbers in different groups are as follows:

      • 15 rats in hM4Di-CNO group

      o Further divided into two subgroups for probe tests (PT1-4) conducted during flavor-place paired associate training; 8 rats in the hM4Di-CNO (saline) and 7 rats in the hM4Di-CNO (CNO) subgroups receiving I.P. saline or I.P. CNO, respectively, before these PTs.

      • 8 rats in the Control group

      • 7 rats in the Rescue group (hM4Di-CNO+L-lactate)

      • 4 rats in the Control-CNO group. Animal number in this group was not increased as it was apparent from these 4 rats that CNO alone was not impairing the PA learning and memory retrieval in these rats (AAV8-GFAP-mCherry injected). Their result was very similar to the control group. Additionally, in a previous study (Liu et al., 2022), we showed that CNO administration in the rats injected with AAV8-GFAP-mCherry into the hippocampus does not show any impairments in schema.

      Also, in the newly added open field test experiments, which investigate locomotor activity as suggested by Reviewer #2, 8 rats were used in each group.

      The use of CIN to inhibit MCT2 is not optimal. Authors may want to decrease MCT2 expression by using antisense oligonucleotides.

      In the revised manuscript, we have conducted the experiment using MCT2 antisense oligodeoxynucleotide (ODN) as suggested.

      To test whether the L-lactate-induced neuronal mitochondrial biogenesis is dependent on MCT2, we bilaterally injected MCT2 antisense oligodeoxynucleotide (MCT2-ODN, n=8 rats, 2 nmol in 1 μl PBS per ACC) or scrambled ODN (SC-ODN, n=8 rats, 2 nmol in 1 μl PBS per ACC) into the ACC. After 11 hours, bilateral infusion of L-lactate (10 nmol, 1 μl) or ACSF (1 μl) was given into the ACC and the rats were kept in the PA event arena. After 60 mins (12 hours from MCT2-ODN or SC-ODN administration), the rats were sacrificed. As shown in Author response image 1B, SC-ODN+L-lactate group showed significantly increased relative mtDNA copy number compared to the SC-ODN+ACSF group (p<0.001, ANOVA followed by Tukey's multiple comparisons test). However, this effect was completely abolished in MCT2-ODN+L-lactate group, suggesting that MCT2 is required for the L-lactate-induced mitochondrial biogenesis in the ACC.

      We have integrated these new data and results into the revised manuscript.

      Author response image 1.

      Mitochondrial biogenesis by L-lactate is dependent on MCT2 and NMDAR. A. Experimental design to investigate whether MCT2 and NMDAR activity are required for L-lactate-induced mitochondrial biogenesis. B and C. mtDNA copy number abundance in the ACC of different rat groups relative to nDNA. Data shown as mean ± SD (n=4 rats in each group). ***p<0.001, ANOVA followed by Tukey's multiple comparisons test.
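
      As an illustration of the first step of the analysis named in the legend above (one-way ANOVA; the Tukey post hoc comparison is not shown), the following minimal Python sketch computes the F statistic for three of the groups. The numerical values are hypothetical placeholders, not data from this study, and the function name one_way_anova_f is an illustrative choice.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of measurements per group.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)
    n_total = len(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical relative mtDNA copy numbers (placeholders, NOT study data).
example_groups = {
    "SC-ODN+ACSF": [1.00, 0.95, 1.05, 1.02],
    "SC-ODN+L-lactate": [1.60, 1.55, 1.70, 1.65],
    "MCT2-ODN+L-lactate": [1.02, 0.98, 1.04, 0.96],
}

f_stat, df_b, df_w = one_way_anova_f(list(example_groups.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

      A significant F statistic only indicates that at least one group mean differs; pairwise statements such as "SC-ODN+L-lactate versus SC-ODN+ACSF" require the post hoc test.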

      The experiment using APV to block NMDAR only partially supports the conclusions. Indeed, blocking NMDAR will knock down any response that involves these receptors, whether L-lactate is necessary or not.

      In the current study, we found that astrocytic Gi activation in the ACC reduced the L-lactate level in the ECF of the ACC, which was also associated with decreased PGC-1α/SIRT3/ATPB/mtDNA abundance, suggesting downregulation of the mitochondrial biogenesis pathway. We also found that exogenous administration of L-lactate into the ACC of astrocytic Gi-activated rats rescued this downregulation. In line with this, in a recently published study (Akter et al., 2023), we found upregulation of the mitochondrial biogenesis pathway in the hippocampal neurons of exogenous L-lactate-treated anesthetized rats. Another recent study has demonstrated that exercise-induced L-lactate release from skeletal muscle or I.P. injection of L-lactate can induce hippocampal expression of PGC-1α (a master regulator of mitochondrial biogenesis) and mitochondrial biogenesis in mice (Park et al., 2021). Together, these results provide compelling evidence that L-lactate promotes mitochondrial biogenesis.

      L-lactate is known to promote expression of synaptic plasticity genes like Arc, c-Fos, and Zif268 in neurons (Yang et al., 2014). After entry into the neuronal cytoplasm, mainly through MCT2, it is converted into pyruvate by lactate dehydrogenase 1 (LDH1). This conversion also produces NADH, affecting the redox state of the neuron. NADH positively modulates the activity of NMDAR resulting in enhanced Ca2+ currents, the activation of intracellular signaling cascades, and the induction of the expression of plasticity-associated genes (Yang et al., 2014; Magistretti & Allaman, 2018). The study demonstrated that L-lactate–induced plasticity gene expression was abolished in the presence of NMDAR antagonists including D-APV (Yang et al., 2014). These results suggested that the MCT2 and NMDAR are key players in the regulation of L-lactate induced plasticity gene expression.

      In the current study, we investigated whether similar mechanisms might be involved in L-lactate-induced neuronal mitochondrial biogenesis. We used MCT2 antisense oligodeoxynucleotide to decrease the expression of MCT2 (as mentioned in the previous response and Author response image 1B) and showed that MCT2 is necessary for L-lactate-induced mitochondrial biogenesis, indicating that L-lactate’s entry into the neuron is required. As mentioned before, after entry into the neuron, L-lactate is converted into pyruvate by LDH, a reaction that also produces NADH, which in turn potentiates NMDAR activity. Therefore, we investigated whether NMDAR activity is required for L-lactate-induced mitochondrial biogenesis. We used D-APV to inhibit NMDAR (Author response image 1C) and found that L-lactate does not increase mtDNA copy number abundance if D-APV is given, suggesting that NMDAR activity is required for L-lactate to promote mitochondrial biogenesis.

      NMDAR serves diverse functions. Therefore, as mentioned by the reviewer, blocking NMDAR may knock down many such functions. While our current data suggest only that MCT2 and NMDAR are involved in the upregulation of mitochondrial biogenesis by L-lactate, we have not investigated other mechanisms and pathways modulating mitochondrial biogenesis that are either dependent on or independent of MCT2 and NMDAR activity. Further studies are needed to dissect and better understand this interesting observation. We have now clarified this in the discussion section of the manuscript.

      Is inhibition of glycogenolysis involved in the observed effects mediated by Gi signaling? Indeed, L-lactate is formed both by glycolysis and glycogenolysis. The authors could test whether the glycogen metabolism-inhibiting drug DAB would mimic the effects of Gi activation.

      In this study, we have shown that astrocytic Gi activation in the ACC leads to a decrease in cAMP and L-lactate. L-lactate is produced by glycogenolysis and glycolysis. cAMP in astrocytes acts as a trigger for L-lactate production (Choi et al., 2012; Horvat, Muhič, et al., 2021; Horvat, Zorec, et al., 2021; Zhou et al., 2021) by promoting glycogenolysis and glycolysis (Vardjan et al., 2018; Horvat, Muhič, et al., 2021; Horvat, Zorec, et al., 2021). Therefore, one plausible explanation for the reduced L-lactate level observed in our study is reduced L-lactate production in astrocytes due to decreased glycogen metabolism as a result of decreased cAMP. We have now mentioned this in the discussion.

      DAB is an inhibitor of glycogen phosphorylase that suppresses L-lactate production. It was shown to impair memory by decreasing L-lactate (Newman et al., 2011; Suzuki et al., 2011; Iqbal et al., 2023). As we found that the impairment in the schema memory and mitochondrial biogenesis was associated with decreased L-lactate level in the ACC and that the exogenous L-lactate administration can rescue the impairments, it is likely that DAB will mimic the effect of Gi activation in terms of schema memory and mitochondrial biogenesis. However, further study is needed to confirm this.  

      Reviewer #2 (Public Review):

      The manuscript of Akter et al is an important study that investigates the role of astrocytic Gi signaling in the anterior cingulate cortex in the modulation of extracellular L-lactate level and consequently impairment in flavor-place associates (PA) learning. However, whereas some of the behavioral observations and signaling mechanism data are compelling, the conclusions about the effect on memory are inadequate as they rely on an experimental design that does not allow to differentiate acute or learning effect from the effect outlasting pharmacological treatments, i.e. effect on memory retention. With the addition of a few experiments, this paper would be of interest to the larger group of researchers interested in neuron-glia interactions during complex behavior.

      • Largely, I agree with the authors' conclusion that activating Gi signaling in astrocytes impairs PA learning, however, the effect on memory retrieval is not that obvious. All behavioral and molecular signaling effects described in this study are obtained with the continuous presence of CNO, therefore it is not possible to exclude the acute effect of Gi pathway activation in astrocytes. What will happen with memory on retrieval test when CNO is omitted selectively during early, middle, or late session blocks of PA learning?

      We have now added 8 more rats to the hM4Di-CNO group (i.e., the group with astrocytic Gi activation) to clarify the effect on memory retrieval. These rats underwent flavor-place paired associate (PA) training similar to the previously described rats (n=7) of this group; that is, they received CNO 30 minutes before and 30 minutes after the PA training sessions (S1-2, S4-8, S10-17). However, in contrast to the previous rats of this group, which received CNO before PTs (PT1, PT2, PT3), we omitted the CNO (and instead administered I.P. saline) selectively on these PTs conducted at the early, middle, and late stages of PA training, as suggested by the reviewer. These newly added rats did not show memory retrieval in these PTs, suggesting that the rats were not learning the PAs from the PA training sessions. See Author response image 2C-E, where this subgroup is denoted as hM4Di-CNO (Saline).

      We then continued PA training sessions (S21 onwards, Author response image 2B) for these rats without CNO. They gradually learned the PAs. PTs (PT5, PT6, PT7; Author response image 2G-I) were done during this continuation phase of PA training, once without CNO (i.e., with I.P. saline instead) and once with CNO. As seen in Author response image 2H and 2I, they retrieved the memory when PT6 and PT7 were done without CNO. However, if these PTs were done with CNO, they could not retrieve the memory. Together, these results suggest that ACC astrocytic Gi activation by CNO during PT can impair memory retrieval in rats which have already learned the PAs.

      As shown in Author response image 2B, we replaced two original PAs with two new PAs (NPA 9 and 10) at S34. This was followed by PT8 (S35). As seen in Author response image 2J, these rats retrieved the NPA memory if the PT was done without CNO. However, they could not retrieve the NPA memory if the PT was done with CNO. This result suggests that ACC astrocytic Gi activation by CNO during PT can impair NPA memory retrieval.

      In summary, these data show that astrocytic Gi activation in the ACC can impair PA memory retrieval. We have integrated these new data and results into the revised manuscript.

      Author response image 2.

      A. PI (mean ± SD) during the acquisition of the six original PAs (OPAs) (S1-2, 4-8, 10-17) and new PAs (NPAs) (S19) of the control (n=8), hM4Di-CNO (n=15), and rescue (hM4Di-CNO+L-lactate) (n=7) groups. From S6 onwards, the hM4Di-CNO group consistently showed lower PI compared to control. However, concurrent L-lactate administration into the ACC (rescue group) can rescue this impairment. B. PI (mean ± SD) of the hM4Di-CNO group (n=8) from S21 onwards showing a gradual increase in PI when CNO was withdrawn. C, D, and E. Non-rewarded PTs (PT1, PT2, and PT3 conducted on S3, S9, and S18, respectively) to test memory retrieval of OPAs for the control, hM4Di-CNO, and rescue groups. The percentage of digging time at the cued location relative to that at the non-cued locations is shown (mean ± SD). In both PT2 and PT3, the control group spent significantly more time digging the cued sand well than expected at the chance level, indicating that the rats learned the OPAs and could retrieve them. In contrast, the hM4Di-CNO group did not spend more time digging the cued sand well than expected at the chance level, irrespective of CNO administration before the PTs. The rescue group showed results similar to the hM4Di-CNO group if CNO was given without L-lactate. On the other hand, they showed results similar to the control group if L-lactate was given concurrently with CNO, indicating that this group learned the OPAs and could retrieve them. p < 0.05, p < 0.01, p < 0.001, one-sample t-test comparing the proportion of digging time at the cued sand well with the chance level of 16.67%. F. Non-rewarded PT4 (S20), which was conducted after replacing two OPAs with two NPAs (NPA 7 & 8) in S19, for the control, hM4Di-CNO, and rescue groups. Results show that the control group spent significantly more time digging the new cued sand well than expected at the chance level, indicating that the rats learned the NPAs from S19 and could retrieve them in this PT. In contrast, the hM4Di-CNO group did not spend more time digging the new cued sand well than expected at the chance level, irrespective of CNO administration before the PT. The rescue group showed results similar to the hM4Di-CNO group if CNO was given without L-lactate. On the other hand, they showed results similar to the control group if L-lactate was given concurrently with CNO, indicating that this group learned the NPAs from S19 and could retrieve them. p < 0.001, one-sample t-test comparing the proportion of digging time at the new cued sand well with the chance level of 16.67%. G, H, and I. Non-rewarded PTs (PT5, PT6, and PT7 conducted on S23, S27, and S33, respectively) to test memory retrieval of OPAs for the hM4Di-CNO group. In both PT6 and PT7, the rats spent significantly more time digging the cued sand well than expected at the chance level if the tests were done without CNO, indicating that the rats learned the OPAs and could retrieve them. However, CNO prevented memory retrieval during these PTs. p < 0.001, one-sample t-test comparing the proportion of digging time at the cued sand well with the chance level of 16.67%. J. Non-rewarded PT8 (S35), which was conducted after replacing two OPAs with two NPAs (NPA 9 & 10) in S34, for the hM4Di-CNO group. Results show that the rats spent significantly more time digging the new cued sand well than expected at the chance level if CNO was not given before the PT, indicating that the rats learned the NPAs from S34 and could retrieve them in this PT. However, if CNO was given before the PT, the retrieval was impaired. *p < 0.001, one-sample t-test comparing the proportion of digging time at the new cued sand well with the chance level of 16.67%.
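
      To make the probe-test statistic in the legend above concrete, the following minimal Python sketch shows how a one-sample t statistic against the chance level could be computed. It assumes that the chance level of 16.67% corresponds to equal digging at six sand wells (100/6), in line with the six PAs. The digging percentages are hypothetical placeholders, not data from this study, and the function name one_sample_t is an illustrative choice.

```python
from math import sqrt
from statistics import mean, stdev

# Assumption: six sand wells, so equal digging at each gives 100 / 6 percent.
N_SAND_WELLS = 6
CHANCE_LEVEL = 100 / N_SAND_WELLS


def one_sample_t(values, mu):
    """Return (t, degrees of freedom) for a one-sample t-test against mu."""
    n = len(values)
    standard_error = stdev(values) / sqrt(n)  # stdev uses the n - 1 denominator
    t_stat = (mean(values) - mu) / standard_error
    return t_stat, n - 1


# Hypothetical percentages of digging time at the cued sand well, one value
# per rat (placeholders, NOT study data).
digging_pct = [30.0, 35.0, 28.0, 40.0, 33.0, 31.0, 36.0, 29.0]

t_stat, df = one_sample_t(digging_pct, CHANCE_LEVEL)
print(f"chance level = {CHANCE_LEVEL:.2f}%")
print(f"t({df}) = {t_stat:.2f}")
```

      The p value is then read from the t distribution with n - 1 degrees of freedom; a statistics package would normally be used for that step.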

      • I found it truly exciting that the administration of exogenous L-lactate is capable to rescue CNO-induced PA learning impairment, when co-applied. Would it be possible that this treatment has a sensitivity to a particular stage of learning (acquisition, consolidation, or memory retrieval) when L-lactate administration would be the most efficacious?

      The hM4Di-CNO group, when continued with PA training without CNO (S21-S32) (Author response image 2B), was able to learn the six original PAs (OPAs). In PT7, done at S33 (Author response image 2I), this group of rats was able to retrieve the memory if the test was done without CNO but could not retrieve the memory if CNO was given. Similarly, the Rescue group (hM4Di-CNO+L-lactate) (Author response image 2A), which received both CNO and L-lactate during PA training sessions (S1-S17), was able to learn the OPAs. At PT3, done at S18 (Author response image 2E), these rats were able to retrieve the memory when the test was done with CNO+L-lactate but not when the test was done with CNO only. Together, these results clearly show that ACC astrocytic Gi activation with CNO impairs memory retrieval and that exogenous L-lactate can rescue the impairment. Therefore, it can be concluded that memory retrieval is sensitive to L-lactate.

      PA learning is hippocampus-dependent. Over the course of repeated PA training, systems consolidation occurs in the ACC, after which the already learned PA memory (schema) becomes hippocampus-independent (Tse et al., 2007; Tse et al., 2011). In our previous study (Liu et al., 2022), we observed higher activation (indicated by expression of c-Fos) in the hippocampus relative to the ACC during the early period of schema development, and the reverse at the late stage. However, rapid assimilation of a new PA into the ACC requires simultaneous activation/retrieval of the previous schema from the ACC and hippocampus-dependent new PA learning (Tse et al., 2007; Tse et al., 2011). During new PA learning, an increase in c-Fos neurons in both CA1 and ACC was detected (Liu et al., 2022).

      Our hM4Di-CNO group received CNO 30 mins before and after each PA training session in S1-S17 (Author response image 2A). Also, the Rescue group similarly received CNO+L-lactate before and after each PA training session in S1-S17. Therefore, while this study design allowed us to conclude that ACC astrocytic Gi activation impairs PA learning and that exogenous L-lactate can rescue the impairment, it does not allow clear differentiation of the effects of these treatments on memory acquisition and consolidation. Further studies are needed to investigate this.

      • The hypothesis that observed learning impairments could be associated with diminished mitochondrial biogenesis caused by decreased l-lactate in the result of astrocytic Gi-DREADDS stimulation is very appealing, but a few key pieces of evidence are missing. So far, the hypothesis is supported by experiments demonstrating reduced expression of several components of mitochondrial membrane ATP synthase and a decrease in relative mtDNA copy numbers in ACC of rats injected with Gi-DREADDs. L-lactate injections into ACC restored and even further increased the expression of the above-mentioned markers. Co-administration of NMDAR antagonist D-APV or MCT-2 (mostly neuronal) blocker 4-CIN with L-lactate, prevented L-lactate-induced increase in relative mtDNA copy. I am wondering how the interference with mitochondrial biogenesis is affecting neuronal physiology and if it would result in impaired PA learning or schema memory.

      The observation of diminished mitochondrial biogenesis in the astrocytic Gi-activated rats that showed impaired PA learning is exciting. However, our study does not provide experimental data on how mitochondrial biogenesis could be associated with impaired PA learning and schema memory. Results from several previous studies linked mitochondrial biogenesis and its regulators such as PGC-1α and SIRT3 to diverse neuronal and cognitive functions, as described in the discussion section of the manuscript. In the revised manuscript, we have added the following discussion of potential mechanisms:

      “In this study, we have demonstrated that ACC astrocytic Gi activation impairs PA learning and schema formation, PA memory retrieval, and NPA learning and retrieval by decreasing L-lactate level in the ACC. Although we have shown that these impairments are associated with diminished expression of proteins of mitochondrial biogenesis, the precise mechanisms of how astrocytic Gi activation affects neuronal functions and schema memory remain to be elucidated. We previously demonstrated that neuronal inhibition in either the hippocampus or the ACC impairs PA learning and schema formation (Hasan et al., 2019). In another recent study (Liu et al., 2022), we showed that astrocytic Gi activation in the CA1 impaired PA training-associated CA1-ACC projecting neuronal activation. Yao et al. recently showed that reduction of astrocytic lactate dehydrogenase A (an enzyme that reversibly catalyzes L-lactate production from pyruvate) in the dorsomedial prefrontal cortex reduces L-lactate levels and neuronal firing frequencies, promoting depressive-like behaviors in mice (Yao et al., 2023). These impairments could be rescued by L-lactate infusion. It is possible that the impairment in PA learning and schema observed in our study might have involved a similar functional consequence of reduced neuronal activity in the ACC neurons upon astrocytic Gi activation.

      Schema consolidation is associated with synaptic plasticity-related gene expression (such as Zif268, Arc) in the ACC (Tse et al., 2011). L-lactate, after entry into neurons, can be converted to pyruvate during which NADH is also produced, promoting synaptic plasticity-related gene expression by potentiating NMDA signaling in neurons (Yang et al., 2014; Margineanu et al., 2018). Furthermore, L-lactate acts as an energy substrate to fuel learning-induced de novo neuronal translation critical for long-term memory (Descalzi et al., 2019). On the other hand, mitochondria play a crucial role in fueling local translation during synaptic plasticity (Rangaraju et al., 2019). Therefore, it could be hypothesized that the rescue of astrocytic Gi activation-mediated impairment of schema by exogenous L-lactate could have been mediated by facilitating synaptic plasticity-related gene expression by directly fueling the protein translation, potentiating NMDA signaling, as well as increasing mitochondrial capacity for ATP production by promoting mitochondrial biogenesis. Furthermore, the potential involvement of HCAR1, a receptor for L-lactate that may regulate neuronal activity (Bozzo et al., 2013; Tang et al., 2014; Herrera-López & Galván, 2018; Abrantes et al., 2019), cannot be excluded. Future research could explore these potential mechanisms, examining the interactions among them, and determining their relative contributions to schema. Our previous study also showed that ACC myelination is necessary for PA learning and schema formation, and that repeated PA training is associated with oligodendrogenesis in the ACC (Hasan et al., 2019). Oligodendrocytes facilitate fast, synchronized, and energy-efficient transfer of information by wrapping axons in a myelin sheath. Furthermore, they supply axons with glycolysis products, such as L-lactate, to offer metabolic support (Fünfschilling et al., 2012; Lee et al., 2012). The association of oligodendrogenesis and myelination with schema memory may suggest an adaptive response of oligodendrocytes to enhance metabolic support and neuronal energy efficiency during PA learning. Given the impairments in PA learning observed in the ACC astrocytic Gi-activated rats in the current study, it is reasonable to conclude that the direct metabolic support to axons provided by oligodendrocytes is not sufficient to rescue the schema impairments caused by decreased L-lactate levels upon astrocytic Gi activation. On the other hand, L-lactate was shown to be important for oligodendrogenesis and myelination (Sánchez-Abarca et al., 2001; Rinholm et al., 2011; Ichihara et al., 2017). Therefore, it is tempting to speculate that a decrease in L-lactate level may also impede oligodendrogenesis and myelination, consequently preventing the enhanced axonal support provided by oligodendrocytes and myelin during schema learning. Recently, a study has demonstrated that upon demyelination, mitochondria move from the neuronal cell body to the demyelinated axon (Licht-Mayer et al., 2020). Enhancement of this axonal response of mitochondria to demyelination, by targeting mitochondrial biogenesis and mitochondrial transport from the cell body to axon, protects acutely demyelinated axons from degeneration. Given the connection between schema and increased myelination, it remains an open question whether L-lactate-induced mitochondrial biogenesis plays a beneficial role in schema through a similar mechanism. Nevertheless, our results contribute to the mounting evidence of the glial role in cognitive functions and underscore the new paradigm in which glial cells are considered as integral players in cognitive functions alongside neurons. Disruption of neurons, myelin, or astrocytes in the ACC can disrupt PA learning and schema memory.”

      Reviewer #3 (Public Review):

      Akter et al. investigated how the astroglial Gi signaling pathway in the rat anterior cingulate cortex (ACC) affects cognitive functions, in particular schema memory formation. Using a stereotactic approach they intracranially introduced AAV8 vectors carrying mCherry-tagged hM4Di DREADD (Designer Receptor Exclusively Activated by Designer Drugs) under astrocyte selective GFAP promoter (AAV8-GFAP-hM4Di-mCherry) into the ACC region of the rat brain. hM4Di DREADD is a genetically modified form of the human M4 muscarinic (hM4) receptor insensitive to endogenous acetylcholine but is activated by the inert clozapine metabolite clozapine-N-oxide (CNO), triggering the Gi signaling pathway. The authors confirmed that hM4Di DREADD is selectively expressed in astrocytes after the application of the AAV8 vector by analysing the mCherry signals and immunolabeling of astrocytes and neurons in the ACC region of the rat brain. They activated hM4Di DREADD (Gi signalling) in astrocytes by intraperitoneal administration of CNO and measured cognitive functions in animals after CNO administration. Activation of Gi signaling in astrocytes by CNO application decreased paired-associate (PA) learning, schema formation, and memory retrieval in tested animals. This was associated with a decrease in cAMP in astrocytes and L-lactate in extracellular fluid as measured by immunohistochemistry in situ and in awake rats by microdialysis, respectively. Administration of exogenous L-lactate rescued the astroglial Gi-mediated deficits in PA learning, memory retrieval, and schema formation, suggesting that activation of astroglial Gi signalling downregulates L-lactate production in astrocytes and its transport to neurons affecting memory formation. 
Authors also show that expression level of proteins involved in mitochondrial biogenesis, which is associated with cognitive functions, is decreased in neurons, when Gi signalling is activated in astrocytes, and rescued when exogenous L-lactate is applied, suggesting the implication of astrocyte-derived L-lactate in the maintenance of mitochondrial biogenesis in neurons. The latter depended on lactate MCT2 transporter activity and glutamate NMDA receptor activity.

      The paper is very well written and discussed. The conclusions of this paper are well supported by the data. Although this is a study that uses established and previously published methodologies, it provides new insights into L-lactate signalling in the brain, particularly in the ACC, and further confirms the role of astroglial L-lactate in learning and memory formation. It also raises new questions about the molecular mechanisms underlying astrocyte-derived L-lactate-mediated mitochondrial biogenesis in neurons and its contribution to schema memory formation.

      • The authors discuss astrocytic L-lactate signalling without considering the recently discovered L-lactate-sensitive Gs and Gi protein-coupled receptors in the brain, which are present in both astrocytes and neurons. The use of nonendogenous L-lactate receptor agonists (Compound 2, 3-chloro-5-hydroxybenzoic acid) would clarify the implication of L-lactate receptor signalling in schema memory formation.

      In the revised manuscript, we have included this point in the discussion section to mention the potential role of HCAR1 in schema memory as follows:

      “Schema consolidation is associated with synaptic plasticity-related gene expression (such as Zif268, Arc) in the ACC (Tse et al., 2011). L-lactate, after entry into neurons, can be converted to pyruvate during which NADH is also produced, promoting synaptic plasticity-related gene expression by potentiating NMDA signaling in neurons (Yang et al., 2014; Margineanu et al., 2018). Furthermore, L-lactate acts as an energy substrate to fuel learning-induced de novo neuronal translation critical for long-term memory (Descalzi et al., 2019). On the other hand, mitochondria play a crucial role in fueling local translation during synaptic plasticity (Rangaraju et al., 2019). Therefore, it could be hypothesized that the rescue of astrocytic Gi activation-mediated impairment of schema by exogenous L-lactate could have been mediated by facilitating synaptic plasticity-related gene expression by directly fueling the protein translation, potentiating NMDA signaling, as well as increasing mitochondrial capacity for ATP production by promoting mitochondrial biogenesis. Furthermore, the potential involvement of HCAR1, a receptor for L-lactate that may regulate neuronal activity (Bozzo et al., 2013; Tang et al., 2014; Herrera-López & Galván, 2018; Abrantes et al., 2019), cannot be excluded. Future research could explore these potential mechanisms, examining the interactions among them, and determining their relative contributions to schema.”

      • The use of control animals transduced with an "empty" AAV8 vector (AAV8-GFAP-mCherry) compared with animals transduced with AAV8-GFAP-hM4Di-mCherry throughout the study would strengthen the results of this study, since transfection itself, as well as overexpression of the mCherry protein, may affect cell function.

      We thank the reviewer for pointing this out. The schema experiment includes a control group (Control-CNO group) of rats injected with AAV8-GFAP-mCherry bilaterally into the ACC. As shown in Author response image 3, after habituation and pretraining, these rats were trained for PA learning similarly to the other groups. They received I.P. CNO 30 min before and 30 min after each PA training session. PA learning, schema formation, memory retrieval, NPA learning and retrieval, and latency (time needed to commence digging at the correct well) were similar to those of the control group of rats. This result is consistent with our previous study, in which rats bilaterally injected with AAV8-GFAP-mCherry into CA1 of the hippocampus did not show impairments in PA learning and schema formation upon CNO treatment (Liu et al., 2022).

      Author response image 3.

      A. PI (mean ± SD) during the acquisition of the original six PAs (OPAs) (S1-2, 4-8, 10-17) and new PAs (NPAs) (S19) of the control (n=6) and control-CNO (n=4) groups. B. Non-rewarded PTs (PT1, PT2, and PT3 done on S3, S9, and S18, respectively) to test memory retrieval of OPAs for the control-CNO group. C. Non-rewarded PT4 (S20) which was done after replacing two OPAs with two NPAs (NPA 7 & 8) in S19 for the control-CNO group. D. Latency (in seconds) before commencing digging at the correct well for control and control-CNO groups. Data shown as mean ± SD.

      References

      Abrantes, H. d. C., Briquet, M., Schmuziger, C., Restivo, L., Puyal, J., Rosenberg, N., Rocher, A.-B., Offermanns, S., & Chatton, J.-Y. (2019). The Lactate Receptor HCAR1 Modulates Neuronal Network Activity through the Activation of Gα and Gβγ Subunits. The Journal of Neuroscience, 39(23), 4422-4433. https://doi.org/10.1523/jneurosci.2092-18.2019

      Akter, M., Ma, H., Hasan, M., Karim, A., Zhu, X., Zhang, L., & Li, Y. (2023). Exogenous L-lactate administration in rat hippocampus increases expression of key regulators of mitochondrial biogenesis and antioxidant defense [Original Research]. Frontiers in Molecular Neuroscience, 16. https://doi.org/10.3389/fnmol.2023.1117146

      Bozzo, L., Puyal, J., & Chatton, J.-Y. (2013). Lactate Modulates the Activity of Primary Cortical Neurons through a Receptor-Mediated Pathway. PLoS One, 8(8), e71721. https://doi.org/10.1371/journal.pone.0071721

      Choi, H. B., Gordon, G. R., Zhou, N., Tai, C., Rungta, R. L., Martinez, J., Milner, T. A., Ryu, J. K., McLarnon, J. G., Tresguerres, M., Levin, L. R., Buck, J., & MacVicar, B. A. (2012). Metabolic communication between astrocytes and neurons via bicarbonate-responsive soluble adenylyl cyclase. Neuron, 75(6), 1094-1104. https://doi.org/10.1016/j.neuron.2012.08.032

      Covelo, A., Eraso-Pichot, A., Fernández-Moncada, I., Serrat, R., & Marsicano, G. (2021). CB1R-dependent regulation of astrocyte physiology and astrocyte-neuron interactions. Neuropharmacology, 195, 108678. https://doi.org/10.1016/j.neuropharm.2021.108678

      Descalzi, G., Gao, V., Steinman, M. Q., Suzuki, A., & Alberini, C. M. (2019). Lactate from astrocytes fuels learning-induced mRNA translation in excitatory and inhibitory neurons. Communications Biology, 2(1), 247. https://doi.org/10.1038/s42003-019-0495-2

      Endo, F., Kasai, A., Soto, J. S., Yu, X., Qu, Z., Hashimoto, H., Gradinaru, V., Kawaguchi, R., & Khakh, B. S. (2022). Molecular basis of astrocyte diversity and morphology across the CNS in health and disease. Science, 378(6619), eadc9020. https://doi.org/10.1126/science.adc9020

      Fünfschilling, U., Supplie, L. M., Mahad, D., Boretius, S., Saab, A. S., Edgar, J., Brinkmann, B. G., Kassmann, C. M., Tzvetanova, I. D., Möbius, W., Diaz, F., Meijer, D., Suter, U., Hamprecht, B., Sereda, M. W., Moraes, C. T., Frahm, J., Goebbels, S., & Nave, K.-A. (2012). Glycolytic oligodendrocytes maintain myelin and long-term axonal integrity. Nature, 485(7399), 517-521. https://doi.org/10.1038/nature11007

      Harris, R. A., Lone, A., Lim, H., Martinez, F., Frame, A. K., Scholl, T. J., & Cumming, R. C. (2019). Aerobic Glycolysis Is Required for Spatial Memory Acquisition But Not Memory Retrieval in Mice. eNeuro, 6(1). https://doi.org/10.1523/ENEURO.0389-18.2019

      Hasan, M., Kanna, M. S., Jun, W., Ramkrishnan, A. S., Iqbal, Z., Lee, Y., & Li, Y. (2019). Schema-like learning and memory consolidation acting through myelination. FASEB J, 33(11), 11758-11775. https://doi.org/10.1096/fj.201900910R

      Herrera-López, G., & Galván, E. J. (2018). Modulation of hippocampal excitability via the hydroxycarboxylic acid receptor 1. Hippocampus, 28(8), 557-567. https://doi.org/10.1002/hipo.22958

      Horvat, A., Muhič, M., Smolič, T., Begić, E., Zorec, R., Kreft, M., & Vardjan, N. (2021). Ca2+ as the prime trigger of aerobic glycolysis in astrocytes. Cell Calcium, 95, 102368. https://doi.org/10.1016/j.ceca.2021.102368

      Horvat, A., Zorec, R., & Vardjan, N. (2021). Lactate as an Astroglial Signal Augmenting Aerobic Glycolysis and Lipid Metabolism [Review]. Frontiers in Physiology, 12. https://doi.org/10.3389/fphys.2021.735532

      Ichihara, Y., Doi, T., Ryu, Y., Nagao, M., Sawada, Y., & Ogata, T. (2017). Oligodendrocyte Progenitor Cells Directly Utilize Lactate for Promoting Cell Cycling and Differentiation. J Cell Physiol, 232(5), 986-995. https://doi.org/10.1002/jcp.25690

      Iqbal, Z., Liu, S., Lei, Z., Ramkrishnan, A. S., Akter, M., & Li, Y. (2023). Astrocyte L-Lactate Signaling in the ACC Regulates Visceral Pain Aversive Memory in Rats. Cells, 12(1), 26. https://www.mdpi.com/2073-4409/12/1/26

      Jourdain, P., Rothenfusser, K., Ben-Adiba, C., Allaman, I., Marquet, P., & Magistretti, P. J. (2018). Dual action of L-Lactate on the activity of NR2B-containing NMDA receptors: from potentiation to neuroprotection. Sci Rep, 8(1), 13472. https://doi.org/10.1038/s41598-018-31534-y

      Kofuji, P., & Araque, A. (2021). G-Protein-Coupled Receptors in Astrocyte-Neuron Communication. Neuroscience, 456, 71-84. https://doi.org/10.1016/j.neuroscience.2020.03.025

      Lee, Y., Morrison, B. M., Li, Y., Lengacher, S., Farah, M. H., Hoffman, P. N., Liu, Y., Tsingalia, A., Jin, L., Zhang, P. W., Pellerin, L., Magistretti, P. J., & Rothstein, J. D. (2012). Oligodendroglia metabolically support axons and contribute to neurodegeneration. Nature, 487(7408), 443-448. https://doi.org/10.1038/nature11314

      Licht-Mayer, S., Campbell, G. R., Canizares, M., Mehta, A. R., Gane, A. B., McGill, K., Ghosh, A., Fullerton, A., Menezes, N., Dean, J., Dunham, J., Al-Azki, S., Pryce, G., Zandee, S., Zhao, C., Kipp, M., Smith, K. J., Baker, D., Altmann, D., Anderton, S. M., Kap, Y. S., Laman, J. D., Hart, B. A. t., Rodriguez, M., Watzlawick, R., Schwab, J. M., Carter, R., Morton, N., Zagnoni, M., Franklin, R. J. M., Mitchell, R., Fleetwood-Walker, S., Lyons, D. A., Chandran, S., Lassmann, H., Trapp, B. D., & Mahad, D. J. (2020). Enhanced axonal response of mitochondria to demyelination offers neuroprotection: implications for multiple sclerosis. Acta Neuropathologica, 140(2), 143-167. https://doi.org/10.1007/s00401-020-02179-x

      Liu, S., Wong, H. Y., Xie, L., Iqbal, Z., Lei, Z., Fu, Z., Lam, Y. Y., Ramkrishnan, A. S., & Li, Y. (2022). Astrocytes in CA1 modulate schema establishment in the hippocampal-cortical neuron network. BMC Biol, 20(1), 250. https://doi.org/10.1186/s12915-022-01445-6

      Magistretti, P. J., & Allaman, I. (2018). Lactate in the brain: from metabolic end-product to signalling molecule. Nat Rev Neurosci, 19(4), 235-249. https://doi.org/10.1038/nrn.2018.19

      Margineanu, M. B., Mahmood, H., Fiumelli, H., & Magistretti, P. J. (2018). L-Lactate Regulates the Expression of Synaptic Plasticity and Neuroprotection Genes in Cortical Neurons: A Transcriptome Analysis. Front Mol Neurosci, 11, 375. https://doi.org/10.3389/fnmol.2018.00375

      Netzahualcoyotzi, C., & Pellerin, L. (2020). Neuronal and astroglial monocarboxylate transporters play key but distinct roles in hippocampus-dependent learning and memory formation. Progress in Neurobiology, 194, 101888. https://doi.org/10.1016/j.pneurobio.2020.101888

      Newman, L. A., Korol, D. L., & Gold, P. E. (2011). Lactate produced by glycogenolysis in astrocytes regulates memory processing. PLoS One, 6(12), e28427. https://doi.org/10.1371/journal.pone.0028427

      Park, J., Kim, J., & Mikami, T. (2021). Exercise-Induced Lactate Release Mediates Mitochondrial Biogenesis in the Hippocampus of Mice via Monocarboxylate Transporters. Front Physiol, 12, 736905. https://doi.org/10.3389/fphys.2021.736905

      Peterson, S. M., Pack, T. F., & Caron, M. G. (2015). Receptor, Ligand and Transducer Contributions to Dopamine D2 Receptor Functional Selectivity. PLoS One, 10(10), e0141637. https://doi.org/10.1371/journal.pone.0141637

      Rangaraju, V., Lauterbach, M., & Schuman, E. M. (2019). Spatially Stable Mitochondrial Compartments Fuel Local Translation during Plasticity. Cell, 176(1), 73-84.e15. https://doi.org/10.1016/j.cell.2018.12.013

      Rinholm, J. E., Hamilton, N. B., Kessaris, N., Richardson, W. D., Bergersen, L. H., & Attwell, D. (2011). Regulation of oligodendrocyte development and myelination by glucose and lactate. J Neurosci, 31(2), 538-548. https://doi.org/10.1523/JNEUROSCI.3516-10.2011

      Sánchez-Abarca, L. I., Tabernero, A., & Medina, J. M. (2001). Oligodendrocytes use lactate as a source of energy and as a precursor of lipids. Glia, 36(3), 321-329. https://doi.org/10.1002/glia.1119

      Suzuki, A., Stern, S. A., Bozdagi, O., Huntley, G. W., Walker, R. H., Magistretti, P. J., & Alberini, C. M. (2011). Astrocyte-neuron lactate transport is required for long-term memory formation. Cell, 144(5), 810-823.

      Tang, F., Lane, S., Korsak, A., Paton, J. F. R., Gourine, A. V., Kasparov, S., & Teschemacher, A. G. (2014). Lactate-mediated glia-neuronal signalling in the mammalian brain. Nature Communications, 5(1), 3284. https://doi.org/10.1038/ncomms4284

      Tauffenberger, A., Fiumelli, H., Almustafa, S., & Magistretti, P. J. (2019). Lactate and pyruvate promote oxidative stress resistance through hormetic ROS signaling. Cell Death Dis, 10(9), 653. https://doi.org/10.1038/s41419-019-1877-6

      Tse, D., Langston, R. F., Kakeyama, M., Bethus, I., Spooner, P. A., Wood, E. R., Witter, M. P., & Morris, R. G. (2007). Schemas and memory consolidation. Science, 316(5821), 76-82. https://doi.org/10.1126/science.1135935

      Tse, D., Takeuchi, T., Kakeyama, M., Kajii, Y., Okuno, H., Tohyama, C., Bito, H., & Morris, R. G. (2011). Schema-dependent gene activation and memory encoding in neocortex. Science, 333(6044), 891-895. https://doi.org/10.1126/science.1205274

      Vardjan, N., Chowdhury, H. H., Horvat, A., Velebit, J., Malnar, M., Muhič, M., Kreft, M., Krivec, Š. G., Bobnar, S. T., Miš, K., Pirkmajer, S., Offermanns, S., Henriksen, G., Storm-Mathisen, J., Bergersen, L. H., & Zorec, R. (2018). Enhancement of Astroglial Aerobic Glycolysis by Extracellular Lactate-Mediated Increase in cAMP [Original Research]. Frontiers in Molecular Neuroscience, 11. https://doi.org/10.3389/fnmol.2018.00148

      Vezzoli, E., Cali, C., De Roo, M., Ponzoni, L., Sogne, E., Gagnon, N., Francolini, M., Braida, D., Sala, M., Muller, D., Falqui, A., & Magistretti, P. J. (2020). Ultrastructural Evidence for a Role of Astrocytes and Glycogen-Derived Lactate in Learning-Dependent Synaptic Stabilization. Cereb Cortex, 30(4), 2114-2127. https://doi.org/10.1093/cercor/bhz226

      Wang, J., Tu, J., Cao, B., Mu, L., Yang, X., Cong, M., Ramkrishnan, A. S., Chan, R. H. M., Wang, L., & Li, Y. (2017). Astrocytic l-Lactate Signaling Facilitates Amygdala-Anterior Cingulate Cortex Synchrony and Decision Making in Rats. Cell Rep, 21(9), 2407-2418. https://doi.org/10.1016/j.celrep.2017.11.012

      Yang, J., Ruchti, E., Petit, J. M., Jourdain, P., Grenningloh, G., Allaman, I., & Magistretti, P. J. (2014). Lactate promotes plasticity gene expression by potentiating NMDA signaling in neurons. Proc Natl Acad Sci U S A, 111(33), 12228-12233. https://doi.org/10.1073/pnas.1322912111

      Yao, S., Xu, M.-D., Wang, Y., Zhao, S.-T., Wang, J., Chen, G.-F., Chen, W.-B., Liu, J., Huang, G.-B., Sun, W.-J., Zhang, Y.-Y., Hou, H.-L., Li, L., & Sun, X.-D. (2023). Astrocytic lactate dehydrogenase A regulates neuronal excitability and depressive-like behaviors through lactate homeostasis in mice. Nature Communications, 14(1), 729. https://doi.org/10.1038/s41467-023-36209-5

      Yu, X., Zhang, R., Wei, C., Gao, Y., Yu, Y., Wang, L., Jiang, J., Zhang, X., Li, J., & Chen, X. (2021). MCT2 overexpression promotes recovery of cognitive function by increasing mitochondrial biogenesis in a rat model of stroke. Anim Cells Syst (Seoul), 25(2), 93-101. https://doi.org/10.1080/19768354.2021.1915379

      Zhou, Z., Okamoto, K., Onodera, J., Hiragi, T., Andoh, M., Ikawa, M., Tanaka, K. F., Ikegaya, Y., & Koyama, R. (2021). Astrocytic cAMP modulates memory via synaptic plasticity. Proc Natl Acad Sci U S A, 118(3), e2016584118. https://doi.org/10.1073/pnas.2016584118

      Zhu, J., Hu, Z., Han, X., Wang, D., Jiang, Q., Ding, J., Xiao, M., Wang, C., Lu, M., & Hu, G. (2018). Dopamine D2 receptor restricts astrocytic NLRP3 inflammasome activation via enhancing the interaction of β-arrestin2 and NLRP3. Cell Death Differ, 25(11), 2037-2049. https://doi.org/10.1038/s41418-018-0127-2

    1. Author Response

      Reviewer #2 (Public Review):

      Zou et al. presented a comprehensive study where they generated single-cell RNA profiling of 138,982 cells from 13 samples of six patients including AK, squamous cell carcinoma in situ (SCCIS), cSCC, and their matched normal tissues, covering comprehensive clinical courses of cSCC. Using bioinformatics analysis, they identified keratinocytes, CAFs, immune cells, and their subpopulations. The authors further compared signatures within subpopulations of keratinocytes along with the clinical progression, especially basal cells, and identified many interesting genes. They also further validate some of the markers in an independent cohort using IHC, followed by some knockdown experiments using cSCC cell lines.

      The strength of this study is the unique data set they have created, providing the community with invaluable resources to study and validate their findings. However, a lot of analyses were not robust enough to support the claims and conclusions in the paper. More clarification and cross-comparison with polished data are needed to further strengthen the study and claims.

      1) Stemness markers were used. The authors used COL17A1, TP63, ITGB1, and ITGA3 to represent stemness markers. However, these were not common classic stemness markers used in cSCC. What is the source claiming these genes were stemness markers in cSCC? TP63 is a master regulator and early driver event in SCC, while COL17A1, ITGB1, and ITGA3 are all ECM genes. The authors need to use commonly well-known stem cell markers in cSCC, e.g., LGR5, to mark stem-like cells.

      Thanks for raising this good point. We may not have provided a clear description of the markers COL17A1, TP63, ITGB1, and ITGA3 in the previous texts. We would like to clarify that these genes were used as the markers of epidermal stem cells in normal skin samples rather than tumor stem cells in cSCC. To avoid any possible misunderstanding, we revised the main text accordingly and added the references [4-11].

      2) Cell proportion analysis. The authors used the mean proportions to compare different clinical groups for subpopulations of keratinocytes, e.g., Figure 2B and Figure 5B. This is not robust, as no statistics can be derived from this. For example, Fig 2A clearly shows a high level of heterogeneity in the cellular composition of normal samples. One cannot say which group is higher or lower based on the mean alone, without also considering the variance.

      We replotted the proportion analysis with statistics and presented the new graphs in Figure 2-figure supplement 1 for Figure 2B and Figure 5-figure supplement 1 for Figure 5B.

      3) Basal tumour cells in SCCIS and SCC. To make the findings valid, the authors need to compare these cells/populations with the keratinocyte cell populations defined by Ji et al. Cell 2020. Do basal-SCCIS-tumour cells, also in SCC samples, resemble any of the populations defined in Ji et al.? Ji et al. also had 10 matched normal samples, thus the authors need to validate their findings of the SCC vs normal analysis using the Ji et al. dataset.

      Thanks for this valuable suggestion. We compared the basal tumor cells in our study with the cell populations defined in the Ji et al. Cell 2020 data using SingleCellNet [1]. The results showed that both the basal-SCCIS-tumor cells of SCCIS and the basal tumor cells of cSCC in our study closely resemble the Tumor_KC_Basal subcluster defined in Ji et al.'s paper (Figure 4-figure supplement 4, C and D). Tumor_KC_Basal highly expressed CCL2, CXCL14, FTH1 and MT2A, which is consistent with our findings in basal tumor cells.

      4) Copy number analysis. Authors used inferCNV to perform copy number analysis using scRNA-seq data and identified CNVs in subpopulations of keratinocytes in SCCIS and SCC. To ensure these CNVs were not artefacts, were some of the CNVs identified by inferCNV well-known copy number changes previously reported in cSCC?

      In the poorly-differentiated cSCC sample, the significant gains in chromosomes 7 and 9 and the deletion in chromosome 10 have been reported in a previous study, indicating the reliability of the CNV analysis results (Figure 5-figure supplement 2) [12].

      5) Pseudotime analysis lines 308-313. Not sure the pseudotime analysis added much, as it is unclear whether two distinct subgroups were identified from this analysis. Suggest removing this to keep it neater.

      Thank you for this suggestion. We have deleted the result of pseudotime analysis.

      6) Selection of candidate genes for validation using IHC and cell line work. For example, lines 205-206, lines 352-356 and lines 437-441, authors selected several genes associated with AK and SCC to further validate using IHC and cell line knockdown work. What are the criteria for selecting those genes for validation? It is unclear to readers how these were selected. It reads like a fishing experiment, then followed by a knockdown. Clear rationale/criteria need to be elaborated.

      The first consideration of candidate gene selection is the fold change of expression. We have provided the statistical results of DEGs in Supplementary file 1b, 1h, 1j-1m. Then we selected top changed genes and conducted an extensive literature search on these genes. We prioritized genes that, although not directly associated with cSCC development, have a close relationship with related pathways, as determined through functional enrichment analysis. These genes were arranged for further verification experiments. We have added more details in main text and methods section.

      7) TME. Compared to keratinocytes populations, the investigation of TME cells was weak. (a) can authors produce UMAP files just for T cells, DC cells, and fibroblasts separately? Figure 7B is not easy to see those subclusters. (b) similar to what was done for keratinocytes, can authors find differentially expressed clusters and genes among the different clinical groups, associated with disease progression? (c) where are the myeloid cell populations, also B cells?

      Thank you for your suggestions. (a) We have added separate UMAP plots for T cells, DC cells and stromal cells in the new Figure 7A. (b) We identified DEGs in TME cells among the different groups. Several key genes showed monotonic trends associated with disease progression. For example, with increasing malignancy, FOS is down-regulated while S100A8 and S100A9 increase monotonically in all three types of TME cells (Figure 7C). (c) We identified two types of myeloid cell populations, macrophages and monocyte-derived DCs (MoDC). We did not find other myeloid cells, such as neutrophils. As for B cells, there were only 28 B cells in the poorly-differentiated cSCC sample, which did not meet the threshold for further cell-cell communication analysis.

      8) Heat shock protein genes line 327-329. HSP signature was well-known to be induced via tissue dissociation and library prep during the scRNA experiment. How could the authors be sure these were not artefacts induced by the experiment? If authors regress their gene expression against HSP gene signatures, would this cluster still be identified?

      Thank you for this valuable suggestion. It is important to note that the Basal-SCCIS-tumor cluster was identified through CNV analysis, rather than the HSP signature. To address this concern and further validate this result, the “AddModuleScore” function in the Seurat package was used to regress gene expression against HSP gene signatures for the retrieved basal cells. Our result showed that the Basal-SCCIS-tumor population can still be identified after regression, even more clearly (Author response image 1).

      Author response image 1.

      The identity of Basal-SCCIS-tumor cluster considering regression against HSP signatures.

      9) Cell-cell communication analysis. The authors claimed that cell-to-cell interaction was significantly enhanced in poorly-differentiated cSCC, and that multiple interaction pathways were significantly active. How was this kind of analysis carried out? How did the authors define significance? What statistical method was used? These were all unclear. Furthermore, it is difficult to judge the robustness of the cell-cell communication analysis. Were these findings also supported by another method, such as celltalker or cellphoneDB?

      To determine the significance of the increased overall cell-to-cell interaction strength between two groups, we utilized CellChat to obtain the communication strength in different samples. We combined the communication strength based on cell type pairs, where missing values were set to 0. We performed a paired Wilcoxon test to determine whether the enhancement of cell-to-cell interaction between samples was significant.

      For the comparison of outgoing or incoming interaction strength of the same cell types between two groups, we first extracted the communication strength of each signaling pathway contributing to the outgoing or incoming strength, and then merged the pathway strengths across samples, setting the strength of non-shared pathways (missing values) to 0. Subsequently, we performed a paired Wilcoxon test to determine significance.

      For multiple-group comparisons, the Kruskal-Wallis rank sum test was performed first. If the p-value was less than 0.1, the pairwise Wilcoxon test was used for subsequent pairwise comparisons. Individual signaling pathways were compared between groups in the same way. We defined p-value < 0.1 as the significance threshold. We have added the significance test method to the figure legends for Figure 7 and Figure 8, as well as detailed statistical data in the new Supplementary file 1q-1u.
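
      The paired comparison described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the communication strengths have already been exported from CellChat as plain mappings from a cell-type pair to a number, and the helper names (`align_strengths`, `average_ranks`, `signed_rank_pvalue`) and the example values are hypothetical. The signed-rank p-value is computed exactly by enumerating sign patterns, which is only practical for a small number of pairs; the Kruskal-Wallis gating step for multi-group comparisons is not shown.

```python
from itertools import product


def align_strengths(a, b):
    """Align two {cell-type pair: strength} mappings; missing pairs are set to 0."""
    keys = sorted(set(a) | set(b))
    return keys, [a.get(k, 0.0) for k in keys], [b.get(k, 0.0) for k in keys]


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def signed_rank_pvalue(x, y):
    """Two-sided exact Wilcoxon signed-rank p-value (zero differences dropped)."""
    diffs = [b - a for a, b in zip(x, y) if b != a]
    if not diffs:
        return 1.0
    ranks = average_ranks([abs(d) for d in diffs])
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    observed = min(w_plus, total - w_plus)
    # Under the null hypothesis each rank is positive or negative with probability 1/2.
    count = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= observed + 1e-12:
            count += 1
    return count / 2 ** len(ranks)


# Hypothetical interaction strengths per (sender, receiver) pair.
normal = {("KC", "T"): 0.10, ("KC", "DC"): 0.20, ("T", "DC"): 0.05,
          ("Fib", "KC"): 0.30, ("Fib", "T"): 0.12, ("Fib", "DC"): 0.07}
tumor = {("KC", "T"): 0.25, ("KC", "DC"): 0.50, ("T", "DC"): 0.15,
         ("Fib", "KC"): 0.50, ("Fib", "T"): 0.40, ("Fib", "DC"): 0.10,
         ("Mac", "T"): 0.22}  # pair absent from the normal sample

pairs, x, y = align_strengths(normal, tumor)
p_value = signed_rank_pvalue(x, y)
significant = p_value < 0.1  # threshold stated in the response
```

      Setting missing pairs to 0 keeps the two vectors the same length, which a paired test requires; the cost is that a pair absent from one sample is treated as a true zero rather than as unobserved.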

      As suggested, we also used the CellPhoneDB approach, based on the CellChatDB database, to verify our cell-cell communication results. Of the ligand-receptor interactions predicted by CellChat, 55-58% were also predicted by CellPhoneDB (Author response image 2). The enhancement of cell interaction through the MHC-II, Laminin and TNF signaling pathways in the poorly-differentiated cSCC sample compared to the normal sample was consistent in both CellChat and CellPhoneDB (Figure 8C and Figure 8-figure supplement 1B).

      Author response image 2.

      The overlap of the predicted ligand-receptor interactions between CellChat and CellPhoneDB.
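
      The overlap figure quoted above is a set comparison, which can be sketched as below. The representation of an interaction as a (ligand, receptor, sender, receiver) tuple, the function name `overlap_fraction`, and the example interactions are assumptions for illustration; how the two tools' outputs were actually matched is not described in the response.

```python
def overlap_fraction(reference, other):
    """Fraction of interactions in `reference` that also appear in `other`."""
    reference, other = set(reference), set(other)
    if not reference:
        raise ValueError("reference set of interactions is empty")
    return len(reference & other) / len(reference)


# Hypothetical predictions: (ligand, receptor, sender cell type, receiver cell type).
cellchat = {
    ("LIG1", "REC1", "KC", "T"),
    ("LIG2", "REC2", "Fib", "KC"),
    ("LIG3", "REC3", "DC", "T"),
    ("LIG4", "REC4", "Mac", "KC"),
}
cellphonedb = {
    ("LIG1", "REC1", "KC", "T"),
    ("LIG3", "REC3", "DC", "T"),
    ("LIG5", "REC5", "T", "KC"),
}

shared = overlap_fraction(cellchat, cellphonedb)
```

      The measure is asymmetric: it is expressed relative to the CellChat predictions, matching the way the percentage is phrased in the response.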

      10) Statistics and significance. In general, the detail of statistics and significance was lacking throughout the paper. Authors need to specify what statistical tests were used, and the p-values. It is difficult to judge the correctness of the test, and robustness without seeing the stats.

      We have included all statistics and significance values in the figure legend and supplemental tables, and described the statistical tests in the methods section. In this revision, we have added the necessary details of statistics and significance in the main text and figures.

      11) Overall, this manuscript needs a lot of re-writing. A lot of discussion was also included in the results, making it really difficult to read overall. The authors should simplify the results sections, remove the discussion bits, and further highlight and streamline with the key results of this paper.

      Thanks a lot for this advice. We have revised the paper thoroughly and removed discussion from the results section to make the manuscript easier to read.

    1. Author Response

      Reviewer #1 (Public Review):

      Zhao et al. investigated the molecular nature of the binding site for carbohydrates within the UDP-sugars known to activate the P2Y14 receptor. In order to do so, they built a molecular model of the hP2Y14, docked the corresponding agonists, and performed MD simulation on the resulting complexes. The modeling was used to identify the key molecular interactions with a cluster of charged residues in the extracellular side of the TM region of the receptor, which they show are conserved within the P2Y receptors. The binding site of the UDP region was, not surprisingly, overlapping with the analogous ADP binding site experimentally observed for the P2Y12 receptor, and consequently, the region that recognizes the sugars could be anticipated. Nevertheless, the detailed modeling and simulation work shows the consistency of this hypothesis and provides a quantification of the particular interactions involved, pinpointing specifically the residues candidate to be involved in the recognition of sugars.

      It follows the characterization, by functional assays, of the effect of single-point mutations of these residues in the efficacy of the different UDP-sugars. Here the results show a tendency to correlate with the molecular models, however some of the data has very low statistical significance and consequently the interpretation and conclusions extracted from this data should be taken with caution. This pertains to the particular role of the identified residues in the binding of the different sugars, which in some cases should be taken as a suggestion rather than a proof, though the general conclusion of the identification of the binding region for the sugar, its conservation among P2Y receptors and the role of some specific residues in sugar recognition seems convincing and the data are conveniently presented.

      Finally, the design of ADP-sugars that activate the P2Y12 receptor, based on the transferability of the observations with the UDP-sugars for the P2Y14 receptor, is a first indication that such a recognition is possible and should happen in an analogous binding region. However, the low potencies exhibited by the ADP-sugars, in the micromolar range, are too far from the ADP agonist and the relevance of this mechanism remains to be proved. The difference between P2Y12 and P2Y14, with the last one showing much higher potencies for UDP-sugar derivatives than P2Y12 for the corresponding ADP-sugars, remains an interesting question not explored in this manuscript.

      Thanks for your valuable comments. We have revised the interpretation of the data that have relatively low statistical significance in the manuscript, and the conclusions drawn from these data are now presented as suggestions. In this work, to investigate whether sugar nucleotides can also activate human P2Y12, we tested three ADP-sugars on human P2Y12. Discovering highly potent P2Y12 agonists requires screening a large number of compounds, and it is possible that other ADP-sugars are highly potent P2Y12 agonists. ADP-sugars are technically challenging to synthesize; currently, we can only obtain ADP-Glc, ADP-GlcA and ADP-Man. Once other ADP-sugars become available to us, we will test them and try to discover highly potent agonists in future work. Such agonists will be useful chemical tools to unveil the relevance of this mechanism for P2Y12. To explore the nature of the binding sites of P2Y12 and P2Y14, we performed additional mutagenesis experiments and added the relevant data to the revised manuscript.

      Reviewer #2 (Public Review):

      The manuscript employs multiple approaches, including molecular docking, molecular dynamic simulations, and functional experiments to uncover a distinct uridine diphosphate-sugar-binding site on P2Y14 - a key drug target for inflammation and immune responses. Overall, the manuscript is clearly written, and the experimental techniques are well-documented. However, it may benefit from further analysis, particularly in terms of validating the binding pose.

      Thanks for your comments. We used MMPBSA to analyze the ligand-binding energy for each receptor residue using MD trajectories. To further characterize the ligand-binding pose, we calculated the percentage occurrence of hydrogen bonding between the ligand and the carbohydrate-binding site (K277, E278, R253 and K77). We also calculated the ligand RMSF and ligand RMSD to show the stability of the ligand-binding pose and the convergence of the simulation. These data have been included in the revised manuscript.
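
      As a rough illustration of the stability metrics mentioned in this response, the sketch below computes a ligand RMSD against a reference frame, a per-atom RMSF, and a hydrogen-bond occupancy percentage. The toy coordinates, the 3.5 Å distance cutoff, and all function names are assumptions made for illustration; frames are assumed to be already aligned, and this is not the authors' analysis pipeline.

```python
import math

def rmsd(frame, reference):
    """Root-mean-square deviation between two equally sized coordinate lists."""
    squared = [sum((a - b) ** 2 for a, b in zip(p, q))
               for p, q in zip(frame, reference)]
    return math.sqrt(sum(squared) / len(squared))

def rmsf(trajectory):
    """Per-atom fluctuation around each atom's mean position over all frames."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    fluctuations = []
    for i in range(n_atoms):
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames
                for k in range(3)]
        msd = sum(sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
                  for frame in trajectory) / n_frames
        fluctuations.append(math.sqrt(msd))
    return fluctuations

def hbond_occupancy(distances, cutoff=3.5):
    """Percentage of frames whose donor-acceptor distance is within the cutoff.

    The 3.5 angstrom cutoff is an assumed, commonly used distance criterion.
    """
    return 100.0 * sum(d <= cutoff for d in distances) / len(distances)

# Hypothetical, pre-aligned toy trajectory: 3 frames, 2 ligand atoms.
toy_trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)],
]
# Hypothetical per-frame distances (angstrom) for one ligand-residue pair.
toy_distances = [2.9, 3.1, 4.0, 3.3]
```

      A flat RMSD trace and low RMSF values over the production run are what one would typically read as a stable pose; the occupancy value summarizes how persistently a given contact is present.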

    1. Author Response

      Reviewer #3 (Public Review):

      Seeking a selective inhibitor that precisely inhibits on-target activities and avoids side effects is a major challenge in the field of drug discovery and therapeutics. The authors proposed an alternative method that combines multiple inhibitors to maximize on-target inhibition and minimize off-target inhibition. Focusing on the kinase-inhibitor interaction dataset, the authors developed a quantitative way to measure the selectivity for mixtures of inhibitors by using the Jensen-Shannon distance metric. The method sounds technical.

      From their computation and assays, the multi-compound-multitarget scoring (MMS) method framework was validated to be able to select a combination of inhibitors that is more selective than a single highly selective inhibitor for one kinase target, or for multiple targets. The MMS method is a promising solution to reduce off-target effects and could be applicable to other inhibitor-target interactions. My suggestion is that a comparative analysis of MMS with other similar methods can be conducted to highlight the advantage of MMS over others.

      We thank the reviewer for this excellent summary and their suggestions. We agree that comparing new methods to prior ones is an important step in benchmarking new approaches and methods. However, to our knowledge, no other method exists for calculating selective combinations of kinase inhibitors. We compare our JSD selectivity scoring metric to other representative target-specific and non target-specific selectivity metrics (Figure 2 Figure Supplement 2).
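
      For readers unfamiliar with the metric, the sketch below shows how a Jensen-Shannon distance between two discrete distributions can be computed. The inhibition profiles and the uniform reference distribution are hypothetical, and the uniform reference is only a stand-in: it is not the penalty distribution used by the MMS method, and the sketch does not reproduce the MMS scoring itself.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon_distance(p, q):
    """Square root of the base-2 Jensen-Shannon divergence, bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    divergence = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(divergence, 0.0))

def normalize(activities):
    """Turn non-negative activity values into a probability distribution."""
    total = sum(activities)
    return [a / total for a in activities]

# Hypothetical percent-inhibition values across four kinases.
selective = normalize([90, 5, 3, 2])
promiscuous = normalize([30, 25, 25, 20])
uniform = [0.25] * 4  # assumed reference: activity spread evenly over all kinases

d_selective = jensen_shannon_distance(selective, uniform)
d_promiscuous = jensen_shannon_distance(promiscuous, uniform)
```

      Under this assumed reference, a profile concentrated on one kinase sits farther from the uniform distribution than a promiscuous one, which is the intuition behind using a distribution distance as a selectivity score.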

      The paper is not well organized and not easily readable. For example, first, the captions of the figures are too long; some of these texts could be moved to methods or results sections. Second, the concept of "penalty distribution" or "penalty prior" is vital to understand the MMS method, thus, at least a brief definition and introduction should be put in the main text rather than supporting method, as well as the rationale to use it. Third, the method section can be divided into several subsections with clear organizations and connections. Fourth, what is the difference between "a less selective inhibitor profile" and "an even less selective inhibitor profile" in Figure 3? Overall, the details of the paper are difficult to understand in the current version. I suggest rewriting the paper in a more concise and logical style.

      We appreciate these suggestions and have significantly edited and revised our manuscript in order to facilitate clear communication. Specifically:

      1) We have added an additional description of the penalty distribution to the description of the MMS method in the main Results section of the manuscript as opposed to solely in the Materials and Methods section.

      2) We have provided a high-level concise summary of the MMS method in the results section in order to help orient a reader to the method. This description follows the same order (1 to 5) as the associated Figure 2, we hope this helps more clearly communicate the method.

      3) We have moved descriptive figure captions to the methods section and, in general, substantially reduced the size of figure captions.

      4) We have subdivided the Materials and Methods section as suggested.

      5) We now describe in our main text how the simulated profiles were generated by smoothing the PKIS2-645-like profile with two restraints: non-zero activity for LS inhibitors, and similar on-target probability for PKIS2-645-like, RS, and LS inhibitors to facilitate direct comparisons. We provide a new figure to quantify the selectivity of these simulated inhibitors and their similarity with true compounds (Figure 3 Figure Supplement 1).

      6) We have removed content from the introduction and results sections that was less important to communicate to a general audience in order to make the manuscript more concise. We have also removed or condensed extraneous supplemental figures that were not required to communicate the central results and findings of experiments (ex: supplemental figures for Figure 3 and Figure 4 from the prior submission).

    1. Author Response

      Joint Public Review

      (1) The developed model considers the interaction of multiple signaling networks that are essential for morphogenesis and homeostasis in the intestinal tissue, as well as other elements that had been proposed as relevant in the literature. Nevertheless, the details of how these interactions are modeled couldn't be evaluated in the current revision as the model was not shared with the reviewers and it is not available yet online, nor specified in any detail in the current manuscript. Additionally, how quantitative information from Wnt and BMP signaling pathways is incorporated in a quantitative way in the model is not clear.

      Model files are provided with this reply. These are ‘.jl’ files for use with Julia. The model (the files provided with this reply) will be freely publicly available through BioModels upon acceptance of this manuscript for publication.

      The model includes abstracted values to reproduce Wnt and BMP signalling gradients and their effect on cell proliferation and differentiation to generate the three-dimensional crypt spatial cell distribution. To further clarify how quantitative information from the Wnt and BMP signalling pathways is implemented in the model, we have added the following paragraph to Appendix Section 8, “Cell fate: proliferation, differentiation, arrest, apoptosis”:

      "…During this migration the Wnt content in absorptive progenitors is halved in each division and, away from Wnt sources, progressively decreases, while BMP signals increase, towards the villus. In our model, differentiation into enterocytes occurs when progenitors encounter a BMP signal level higher than their Wnt signal content. For instance, in the ileal crypt in homeostasis this occurs approximately at cell position 16 from the crypt base, where progenitors migrating from the stem cell niche reach a reduced content of Wnt signals of about 8 a.u. On the other hand, the BMP signalling level has a maximum value of 64 a.u. at approximately cell position 23 from the crypt base, where BMP signals are generated by mature enterocytes. These BMP signals diffuse towards the crypt base and, hence, decrease exponentially to reach values of 8 a.u. at approximately position 16, which, hence, enable differentiation into enterocytes. Epithelial injuries resulting in a decreased number of enterocytes reduce BMP signal production and its diffusion range, which results in the enlargement of the proliferation compartment as cells encounter the required level of BMP signals for differentiation only at higher positions in the crypt."

      (2) Some conclusions by the authors are not properly justified in the text, as "Paneth cells are the main driver behind the differential mechanical environment in the niche", "Wnt-mediated feedback loop prevents the uncontrolled expansion of the niche", the specific effect of p27 in contrast with Wee1 phosphorylation over the cell cycle length, and "their recovery [absorptive progenitors] started before the end of the treatment, driven by a negative feedback loop from mature enterocytes to their progenitors".

      We have reworded these statements as described below.

      The paragraph “Paneth cells are the main driver behind the differential mechanical environment in the niche, where cells with longer cycles accumulate more Wnt and Notch signals. In agreement with experimental reports {Pin, 2015 #719}, in our model Paneth cells are assumed to be stiffer and larger than other epithelial cells, requiring higher forces to be displaced and generating high intercellular pressure in the region” has been modified and now reads as follows “In agreement with experimental reports {Pin, 2015 #719}, Paneth cells are assumed to be stiffer and larger than other epithelial cells, requiring higher forces to be displaced and generating high intercellular pressure in the niche. Due to this increased mechanical pressure, cells in the niche have longer division cycles and can accumulate more Wnt and Notch signals.”

      The sentence “Wnt-mediated feedback loop prevents the uncontrolled expansion of the niche” has been deleted from the paragraph, which now reads: “To generate a niche of stable size, we implemented a negative Wnt-mediated feedback loop that resembles the reported stem cell production of RNF43/ZNRF3 ligands to increase the turnover of Wnt receptors in nearby cells {Hao, 2012 #2086;Koo, 2012 #2089;Clevers, 2013 #538;Clevers, 2013 #2098}. Similarly, in our model, a number of stem cells in excess of the homeostatic value reduces cell tethering of Wnt ligands and hence inhibits Paneth and stem cell generation (Figures 1A-B).”

      Regarding the specific effect of p27, in contrast with Wee1 phosphorylation, on cell cycle length, we have simplified the text in the main manuscript, which now reads: “Using the model of Csikasz-Nagy et al. {Csikasz-Nagy, 2006 #1870}, we modulated the duration of G1 through the production rate of the p27 protein. The p27 protein has been reported to regulate the duration of G1 by preventing the activation of Cyclin E-Cdk2 which induces DNA replication and the beginning of S-phase {Morgan, 2007 #2073}. We, hence, hypothesized that rapid cycling absorptive progenitors located in regions of low mechanical pressure outside the stem cell niche have low levels of p27, which bring forward the start of S-phase to shorten G1 (Figure 2D). In support of this hypothesis, it has been demonstrated that p27 inhibition has no effect on the proliferation of absorptive progenitors {Zheng, 2008 #2074} (see the Appendix for a full description).”

      In Appendix Section 2 we provide an extended explanation of how the kinetic parameters governing p27 and Wee1 are used to shorten the cell cycle, mainly by shortening G1 while keeping the length of S phase constant. It reads as follows:

      "Regarding G1 phase, the p27 protein has been reported to regulate the duration of G1 by preventing the activation of Cyclin E-Cdk2 which induces DNA replication and defines the beginning of S-phase {Morgan, 2007 #2073}. We hypothesized that fast cycling cells have low levels of p27 which result in earlier DNA replication, bringing forward the start of S-phase and shortening the length of G1. In support of this hypothesis, it has been experimentally demonstrated that inhibiting p27 has no effect on the proliferation of absorptive progenitors {Zheng, 2008 #2074}. In the Csikasz-Nagy model {Csikasz-Nagy, 2006 #1870}, the duration of G1 can be modulated through the parameter V_si, which is the basal production rate of p21/p27 (in the Csikasz-Nagy model, the p21 and p27 proteins are represented by a single variable, here we refer to that model quantity as p21/p27).

      Additionally, the end of S-phase is associated with the decrease of Wee1 to basal levels due to Cdc14 mediated phosphorylation of Wee1. In the Csikasz-Nagy model {Csikasz-Nagy, 2006 #1870}, this reaction is described by a Goldbeter-Koshland function, which includes the parameter KA_Wee1p to regulate the level of Cdc14 required for the phosphorylation of Wee1.

      Therefore, we modified these two parameters, V_si and KA_Wee1p, to ensure that variations of the cycle duration mostly impact on G1 while the length of S phase remains constant. We assumed that the value of the two parameters scales linearly with the duration of the division cycle, t_cycle, between a lower and upper bound, which prevent aberrant behaviour of the cell cycle model in the dynamically changing conditions of the crypt."
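
      Two ingredients of this description lend themselves to a small sketch: the Goldbeter-Koshland function, given here in its standard textbook form, and a parameter that scales linearly with cycle duration between a lower and an upper bound. The slope, intercept, bounds and function names are hypothetical placeholders, and the way V_si and KA_Wee1p enter the Csikasz-Nagy equations is not reproduced here.

```python
import math

def goldbeter_koshland(u, v, j, k):
    """Standard Goldbeter-Koshland function G(u, v, J, K).

    Returns the steady-state fraction of the modified form in a covalent
    modification cycle with forward rate u, reverse rate v and
    Michaelis constants J and K.
    """
    b = v - u + v * j + u * k
    return 2.0 * u * k / (b + math.sqrt(b * b - 4.0 * (v - u) * u * k))

def scaled_parameter(t_cycle, slope, intercept, lower, upper):
    """Linear scaling with cycle duration, clamped to [lower, upper].

    slope, intercept, lower and upper are illustrative assumptions,
    not values taken from the model.
    """
    return min(max(intercept + slope * t_cycle, lower), upper)

# Hypothetical example: a parameter that grows with the cycle duration (hours)
# but is kept within bounds to avoid aberrant cell-cycle behaviour.
example_value = scaled_parameter(t_cycle=10.0, slope=0.5, intercept=0.0,
                                 lower=1.0, upper=8.0)
```

      The clamping step mirrors the stated purpose of the bounds: whatever cycle duration the crypt mechanics impose, the parameter passed to the cell-cycle model stays within a range where that model behaves sensibly.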

      The paragraph related to “their recovery started before the end of the treatment…” sentence has been amended in the text and now reads “Simulated proliferative absorptive progenitors were indirectly affected by stem cell ablation and their decrease was followed by a reduction in mature enterocytes. The progenitors recovered soon after treatment interruption to later reach values above baseline when responding to the negative feedback signalling from mature enterocytes (Figure 3A).”

      (3) Only the results of the "main" model are shown, with no information about its sensitivity to parameter values, and how their conclusions depend on specific decisions on the model. For example, the authors said that "an optimal crypt cell composition is achieved when BMP and Wnt differentiation thresholds result in progenitors dividing approximately four times before differentiating into enterocytes", but the results of alternative scenarios are not shown.

      To address this comment, we have included a new section in the Appendix, called “What-if Analysis”, and new figures (Figures S4-S8) with simulations of alternative scenarios affecting the main signalling pathways that govern crypt composition. In particular, we simulated stronger and weaker Wnt, BMP, Notch and ZNRF3/RNF43 signalling.

      We attach the new section here:

      "10) What-if Analysis

      We investigated the effect on the simulated crypt of increasing and decreasing the strength of the main signalling pathways, Wnt, BMP and ZNRF3/RNF43 signalling, and modifying the Notch thresholds. For each alternative parameterisation, except when decreasing ZNRF3/RNF43 signalling, the simulation was run for 30 days to ensure stability was reached with the new parameter set and the final 10 days were included in the analysis. When decreasing ZNRF3/RNF43 signalling, we simulated 60 days to demonstrate the expansion of the niche and analysed the final 10 days. The reference parameter set used as baseline was the ileal mouse crypt parameter set reported in Appendix Table 1. In all cases, we only consider modifications of one signalling mechanism at a time.

      To study alternative Wnt signalling scenarios, we used the WntRange parameter (Appendix Table 1) to double and halve the spreading area of Wnt signals emitted by Paneth cells, while maintaining the original WntRange value for Wnt-emitting mesenchymal cells at the bottom of the crypt (Appendix Section 7.1) (Figures S4A-S4F). When WntRange was doubled, we observed an increased number of stem and Paneth cells in a noticeably enlarged niche (Figures S4C-S4D), with cells choosing the stem cell fate instead of differentiating into absorptive progenitors. On the other hand, decreasing Wnt signalling, by halving WntRange in Paneth cells but maintaining its homeostatic value in mesenchymal cells, resulted in no apparent changes in the niche cell composition (Figures S4E-S4F), which resembled published experimental results of persisting functional stem cells after Paneth cell ablation {Durand, 2012 #434}.

      The ZNRF3/RNF43-mediated negative feedback mechanism regulates the size of the niche by modulating Wnt signalling. We simulated increasing and decreasing the strength of ZNRF3/RNF43 signalling by doubling and halving, respectively, the parameter Z described in Appendix Section 7.2 (Figures S5A-S5F). Following the increase in the intensity of ZNRF3/RNF43 signalling, we observed a decrease in the number of stem and Paneth cells together with relatively minor changes in the transit-amplifying region (Figures S5C-S5D). On the other hand, when ZNRF3/RNF43 signalling levels were decreased, the niche expanded, resulting in a crypt dominated by Paneth and stem cells (Figures S5E-S5F), which replicates reported experimental phenotypes {Koo, 2012 #2089}.

      To modify Notch signalling, we increased and decreased by 1 a.u. the Notch threshold required for lateral inhibition (Figures S6A-S6F). This Notch signalling threshold determines the number of contacting Notch-secreting cells (secretory lineage) required to inhibit the differentiation of stem cells into the secretory lineage. Thus, increasing this Notch threshold enhances the production of secretory cells, leading to an increase in Paneth, goblet and enteroendocrine cells (Figures S6C-S6D). Alternatively, decreasing the Notch threshold enhances differentiation into the absorptive lineage, reducing the number of Paneth and secretory cells (Figures S6E-S6F).

      We modified the range of diffusion of BMP signals by doubling and halving the parameter A (Figures S7A-S7F), which denotes the amount of diffusing BMP signals towards the base of the crypt (Appendix Section 7.4). When we increased the BMP signalling range, enterocytes differentiated at lower crypt positions, effectively reducing the transit-amplifying zone (Figures S7A-S7B). Decreasing BMP signalling strength by halving A resulted in an increase in proliferative absorptive progenitors, which reached higher positions in the crypt (Figures S7C-S7D). The niche was largely unaffected in both cases (Figures S7E-S7F)."
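
      The one-at-a-time design described in this section can be summarised as a small scenario table. The sketch below is written in Python for illustration (the model itself is distributed as Julia files); the baseline values are placeholders, `NotchThreshold` is a hypothetical name for the lateral-inhibition threshold, and nothing here calls the actual model. Note that in the text the WntRange change applies to Paneth cells only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Scenario:
    name: str
    parameter: str
    value: float
    run_days: int = 30       # simulated duration stated in the text
    analysis_days: int = 10  # final days included in the analysis

def build_scenarios(baseline):
    """Create one scenario per modified parameter, changing one at a time."""
    scenarios = []
    for parameter in ("WntRange", "Z", "A"):
        for label, factor in (("doubled", 2.0), ("halved", 0.5)):
            # Decreased ZNRF3/RNF43 signalling is run for 60 days in the text.
            run_days = 60 if (parameter == "Z" and factor < 1.0) else 30
            scenarios.append(Scenario(f"{parameter}_{label}", parameter,
                                      baseline[parameter] * factor, run_days))
    for label, delta in (("plus1", 1.0), ("minus1", -1.0)):
        scenarios.append(Scenario(f"NotchThreshold_{label}", "NotchThreshold",
                                  baseline["NotchThreshold"] + delta))
    return scenarios

def analysis_window(scenario):
    """Return the (start_day, end_day) of the period used for analysis."""
    return (scenario.run_days - scenario.analysis_days, scenario.run_days)

# Placeholder baseline values; the real ones are in Appendix Table 1.
baseline = {"WntRange": 1.0, "Z": 1.0, "A": 1.0, "NotchThreshold": 3.0}
scenarios = build_scenarios(baseline)
```

      Listing the scenarios this way makes the design explicit: eight runs, each differing from the ileal baseline in exactly one signalling parameter, with only the decreased ZNRF3/RNF43 case extended to 60 days.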

      (4) Regarding the construction of the model, the authors used "counts of Ki-67 positive cells recorded by position" while the original data reported "overall cell counts per crypt and villus". Some explanation about how this conversion was made, why it is valid, as well as any potential problems, is needed. Additionally, the model is based on experiments done by others in mouse models; the similarity to the response in human intestinal crypts is not discussed.

      Ki-67 immunostaining data during 5-FU treatment was derived from the same experiments. The overall cell counts per crypt and villus are published in {Jardi, 2022 #2416}. For this manuscript, we reanalysed the intestinal samples to estimate counts of cell types by position in the crypt.

      We have clarified the text, which now reads …“The samples from this later study {Jardi, 2022 #2416} were analysed again to count Ki-67 positive cells at each position along the longitudinal crypt axis, for 30-50 individual hemi crypt units per tissue section per mouse as previously described {Williams, 2016 #2165}.”

      We agree that the understanding of the translation of results derived from animal models into a human or clinical context is of high relevance. The mouse crypt is a model of choice to study epithelial biology and exhibits remarkable similarities with the human crypt. In our team, we are focussed on developing translational modelling strategies and have a version of the model that describes a human crypt. That model assumes mostly conserved crypt biology and structure across species and includes changes in parameter values needed to compensate reported differences in morphometrics and cell cycle duration. Due to the relevance and extent of this translational work, we chose to focus on the mouse crypt entirely in this manuscript. We think the translational modelling strategy to explore the quantitative translation between human and mouse and/or other species/settings merits a full report.

      (5) The authors imply that their mathematical model of the intestinal crypt is an improvement over those already published but there is no direct comparison or review of the literature to substantiate this claim.

      An extended literature review including more details of previous ABMs to enable a direct comparison with our model is now included in the manuscript and reads as follows:

      “Several agent-based models (ABMs) have been proposed to describe the complexity and dynamic nature of the intestinal crypt. Early models were used as in silico platforms to study the dynamics and cellular organisation of the crypt. For instance, one of the pioneering ABMs was used to study the distribution and organisation of labelling and mitotic indices {Meineke, 2001 #326}. This model comprises a fixed ring of Paneth cells beneath a row of stem cells, which divide asymmetrically to produce a stem cell and a transit-amplifying cell that terminally differentiates after a fixed number of divisions. Some subsequent models are lattice-free, recapitulate neutral drift of equipotent stem cells and describe proliferation and cell fate regulated by a fixed Wnt signalling spatial gradient, which is defined by the distance from the crypt base, with proliferating cells progressing through discrete phases of the cell cycle and showing variable duration of the G1 phase {Pitt-Francis, 2009 #129}. Further model refinements can be seen in the model of Buske et al (2011), with stochastic cell growth and division time {Buske, 2011 #1}, Wnt levels defined by the fixed local curvature of the crypt and lateral inhibition driven by Notch signalling. Here, we present a lattice-free agent-based model that describes the spatiotemporal dynamics of single cells in the small intestinal crypt driven by the interaction of surface tethered Wnt signals, cell-cell Notch signalling, BMP diffusive signals, RNF43/ZNRF3-mediated feedback mechanisms and the cycle protein network responding to the crypt mechanical environment. We show that our computational model enables the simulation of the ablation and recovery of the stem cell niche as well as of how drug-induced molecular perturbations trigger a cascade of disruptive events spanning from the cell cycle to single cell arrest and/or apoptosis, altered cell migration and turnover and ultimately loss of epithelial integrity.”

      (6) The authors claim that the simulated data and the available mouse data match up. Nevertheless, the data vs the model still appear both quantitatively and qualitatively different (as presented in Figures 2E, F, and 5C, D). This puts in doubt how much the model can actually reproduce the experimental data. In conclusion, the model would benefit from further refinement, particularly if the goal is to use the model for predicting the dynamics of oncogenic drug candidates.

      To address this comment, we have made several adjustments: we refined the counting algorithm that determines cell position and improved the Ki-67 and BrdU staining simulations by modifying the simulated staining criteria and adding an estimate of the experimental error to the simulated responses. These changes are described in a new section in the Appendix called “ABM simulation of Ki-67 and BrdU Staining”.

      With these changes, we think we have achieved more satisfactory agreement between observed and predicted results, and we have updated all figures showing simulated Ki-67 and BrdU staining results.

    1. Author Response

      We are grateful to the editors and the reviewers for the thorough evaluation of our manuscript and their feedback, as it allows us to provide additional clarification of our findings and improve the manuscript.

      In their evaluation, the reviewers raised a key conceptual point linked to the inhibitory mechanism that appeared to be insufficiently explained in the manuscript, leading to a misconception regarding the physiological relevance. They have also missed experimental data related to the concentrations of Aβ used and their relevance for Alzheimer’s disease (AD). We believe that our studies, although performed in vitro in model systems, provide a novel conceptual framework and shed light on the unexplored mechanisms underlying AD.

      We discuss these points below in a provisional response to their comments.

      Reviewer #1 (Public Review):

      Summary:

      Human Abeta42 inhibits gamma-secretase activity in biochemical assays.

      Strengths:

      Determination of inhibitory concentration human Abeta42 on gamma-secretase activity in biochemical assays.

      Weaknesses:

      Human Abeta42 may concentrate up to microM order in endosomes.

      This is correct.

      If so, production of Abeta42 would be attenuated then lead to less Abeta deposition in the brain. The authors finding is interesting but does not fit the physiological condition in the brain.

      We thank the reviewer for raising this key conceptual point, as this gives us the opportunity to clarify it for the future readers.

      The characterized inhibitory mechanism is more complex than the reviewer’s interpretation, and a number of factors must be considered. Indeed, our data show that Aβ42, upon intracellular concentration, inhibits γ-secretase activity, resulting in increased γ-secretase substrate (C-terminal fragment, CTF) levels. It is important, however, to highlight that this inhibition is competitive in nature, implying that it is partial, reversible, and regulated by the relative concentrations of the Aβ42 peptide (inhibitor) and the substrates. The model that we put forward is that cellular uptake and intracellular concentration of Aβ42 facilitate γ-secretase inhibition, which results in the accumulation of APP-CTFs (and γ-secretase substrates in general). However, as Aβ42 levels fall, the increased concentration of substrates shifts the equilibrium towards their processing and Aβ production. As the Aβ42 concentration rises again, the equilibrium is shifted back towards inhibition, and so on. This inhibitory mechanism will translate into pulses of (partial) γ-secretase inhibition, which will alter γ-secretase-mediated signalling (arising from increased CTF levels or decreased release of soluble intracellular domains from substrates). These alterations may affect the dynamics of systems oscillating in the brain, such as NOTCH signalling, implicated in memory formation (2), and potentially others (related to e.g. cadherins, p75 or neuregulins).

      It is worth noting that oscillations in γ-secretase activity induced by treatment with a γ-secretase inhibitor (semagacestat) have been proposed to have contributed to the cognitive alterations observed in semagacestat-treated patients in the failed Phase-3 IDENTITY clinical trial (2, 3), and that semagacestat, like Aβ42, acts as a high-affinity competitor of substrates (Koch et al, 2023). We will include this clarification in the discussion of the revised manuscript and create an additional figure presenting the proposed mechanism.
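
      The statement that competitive inhibition is partial and relieved by rising substrate levels follows from the standard Michaelis-Menten rate law for a competitive inhibitor, v = Vmax·[S] / (Km·(1 + [I]/Ki) + [S]). The sketch below evaluates it; all kinetic constants and concentrations are arbitrary illustrative values, not measurements for γ-secretase, APP-CTF or Aβ42.

```python
def competitive_rate(s, i, vmax, km, ki):
    """Michaelis-Menten rate with a competitive inhibitor at concentration i."""
    return vmax * s / (km * (1.0 + i / ki) + s)

def fractional_activity(s, i, km, ki):
    """Inhibited rate divided by the uninhibited rate at the same substrate level."""
    return (km + s) / (km * (1.0 + i / ki) + s)

# Arbitrary units chosen only to show the trend.
KM, KI, INHIBITOR = 1.0, 1.0, 1.0
activity_low_substrate = fractional_activity(1.0, INHIBITOR, KM, KI)
activity_high_substrate = fractional_activity(100.0, INHIBITOR, KM, KI)
```

      At a fixed inhibitor concentration the remaining activity rises towards 100% as substrate accumulates, and it returns to 100% whenever the inhibitor concentration falls to zero, which is the textbook behaviour underlying the proposed pulses of partial inhibition.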

      It is not clear whether the FRET-based assay in living cells really reflect gamma-secretase activity.

      The specificity of this assay is supported by the γ-secretase inhibitor treatment included as a positive control (Figure 3). In addition, the following literature supports that this assay reliably reports γ-secretase activity in a cellular context (4-7).

      Processing of APP-CTF in living cells is not only the cleavage by gamma-secretase.

      This is correct, and we have therefore analysed the contribution of other APP-CTF degradation pathways by performing a cycloheximide-based stability assay in the presence of a γ-secretase inhibitor. Quantitative analysis of the levels of both APP-CTFs and APP-FL over the 5 h time course failed to reveal significant differences between Aβ42-treated cells and controls. As expected, Bafilomycin A1 treatment markedly prolonged the half-life of both proteins (Figure 7B & C). The lack of a significant impact of Aβ42 on the half-life of APP-CTFs under conditions of γ-secretase inhibition is consistent with the proposed inhibitory mechanism. Finally, we note that the inhibition will affect not only APP-CTF but also the processing of γ-secretase substrates in general.

      Reviewer #2 (Public Review):

      Summary:

      In the current study, the authors tested the hypothesis that Aβ42 toxicity arises from its proven affinity for γ-secretases. Specifically, the increases in Aβ42, particularly in the endolysosomal compartment, promote the establishment of a product feedback inhibitory mechanism on γ-secretases, and thereby impair downstream signaling events. They showed that human Aβ42 peptides, but neither murine Aβ42 nor human Aβ17-42 (p3), inhibit γ-secretases and trigger accumulation of unprocessed substrates in neurons, including CTFs of APP, p75 and pan-cadherin. Moreover, Aβ42 dysregulated cellular homeostasis by inducing p75-dependent neuronal death. γ-Secretases also process many other membrane proteins, including NOTCH, ERB-B2 receptor tyrosine kinase 4 (ERBB4), N-cadherin (NCAD) and p75 neurotrophin receptor (p75-NTR), and thereby affect a broad range of downstream signaling pathways, including those critical for neuronal structure and function. Hence, they propose a selective role for the Aβ42 peptide and raise the intriguing possibility that compromised γ-secretase activity against the CTFs of APP and/or other neuronal substrates contributes to the pathogenesis of AD. Overall, the data are not very convincing to support the main claim.

      Strengths.

      Different in vitro and cellular approaches are employed to test the hypothesis.

      Weaknesses.

      The experimental concentrations for Aβ42 peptide in the assay are too high, which are far beyond the physiological concentrations or pathological levels. The artificial observations are not supported by any in vivo experimental evidence.

      It is correct that in the majority of the experiments we used low μM concentrations of Aβ42. However, we would like to note that we also performed experiments where conditioned medium collected from human APP.Swe expressing neurons was used as a source of Aβ. In these experiments, the total Aβ concentration was in the low nM range (0.5-1 nM) (Figure 4G). Treatment with this conditioned medium led to an increase in APP-CTF levels, supporting that low nM concentrations of Aβ are sufficient for partial inhibition of γ-secretase.

      We would like to underline that Aβ is estimated to be present in the brain at concentrations ranging from fM to mM, depending on the pool (soluble, aggregated, fibrillar, etc) that is considered (8, 9). However, it is the local rather than the global concentration of Aβ that is critical for disease pathogenesis. In this regard, it is proposed that as AD progresses, Aβ42 slowly accumulates in the endo-lysosomal system, wherein it reaches the μM concentrations that are required for aggregation and seeding (1, 10, 11). Our findings are consistent with the analysis showing that extracellular soluble Aβ42 peptide, at low nM concentrations, is taken up by cortical neurons and neuroblastoma (SH-SY5Y) cells, and concentrated in the endo-lysosomal system, wherein effective peptide concentrations reach ~2.5 μM (1). Hence, a slow vesicular peptide accumulation and/or degradation imbalance (1, 11, 12) could lead to increases of several orders of magnitude in the effective concentration of Aβ42 over the span of years to decades in AD pathogenesis. We note that our experimental settings, using low μM concentrations of extracellular Aβ42 over a 24 h treatment, were designed to accelerate this ‘peptide concentration’ process in vitro. As discussed in our report, a high μM Aβ peptide concentration in the endo-lysosomal system not only leads to aggregation but also facilitates γ-secretase inhibition. Of note, we are currently developing protocols and will undertake follow-up studies to quantitatively define the Aβ concentration in synaptosomes and endosomes in AD brain, as well as in in vitro systems (i.e. cells treated with Aβ preparations obtained from AD brains).

      Finally, we would like to highlight that analyses of the brains of the AD affected individuals have shown that APP-CTFs accumulate in both sporadic and genetic forms of the disease (13-15); and recently, Ferrer-Raventós et al have revealed a correlation between APP-CTFs and Aβ levels at the synapse (13).

      To conclude, we would like to highlight that as clarified above, the Aβ peptide concentrations and the conditions tested fit well within pathophysiology, and that the data presented in our report collectively provide evidence in support of an Aβ42-mediated inhibitory effect on γ-secretase.

      References:

      1. X. Hu et al., Amyloid seeds formed by cellular uptake, concentration, and aggregation of the amyloid-beta peptide. Proc Natl Acad Sci U S A 106, 20324-20329 (2009).
      2. B. De Strooper, Lessons from a failed γ-secretase Alzheimer trial. Cell 159, 721-726 (2014).
      3. R. S. Doody et al., A phase 3 trial of semagacestat for treatment of Alzheimer's disease. N Engl J Med 369, 341-350 (2013).
      4. M. C. Houser et al., A Novel NIR-FRET Biosensor for Reporting PS/γ-Secretase Activity in Live Cells. Sensors (Basel) 20, (2020).
      5. M. C. Q. Houser et al., Limited Substrate Specificity of PS/γ-Secretase Is Supported by Novel Multiplexed FRET Analysis in Live Cells. Biosensors (Basel) 11, (2021).
      6. M. Maesako et al., Visualization of PS/γ-Secretase Activity in Living Cells. iScience 23, 101139 (2020).
      7. M. Maesako, M. C. Q. Houser, Y. Turchyna, M. S. Wolfe, O. Berezovska, Presenilin/γ-Secretase Activity Is Located in Acidic Compartments of Live Neurons. J Neurosci 42, 145-154 (2022).
      8. B. R. Roberts et al., Biochemically-defined pools of amyloid-β in sporadic Alzheimer's disease: correlation with amyloid PET. Brain 140, 1486-1498 (2017).
      9. J. A. Raskatov, What Is the "Relevant" Amyloid β42 Concentration? Chembiochem 20, 1725-1726 (2019).
      10. M. P. Schützmann et al., Endo-lysosomal Aβ concentration and pH trigger formation of Aβ oligomers that potently induce Tau missorting. Nat Commun 12, 4634 (2021).
      11. E. Wesén, G. D. M. Jeffries, M. Matson Dzebo, E. K. Esbjörner, Endocytic uptake of monomeric amyloid-β peptides is clathrin- and dynamin-independent and results in selective accumulation of Aβ(1-42) compared to Aβ(1-40). Sci Rep 7, 2021 (2017).
      12. M. F. Knauer, B. Soreghan, D. Burdick, J. Kosmoski, C. G. Glabe, Intracellular accumulation and resistance to degradation of the Alzheimer amyloid A4/beta protein. Proc Natl Acad Sci U S A 89, 7437-7441 (1992).
      13. P. Ferrer-Raventós et al., Amyloid precursor protein Neuropathol Appl Neurobiol 49, e12879 (2023).
      14. M. Pera et al., Distinct patterns of APP processing in the CNS in autosomal-dominant and sporadic Alzheimer disease. Acta Neuropathol 125, 201-213 (2013).
      15. L. Vaillant-Beuchot et al., Accumulation of amyloid precursor protein C-terminal fragments triggers mitochondrial structure, function, and mitophagy defects in Alzheimer's disease models and human brains. Acta Neuropathol 141, 39-65 (2021).
    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      As written in my public review I consider the science of this work to be high quality. I have some suggestions for the write-up though. As a general comment, I think that too much has been put into the appendices. In particular, the main text could contain more details about the model.

      We are pleased that this Reviewer considers our work to be of “high quality”. We value the reviewer’s insightful suggestions and comments. Following this Reviewer’s suggestion, we have moved certain sections to the main text.

      In what follows, we provide responses to each of the reviewer’s inquiries, and indicate the appropriate changes in the revised version.

      P2 -

      ϕ is introduced as packing fraction - on p3 it’s called cell density. Also it is not clear whether it is an area fraction or a cell number density. Please define properly and I would suggest sticking to one notion.

      ϕ is the cell packing fraction. In two dimensions (as is the case in our simulations) it is the area fraction. However, in order to stick to one general notation (independent of dimension) we use “packing fraction” to represent how densely the cells are packed. We changed it in the revised manuscript to ensure uniformity.
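
      The area-fraction definition can be sketched in a few lines. This is only an illustration, assuming that ϕ is the summed disk area divided by the area of a square simulation box; the function name and the square-box geometry are assumptions, not taken from the manuscript.

```python
import math


def packing_fraction(radii, box_length):
    """Nominal 2D packing (area) fraction of disks in a square box.

    Assumption: phi = sum(pi * R_i^2) / L^2. Overlapping regions are
    counted once per disk, so this is a nominal value, not the covered area.
    """
    disk_area = sum(math.pi * r * r for r in radii)
    return disk_area / (box_length ** 2)
```

      Because overlaps are counted once per disk in this sketch, the nominal ϕ can keep growing for soft disks even when the area actually covered no longer does.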

      P3 -

      “which should and should slow down the overall dynamics” Typo?

      Corrected it in the revised manuscript.

      “One would intuitively expect that the ϕfree should decrease with increasing cell density” Please, define ϕfree

      ϕfree is defined in Eqn. 4. We ought to have defined it in the introduction.

      “When ϕ exceeds ϕS, the free area ϕfree saturates because the soft cells interpenetrate each other,” I suggest clearly distinguishing between biological cells and the agents (disks) used in the simulation. Please, also clarify What interpenetration of agents corresponds to in tissues?

      We have rewritten the sentence as, “The simulations show that when...” The soft disks used in the simulations appear to be a reasonable model for biological cells. The small deformations noted in our model are not that different from those of cells in tissues. For visual reference, please see Author response image 1. In the left panel of the figure, a 2D snapshot of the experimental zebrafish tissue displays the deformation of cells labeled as 1 and 2. Likewise, the right panel illustrates the extent to which such deformations are replicated in the simulation by allowing two cells to overlap (the white area in the right panel of Author response image 1 represents the interpenetration). In the revised manuscript, we have made the necessary change from “soft cells” to “soft disks.”

      Author response image 1.

      Snapshots of zebrafish tissue (left panel) (Ref. [14] main text) and model two dimensional tissue (right). In the right panel the white area represents the overlap and the black vertical line represents the intersection.

      “The facilitation mechanism, invoked in glassy systems [22] allows large cells to move with low mobility.” What is the facilitation mechanism?

      Facilitation, which is an intuitive idea, refers to a mechanism by which cells in a highly jammed environment can only move if the neighboring cells get out of the way. In our case (as shown in the text, Fig. 3 (A) and Fig. 13 (A) & (B)) the smaller cells move faster, almost independent of ϕ. When a small cell moves, it creates a void which could facilitate the movement of neighboring cells (including big ones).

      “η (or relaxation time)” I suggest explaining the link between η and the relaxation time.

      First, in making this point on aging we only showed that the relaxation time is independent of the waiting time. In the revised manuscript we deleted η.

      Although not germane to this study, in the literature on glass transition, it is not uncommon to use relaxation time τα (as a proxy of viscosity η) to describe the dynamics. The relation between τα and η is given by

      η = G∞ τα,

      where G∞ is the “infinite frequency” shear modulus, which holds in unjammed systems or in liquids. This relation suggests that τα is proportional to η, which is almost never satisfied in glass forming systems.

      P5 - “In addition, the elastic forces characterizing cell-cell interactions are soft, which implies that the cells can penetrate with rij − (Ri + Rj) < 0 when they are jammed.” Is this about the model or the biological tissue? Presumably the former, because real cells do not penetrate each other, right? What are rij, Ri and Rj?

      This is about the model. The cells are sufficiently soft that they can be deformed, which allows for modest interpenetration. Real cells exhibit similar behavior (see Fig. 1). In the inset of Fig. 4 (b), rij is the center-to-center distance between cells with radii Ri and Rj. It is better to use the word overlap instead of penetrate, which is what we have done in the revised version.
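
      The overlap criterion quoted above, rij − (Ri + Rj) < 0, can be written as a small helper. This is a minimal sketch; the function name is hypothetical, and periodic boundary conditions (which a simulation would likely need) are deliberately left out.

```python
import math


def overlap(pos_i, pos_j, radius_i, radius_j):
    """Return h_ij = R_i + R_j - r_ij for two disks.

    r_ij is the center-to-center distance. A positive value means the
    two disks overlap; a negative value means they are separated.
    Assumption: plain Euclidean distance, no periodic images.
    """
    r_ij = math.dist(pos_i, pos_j)
    return radius_i + radius_j - r_ij
```

      With this sign convention, the condition rij − (Ri + Rj) < 0 is the same as a positive return value.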

      “we simulated a highly polydisperse system (PDs) in which the cell sizes vary by a factor of ∼ 8” Is it important to have a factor 8 - the zebra fish tissue presents a factor 5 − 6?

      This is an important question, which is difficult to answer using analytic theory. Unfortunately, it does require simulations. We do not know a priori the polydispersity value needed to observe saturation in η at high values of ϕ. However, we have shown that a system with one type of cell (monodisperse) crystallizes. Furthermore, mixtures of two cell types do not show any saturation in η over the parameter range that we explored. A systematic simulation study is needed to explore a range of parameter values to determine the minimum PD that would match the experimental findings.

      We performed 3D simulations to figure out if a much lower PD would yield saturation in η. Preliminary simulations in three dimensions with a lower value of PD (11.5%, with a size variation by a factor of ≈ 2) exhibit saturation in the relaxation time. For comparison, the value of PD in the current work is ≈ 24%, with a size variation by a factor of 8.
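
      The two quantities quoted here, a PD percentage and a size-variation factor, can be computed from a list of radii as sketched below. The definition of PD as the standard deviation of the radii divided by their mean is an assumption made for illustration; the response does not state the definition used.

```python
import statistics


def polydispersity(radii):
    """PD as (population standard deviation) / (mean) of the radii.

    Assumption: this common definition is used only for illustration.
    """
    return statistics.pstdev(radii) / statistics.fmean(radii)


def size_ratio(radii):
    """Ratio of the largest to the smallest radius in the sample."""
    return max(radii) / min(radii)
```

      The two numbers are independent descriptors: a distribution can have a large size ratio from a few extreme radii while keeping a modest PD.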

      P6 -

      “which is related to the Doolittle equation [26] for fluidity (1/η)” what is the Doolittle equation? Is it important here? Also: “VFT equation for cells”? Is it the same as given on p.2 - so nothing special for cells - or a different one?

      Historically, the Doolittle equation was proposed over 60 years ago to describe the change in η in terms of free volume in the context of polymer systems. The physics of polymers is very different from that of the soft models for cells considered here. Nevertheless, the equation has meaning in this context as well. The Doolittle equation (other names associated with similar equations are Ferry, Flory...) is given by

      η = A exp[B Vhc/(V − Vhc)],

      where A and B are constants, V is the total volume and Vhc is the hardcore volume. Essentially, (V − Vhc)/Vhc is the relative free volume. It can be shown that one can arrive at the VFT equation starting from the Doolittle equation.

      The VFT equation for cells is same as given in page 2, which we restate for completeness. Here, we introduce the apparent activation energy.
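
      The step from a Doolittle-type expression to a VFT form can be sketched as follows. This is an illustrative outline under stated assumptions, not the derivation in the manuscript: it assumes the standard Doolittle form, identifies the packing fraction with Vhc/V, and assumes that the free volume vanishes at a packing fraction ϕ0 rather than at ϕ = 1. The symbols η0 and D are the usual VFT fit parameters.

```latex
% Doolittle form (assumed): viscosity controlled by the relative free volume
\eta = A \exp\!\left[\frac{B\,V_{hc}}{V - V_{hc}}\right]

% With \phi = V_{hc}/V the relative free volume is
\frac{V - V_{hc}}{V_{hc}} = \frac{1}{\phi} - 1

% Assumption: free volume vanishes at \phi_0 instead of \phi = 1, so that
\frac{V_{\mathrm{free}}}{V_{hc}} \;\propto\; \frac{\phi_0}{\phi} - 1

% Substituting gives a VFT-type dependence on packing fraction
\eta = \eta_0 \exp\!\left[\frac{D}{\phi_0/\phi - 1}\right]
```

      In this reading, the divergence of η as ϕ approaches ϕ0 reflects the vanishing of the free volume.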

      “The stress-stress tensor” Why not simply stress tensor?

      We have corrected it.

      “shows qualitatively the same behavior as the estimate of viscosity (using dimensional arguments) made in experiments.” Where is this shown?

      The dependence of viscosity as a function ϕ is shown in Figure 1 (c).

      P7 -

      Fig 2A caption “dashed line” Maybe full line?

      This should be full line. It is fixed in the revised manuscript.

      P8 -

      “a puzzling finding that is also reflected” Why is it puzzling?

      Figure 2 (C) shows that the increase in the duration of the plateau of Fs(q,t) ceases when ϕ exceeds ≈ 0.90. This to us is puzzling (always a matter of perspective) because we expected the duration of the Fs(q,t) plateau to increase as a function of ϕ, based on the VFT behavior for ϕ ≤ ϕS. As a result, we imagined that the relaxation time τα would continue to increase beyond ϕS. However, the simulations show that the relaxation time is essentially constant for ϕ > 0.90, which implies that the soft disk system (our model for the tissue) is unusual, with behavior that has no counterpart in the material world.

      “If the VFT relation continues” –“If the VFT relation continued”

      We have fixed it.

      First paragraph does not seem to be coherent

      What is RS (or Rs)?

      RS is the radius of the small cell. In the revised manuscript we have made this clear.

      P10 -

      Please, define the waiting time.

      The waiting time refers to the period between sample preparation and data collection, either in experiments or in simulations. In an ergodic system, the properties should not depend on the waiting time provided it is large. In other words, after the system reaches thermal equilibrium, the waiting time tω should not have an impact on the properties of the system.

      “fully jammed” Please, define.

      The term “fully jammed” refers to a state in which the constituent particles in a system do not move. For example, a hard sphere system at a packing fraction of approximately 0.84 is fully jammed, which implies there is no wiggle room for a particle to move without violating the excluded volume restriction. At this specific packing fraction, the hard sphere system undergoes a jamming transition, resulting in the particles becoming completely immobile. The nonconfluent tissue modeled here is not fully jammed.

      P11 -

      Fig.4 it is hard to see that the width of P(hij) increases with ϕ.

      Please see Author response image 2, which shows fewer curves for better visualization. We have replaced this figure in the revised version.

      Author response image 2.

      Probability of overlap (hij) between two cells, P(hij), for various ϕ values.

      “Thus, even if the cells are highly jammed at ϕ ≈ ϕS, free area is available because of an increase in the overlap between cells.” This conclusion seems premature at this point.

      The Referee is correct. This is shown in Fig. 5. We amended the end of the sentence to reflect this observation.

      P12 -

      “as is the case when the extent of compression increases” extent of compression = density?

      This is correct. Extent of compression corresponds to the packing fraction or the density.

      “This effect is expected to occur with high probability at ϕS and beyond,” Why? What is special about ϕS.

      To achieve packing fractions beyond a certain value of ϕ, soft cells have to overlap, which would occur at a certain value ϕS. In the system studied here, ϕS ≈ 0.90. Note that ϕS could be altered by changing the system parameters.

      P15 -

      “local equilibrium” In a thermodynamic sense? There is also cell migration, so thermodynamic equilibrium does not seem to be appropriate.

      This is an important point. The observation that equilibrium concepts hold in what is manifestly a non-equilibrium system is a surprise. We use the term in a thermodynamic sense. We agree with the reviewer that, because of cell division (in Ref. [14] main text) and cell death, thermodynamic equilibrium does not seem to be appropriate. This is exactly the point we raise in the introduction. However, considering the timescale of cell division and death, it appears that there may be a local steady state, which we call a “local equilibrium”. As a consequence, phase transition ideas and Green-Kubo relations are applicable. Indeed, a surprise in the conclusion in Ref. [14] is that in zebrafish morphogenesis an equilibrium description seems adequate.

      “number of near neighbor cells that is in contact with the ith cell. The jth cell is the nearest neighbor of the ith cell, if hij > 0” A neighbour cell or the nearest neighbor?

      A neighbour cell is accurate.
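
      The neighbor criterion quoted above (cell j is a neighbor of cell i if hij > 0) amounts to counting overlapping pairs. The sketch below assumes hij = Ri + Rj − rij and ignores periodic boundaries; the function name is hypothetical.

```python
import math


def contact_numbers(positions, radii):
    """Count, for each disk, how many other disks overlap it (h_ij > 0).

    Assumptions: h_ij = R_i + R_j - r_ij, plain Euclidean distance,
    no periodic images. The pair loop is O(N^2), for illustration only.
    """
    n = len(positions)
    counts = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            r_ij = math.dist(positions[i], positions[j])
            h_ij = radii[i] + radii[j] - r_ij
            if h_ij > 0:
                counts[i] += 1
                counts[j] += 1
    return counts
```

      Every cell with hij > 0 is counted, which matches the clarification that "a neighbour cell" rather than "the nearest neighbor" is meant.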

      P16 -

      “In our model there is no dynamics with only systematic forces because the temperature is zero.” What is a systematic force? I do not understand the sentence.

      Systematic force between two cells is defined in Eqn. 5 in the main text. Because temperature is not a relevant variable in our model, we want to emphasize that in the absence of self propulsion, the cells would not move at all.

      Reviewer #2

      Major comments:

      A/ Role of size polydispersity

      In the text, and also in the methods (Appendix A), the authors mention that they need large polydispersity of particle sizes to explain the viscous plateau, as the dynamics of small vs large cells are ”dramatically different” (Appendix G). They simulate a system where cell sizes vary by a factor 8, mentioning this is typical in tissues, but I found this quite surprising - this would be heterogeneities in cell volume of 500, many orders of magnitude above what has been measured in tissues. As far as I’m aware, divisions are quite symmetric and synchronous in early vertebrate embryogenesis, so volume variations are expected to be very small (similarly in epithelial tissues, where jamming has been looked at extensively, I’m not aware of examples with ratio of 8 between cell diameters). One question I had is that when the authors look at ”small polydispersity”, there are 50 − 50 mixtures. Would small polydispersity with continuous distributions change this picture? Could they take their current simulations but smoothly change the ratio of polydispersity from 8 to 0 to see exactly how much they need to explain viscosity plateauing, and at which point is the transition?

      We thank the reviewer for raising this important question, which was also a concern for Reviewer #1. The value of polydispersity (PD) required to observe such behavior is not known a priori even within the simple model used. We selected a PD value, with a size variation of a factor of 8, guided in part by the experiment (projection onto 2D) shown in Figure 1(B) and Figure 6(D). We also showed that the monodisperse system crystallizes, and the binary system does not show signs of saturation within the explored range of parameter space and ϕ. This suggests that a certain degree of size dispersity is necessary to obtain saturation in η.

      As discussed in Appendix B, the binary system is characterized by the variables λ = RB/RS, where RB and RS represent the radii of the big and small cells, respectively, and the packing fraction ϕ. By more fully exploring the parameter space encompassing λ and ϕ than we did, it may be possible, as the Referee suggests, that a system with two different cell sizes would yield the experimentally observed dependence of η on ϕ.

      As part of an answer to Reviewer #1 on the same issue, we mentioned results of preliminary simulations in three dimensions with reduced levels of polydispersity, and discovered that at lower levels of polydispersity (variation in size by a factor of ≈ 2 and polydispersity value 11.50%), the relaxation time does saturate beyond a certain packing fraction (see Author response image 3). We have not established if η, the key quantity of interest, would exhibit a similar behavior in 3D.

      Author response image 3.

      (A) τα as a function of ϕ for 11% polydispersity with size variation by a factor of ∼ 2 in the three dimensional system. (B) Same as (A) except polydispersity value is 24% and a size variation by a factor of ∼ 8.

      B/ Role of fluctuations/self-propulsion in this system, and relationship to recent findings

      “A priori it is unclear why equilibrium concepts should hold in zebrafish morphogenesis, which one would expect is controlled by non-equilibrium processes such as self-propulsion, growth and cell division. ”

      This is raised as a key paradox, but is not very clear to me in the context raised by the authors. In particular, they use self-propulsion as a source of activity and explain the evolution of viscosity but a facilitation process involving re-arrangements/motility. But I don’t think self-propulsion has been argued to play a role in zebrafish blastoderm - Ref 14 argues that this is effectively a zerotemperature phenomenon and that cell motility/rearrangements do not show any correlation with viscosity. So this part of the model assumption was not clear to me in relationship with the proposed experimental system. Active noise has been proposed to play key roles in other systems, including motility-driven and tension fluctuation-driven unjamming (among many others Bi et al, PRX, 2016, Mitchel et al, Nat Comm, 2020, Pinheiro et al, Nat Phys, 2022 as well as Kim & Campas, Nat Physics, 2021) - maybe this is somewhere where the author model could fit? In Kim & Campas, Nat Phys, 2021 in particular, the authors develop simulations of non-confluent tissues with noise, that seems to bear some resemblance to the model developed here, so it would be important to discuss the similarities and distinctions (usually I think polydispersity is not considered indeed). In general, the authors look here at a particle based model, but cells have adhesions with well-defined contact angles, so there is a question of the cross-over between their findings and the large body of recent literature on active foams/vertex models (which are not really discussed there).

      We appreciate the lengthy comment here, and there is a lot to unpack. We also thank the referee for the references, some of which we did not know about earlier.

      The primary objective of our study is to determine the simplest minimal model that would explain the experimentally observed dependence of viscosity in zebrafish blastoderm tissue as ϕ is increased beyond a certain packing fraction during morphogenesis. In Reference 14, the authors analyzed the data using the framework of rigidity percolation theory and presented evidence of a genuine equilibrium phase transition. Consequently, one would expect zebrafish blastoderm tissue to be in equilibrium, which is surprising from many perspectives. However, since the tissue is a growing system involving numerous cell divisions and cell death, it is not immediately evident whether the assumption of equilibrium is valid. Indeed, the same problem arises when considering the glass transition, where rapid cooling drives the system out of equilibrium. Nevertheless, heat capacity and η are often analyzed using the notion of equilibrium. Hence, considering this issue within the context of our research appears to be reasonable.

      To the best of our knowledge, the authors in Ref. 14 did not provide an explanation for the η behavior. The focus, which was excellent and is the basis on which we initiated this study, was on the use of rigidity percolation theory to explain the results. Indeed, they performed an experiment by mildly reducing myosin II activity, which apparently affects cell motility. The quantitative effect was not reported.

      We did not impose any requirement of cell rearrangements etc. in the model. There is essentially one variable, the free area available, that explains the η dependence on ϕ. It is possible that one can come up with other zero temperature models that could also explain the data. To the best of our knowledge, none has been proposed.

      It would be interesting to set our model in the context of the other models that the referee points out. This would be an interesting research topic to explore. The only comment we would like to make is that it is unclear how a vertex model for confluent tissues could explain the viscosity data.

      C/ Calculation of the effective shear viscosity

      The authors calculate viscosity from a Green-Kubo relation, although it would be good to clarify at which time scale (and maybe even shear amplitude) they expect this to be valid. These kinds of model would be expected to show plastic rearrangements for large deformations for instance, could the authors simulate realistic rheological deformations (e.g. Kim & Campas, 2021 applying external shear on the simulations) to see how much this matches both their expectation and the data?

      Once it is established that there is local equilibrium (as implied by the use of phase transition ideas to analyse the experimental data in Ref. 14), it is natural to use the Green-Kubo relation to calculate transport properties. Hence, for our purposes, it is valid for all time scales and amplitude. The Reviewer also wonders if the model could be used to simulate response to shear in order to probe rheological properties. There is no conceptual issue here and indeed this is an excellent suggestion that we intend to pursue in the future.
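
      The Green-Kubo route mentioned here can be sketched numerically: the viscosity is taken as proportional to the time integral of the stress autocorrelation function. The code below is a minimal illustration, not the authors' implementation. The `prefactor` argument stands in for the constant that multiplies the integral (area or volume over an effective temperature in the standard formula) and is a hypothetical parameter, as are the function names.

```python
def autocorrelation(series, max_lag):
    """Time-averaged autocorrelation <s(t + lag) s(t)> for lag = 0..max_lag.

    Assumption: the series already has zero mean (as expected for an
    off-diagonal stress component), so no mean is subtracted.
    """
    n = len(series)
    acf = []
    for lag in range(max_lag + 1):
        pairs = n - lag
        total = sum(series[t] * series[t + lag] for t in range(pairs))
        acf.append(total / pairs)
    return acf


def green_kubo_viscosity(stress, dt, max_lag, prefactor=1.0):
    """Trapezoidal integral of the stress autocorrelation up to max_lag * dt.

    `prefactor` is a hypothetical placeholder for the constant in front
    of the Green-Kubo integral.
    """
    acf = autocorrelation(stress, max_lag)
    integral = sum(0.5 * (acf[k] + acf[k + 1]) * dt for k in range(max_lag))
    return prefactor * integral
```

      In practice the result should be checked for convergence with respect to `max_lag`, since the long-time tail of the autocorrelation is dominated by statistical noise.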

      D/ Role of cell adhesion

      The authors consider soft elastic disks of different sizes but unless I missed it, there is no adhesion being considered. This is expected to play a key role in jamming and multicellular mechanics, so I think the authors should either look at what this changes in their simulations, or at least discuss why they are neglecting it. One reason I’m asking is that it’s not totally clear to me that the ”free space” picture, coming from the fact that cells can interpenetrate in their model would hold in a model of deformable cells adhering to each other with constant volume (leading to more equilibration of deformations it would seem?).

      The referee raises another question regarding the lack of adhesion in the simulations. As pointed out before, we were trying to create a minimal model to account for the experimental observations for η upon changing the packing fraction. Thus, we used a coarse-grained model in which we considered polydisperse cells with elastic interactions, which recapitulates the experimental observations. The referee is correct that adhesion plays a role in jammed systems, and how it would affect the results is an aspect that would be interesting to consider in the future. We hasten to add that even systems without attractive adhesion-type interactions become jammed. In principle, in many-body systems, the parameter space is large and one needs to carefully determine which parameter is important for the problem at hand. Therefore, in the first pass we did not find the need to consider the role of adhesion.

      Minor comments:

      The writing could be condensed in some places, with some details being moved to SI (for instance, section E on ageing is very short and seem more suited for supplements, or at least not as an independent section, note that the figure numbering also jumps to Fig. 9 there, although it’s Fig. 3 just before and Fig. 9 just after - re-ordering into main and supporting figures would be clearer.

      We thank the Reviewer for this recommendation. The ageing section, although short, does provide a line of evidence that equilibrium approaches could be valid. We have modestly expanded the section by moving Appendix D to the main text, a general suggestion made by Referee 1. We have tried to be consistent in the numbering of figures in the revision.

      Reviewer #3

      I am very much in favor of the manuscript in its present form - I only suggest commenting (in the manuscript) on the issue described below.

      Motivated by the fact that the experimental system consists of living, motile cells the authors use an active particle model (eq. 6) with stochastic selfpropulsion as the only source for noise (zero-temperature). It would be useful to elaborate briefly how important this stochastic self-propulsion is for the emergent rheological properties of the system (as summarized above): would these properties also be present in the “passive” version of the same model at “non-vanishing” temperature, and if not, why? Or analogously in a “passive” version which is “shaken”, reminiscent of shaken granular matter? To clarify these issues would relate this study to (or discriminate it from) passive, but complex, liquids or granular matter.

      We appreciate the reviewer’s positive feedback on our work. The reviewer has raised an important question concerning our model, in which self-propulsion serves as the source of noise. Without self-propulsion, the system would come to a stationary state after reaching mechanical equilibrium. As mentioned in Eqn. (6) (in the main text), we can define a characteristic time, τ. It is possible that scaling the time t by τ would not alter the results.

      The second question raised by the reviewer is also important. A passive version of the model would be to consider Eq. 6 in our article and, instead of using activity, use the standard stochastic force. The resulting dynamics would be at a finite temperature. The coefficient of noise (a diffusion term) would be related to γi through the fluctuation-dissipation theorem (FDT). Such a system of equations cannot be mapped to Eq. 6, in which µ and γi are independently varied. It is unlikely that such a model, incorporating a “non-vanishing” temperature, would result in the observed dependence of η on ϕ, for the following reason. The passive model represents a polydisperse system, which would form a glass with η increasing with volume fraction, following the VFT law, as has been demonstrated in the glass transition literature for harmonic glasses. The other question, whether a shaken version would explain the experiments, is also interesting. These are worth pursuing in future studies.
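
      The distinction between the active and passive cases can be made concrete with a one-dimensional overdamped update step. This is a sketch under assumptions: the generic form dx = (F/γ) dt + a √dt ξ is used in place of Eq. 6, which is not reproduced here, and all function and parameter names are hypothetical. In the passive case the noise amplitude a is tied to γ by the fluctuation-dissipation theorem, a = √(2 kBT/γ); in the active case the amplitude is a free parameter chosen independently of γ.

```python
import math
import random


def passive_noise_amplitude(k_b_t, gamma):
    """Noise amplitude fixed by the FDT: sqrt(2 * D) with D = k_B T / gamma."""
    return math.sqrt(2.0 * k_b_t / gamma)


def overdamped_step(x, force, gamma, noise_amplitude, dt, rng):
    """One Euler-Maruyama step of overdamped dynamics in one dimension.

    Passive model: pass passive_noise_amplitude(k_b_t, gamma).
    Active model (assumption): pass an amplitude chosen independently
    of gamma, playing the role of the self-propulsion strength.
    """
    xi = rng.gauss(0.0, 1.0)  # unit-variance Gaussian random number
    return x + (force / gamma) * dt + noise_amplitude * math.sqrt(dt) * xi
```

      Setting the amplitude to zero recovers purely deterministic relaxation toward mechanical equilibrium, which is the stationary state described above for the model without self-propulsion.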

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you very much for the kind comments about our manuscript. We have improved the text to address all reviewers’ comments and suggestions. Additionally, we corrected and improved the supplementary tables.

      Reviewer #1 (Public Review):

      This paper provides new evidence on the relationship between genetic/chromosome divergence and capacity for asexual reproduction (via unreduced, clonal gametes) in hybrid males or females. Whereas previous studies have focussed just on the hybrid combinations that have yielded asexual lineages in nature, the authors take an experimental approach, analysing meiotic processes in F1 hybrids for combinations of species spanning different levels of divergence, whether or not they form asexual lineages in nature. As such, the findings here are a substantial advance towards understanding how new asexual lineages form.

      The quality of the work is high, the analyses are sound, and the authors sensibly link their observations to the speciation continuum. I should also add that the cytogenetic work here is just beautiful!

      A key finding is that the precondition for asexual reproduction - the formation of unreduced gametes - is not unusual among hybrid females, so that we have to consider other factors to explain the rarity of asexual species - a major unresolved issue in evolutionary biology. This work also highlights a previously overlooked effect of chromosome organisation on speciation.

      Thank you for the nice comments about our work as well as for appreciating our cytogenetics work and figures.

      Reviewer #2 (Public Review):

      The authors investigate the origin of asexual reproduction through hybridization between species. In loaches, diploid, polyploid, and asexual forms have been described in natural populations. The authors experimentally cross multiple species of loaches and conduct an impressively detailed characterization of gametogenesis using molecular cytogenetics to show that although meiosis arrests early in male hybrids, a subset of cells in females undergo endoreplication before meiosis, producing diploid eggs. This only occurred in hybrids of parental species that were of intermediate divergence. This work supports an expanding view of speciation where asexuality could emerge during a narrow evolutionary window where genomic divergence between species is not too high to cause hybrid inviability, but high enough to disrupt normal meiotic processes.

      Thank you.

      I enjoyed reading this study and I appreciate the amount of work it takes to conduct these types of cytogenetic experiments. But, my main concern with this study is I was left wondering if the sample sizes are large enough to get a sense how variable endoreplication is in these loach species. Most of the hybrids between species are the result of crosses between 1-2 families. Within males and females, meiocyte observations are limited to a handful of pachytene and diplotene stages. I think it would be helpful to be more transparent about the sample sizes in the main text.

      Thank you for raising this point. We have improved the Supplementary Tables S2 and S3 to clarify how many individuals we analyzed from each genetic family and added this information to the main text. In total we obtained 12 combinations with 19 F1 hybrid families. For C. elongatoides x C. taenia hybrids we obtained three families; for C. elongatoides x C. ohridana, C. elongatoides x C. tanaitica, C. elongatoides x C. bilineata and C. ohridana x C. bilineata, we obtained two families each. For the rest of the hybrid combinations we obtained a single family. From these families, 79 individuals were used for the analysis of the meiocytes. Additionally, 24 parental individuals, males and females, were analysed. For the parental species, we analysed 852 cells; for hybrid males we investigated 244 cells, and for hybrid females 665 cells.

      Along these lines, the authors argue against the possibility that endoreplication may be predisposed to occur at a higher rate in some species (line 291). Instead, they suggest that endoreplication is a result of perturbing the cell cycle by combining the genomes of two different species. Their main argument is based on gonocyte counts from parental females in a previous reference. It is essential to include counts from the parents used in this study to make a clear comparison with the F1s.

      Thank you, we agree with your comment and have included observations of meiocytes from several parental species, i.e. C. elongatoides, C. taenia, C. pontica, C. tanaitica, and C. ohridana. Among the 852 cells analyzed, we did not observe any cells with duplicated genomes or abnormalities in chromosomal pairing. By contrast, among 665 pachytene cells of F1 hybrid females, ~1% were endoreplicated. We tested these data with a binomial GLM and found the difference to be significant, suggesting that sexual forms, even if they have some unnoticed duplication events, have a significantly lower incidence of abnormal pachytene cells. We have now included this information in the main text.
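
      The contrast described above can be illustrated with a minimal sketch. Note that this is not the binomial GLM reported in the manuscript: it swaps in a one-sided Fisher's exact test, which behaves well when one group has zero events. The cell totals (852 and 665) are taken from the response; the count of five endoreplicated hybrid cells is a hypothetical placeholder chosen to be close to ~1% of 665, and the function name is likewise an assumption made for illustration.

```python
from math import comb


def fisher_one_sided(events_a: int, total_a: int, events_b: int, total_b: int) -> float:
    """One-sided Fisher's exact test.

    Returns the probability that group A shows `events_a` or fewer events,
    given the observed margins (hypergeometric lower tail).
    """
    total_events = events_a + events_b
    denominator = comb(total_a + total_b, total_events)
    # Sum the hypergeometric probabilities for 0..events_a events in group A.
    lower_tail = sum(
        comb(total_a, x) * comb(total_b, total_events - x)
        for x in range(events_a + 1)
    )
    return lower_tail / denominator


# Cell totals come from the response text.
PARENTAL_CELLS = 852
HYBRID_CELLS = 665
# HYPOTHETICAL placeholder: roughly 1% of 665, not the authors' exact count.
HYBRID_ENDOREPLICATED = 5

p_value = fisher_one_sided(0, PARENTAL_CELLS, HYBRID_ENDOREPLICATED, HYBRID_CELLS)
print(f"one-sided p = {p_value:.4f}")
```

      With a different event count the p-value changes accordingly, so the sketch only shows the shape of the comparison, not the reported result.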

      In the discussion (lines 320-333), the authors postulate the sex-specific clonality they observe could be a result of Haldane's rule. Given these fish do not have known sex chromosomes, I do not find this argument strong. Haldane's rule refers to the exposure of recessive incompatibilities with the sex chromosomes in the hybrid heterogametic sex. This effect would therefore be limited to degenerated sex chromosomes where much of the sequence content on the Y or W has been lost. These species may have homomorphic sex chromosomes, but if this is the case, they likely are not very degenerated. Instead, it seems more plausible that the sex-specific effect the authors observe is due to intrinsic differences of spermatogenesis and oogenesis. Is there any information about sex-specific differences in the fidelity of gametogenesis from other species that would support a higher likelihood of endoreplication?

      Thank you for this important question; however, we think there was a misunderstanding. We do not postulate that our observation conforms to Haldane's rule. In contrast to this rule, which is based on sex chromosomes, our previous publication demonstrated that, whatever the gonadal sex differentiation system in our taxa, the ability to overcome sterility by asexual gametogenesis is always confined to the female gonadal environment (or oogenesis in general), even for transplanted spermatogonial cells (Tichopad et al. 2022). What we meant is that our results do not fully conform to Haldane's rule. We have therefore reworded the text to rule out this misconception.

      Nonetheless, we note that Haldane's rule has also been shown to apply to species with poorly differentiated sex chromosomes (e.g. Presgraves and Orr 1998), and that recessive incompatibilities are not the only explanation, as the faster-male or faster-X theories may also apply in such cases (Dufresnes et al. 2016). Therefore, we have kept our remarks about Haldane's rule. Moreover, for several parental species, we have preliminary evidence of an XY gonadal sex differentiation system, although these results are unpublished and need further validation.

      The final thing I was left wondering about was this missing link between endoreplication and activating the embryonic development of the diploid egg. In these loach species, a sperm is required to activate egg development, but the sperm genome is discarded (line 100). What is the mechanism of this and how does it evolve concurrently during hybridization?

      Thank you for the comment. There has been much speculation about why gynogens need sperm to activate their egg development, but to our knowledge, no explanation has been validated to date. Interestingly, a recent theoretical model by Fyon et al. (bioRxiv 2023) suggested that the ability to exclude sperm may evolve separately from the ability to produce clonal eggs. Hence, this topic is complex and remains unresolved, and we feel that it is beyond the scope of the present manuscript. We have slightly modified the text and added two references to address your suggestion.

      Reviewer #1 (Recommendations For The Authors):

      The paper is well prepared - though the resolution of Fig 1 on the pdf is rather poor.

      Thank you! We have now provided the high-resolution figures.

      Overall, I have few suggestions for improvements:

      Line 58. How does endoduplication itself "overcome accumulated incompatibilities" other than failure of synapsis? Perhaps by maintaining the F1 state, and so avoiding reduced fitness arising from recombination and disruption of coadapted gene combinations.

      We have added a sentence to the main text “Premeiotic genome endoreplication thus not only ensures clonal reproduction but also allows hybrids to overcome problems in chromosome pairing that would otherwise lead to their sterility 15,17.” that we hope sufficiently addresses this issue.

      Line 118 - please explain the AKD index here - as you have some in SI. Also please be clearer on how you measure genetic divergence as proportion of heterozygous SNPs - presumably this is via exon sequences from F1 females?

      Please note that we have already explained the AKD index in the relevant part of the Methods section. However, we have now also added a brief explanation to the Results section, as suggested. We apologize for the imprecise description of the genetic divergence measurements. As described in the Methods section, divergence is not measured by heterozygosity (as we wrongly stated here), but as the p-distance between coding-region sequences of the parental species.

      Lines 126 ff. It is unfortunate that the design of the crosses was not more balanced or extensive. Nonetheless, I do appreciate the effort involved here and think the results are solid as is.

      Thank you.

      Line 142. Please define PS and TB (and other acronyms) at first use.

      We have added the definition for all acronyms at the first use.

      Lines 192-193. What about EP and EN - as shown to have unreduced gametes in Fig. 2?

      Thank you for this question. Based on analyses of the diplotene stage, we showed that EP and EN hybrids produced diploid eggs. However, we did not find duplicated oocytes in pachytene because endoreplication is rare. A similarly low incidence of duplicated pachytene cells has been observed in natural as well as F1 hybrids of loaches and reptiles (Newton et al., 2016, Dedukh et al., 2021, 2022).

      Lines 217-219. The observed correlation of chromosome divergence (AKD index) and numbers of bivalents in pachytene makes sense and is an important observation. Did this GLM simultaneously consider the effect of genetic divergence (as implied in methods)?

      Thank you for this comment. We originally tested the fit of two separate models, one with AKD and the other with SNP divergence. Since the AKD model significantly outperformed the SNP-based one, we focused our interpretation on the former. However, as you suggested, we have now recalculated the model with both predictors included jointly, and this combined model indeed outperformed both single-predictor models. In conclusion, while AKD is still the strongest single predictor of the observed numbers of bivalents, the additional effect of genetic distance significantly improves the model fit. We have now included this result in the main text.

      This finding does not alter our conclusions, it just suggests that the effect of chromosomal morphology is probably more complex, involving the role of more subtle sequence divergence or structural variants.

      Line 242. The Discussion is a great read - careful interpretation and a really interesting interpretation in context of the broader literature.

      Thank you for the appreciation. Your positive feedback and evaluation strongly motivate us to expand our work.

      Line 396. Some references from book chapters (18, 52) are incomplete. Please fix.

      We have now corrected these references accordingly.

      Reviewer #2 (Recommendations For The Authors):

      Transparency about meiocyte sample sizes: These counts are all in supplemental table 3. From this table, it is unclear if a majority of these meiocytes are from a single individual or from multiple males or females. Or, in the crosses where there are multiple families, are the meiocytes sampled from all families? I am trying to get a sense of whether endoreplication and the fidelity of oogenesis could be influenced by genetic variants segregating within species. If the meiocytes are only sampled from a single individual from a single cross, you may not see this variation. If this is the case, perhaps the correlation between genetic divergence and the formation of asexual clones may not be as strong. Additional replicates may not be feasible, but at a minimum I think it would be helpful to address whether endoreplication could or could not be variable and if the sample sizes are sufficient.

      Thank you for raising this point. We have improved the Supplementary table to clarify how many individuals we analyzed from each family and added this information to the main text. Unfortunately, additional replicates are not feasible due to the long generation time of the fish. We otherwise agree with your comment and included this point in the Discussion.

      Gonocyte counts from parental females: The authors say they "analysed hundreds of gonocytes of sexual females without a single incidence of genome endoreplication." I could not find a clear count in the references given. They note that the incidence of endoreplication was very low in pachytene cells in this study (0.7%).

      Thank you, we agree with your comment and have included observations of meiocytes from several parental species, i.e. C. elongatoides, C. taenia, C. pontica, C. tanaitica, and C. ohridana. Among the 852 cells analyzed, we did not observe any cells with duplicated genomes or abnormalities in chromosomal pairing. By contrast, among 665 pachytene cells of F1 hybrid females, ~1% were endoreplicated. We tested these data with a binomial GLM and found the difference to be significant, suggesting that sexual forms, even if they have some unnoticed duplication events, have a significantly lower incidence of abnormal pachytene cells. We have now included this information in the main text.

      They refer to supplemental table 4 (line 196), which does not exist in the supplement. The authors should report these numbers in the revised manuscript.

      Thank you for pointing this out. We have corrected the name of the supplementary table; it is in fact Supplementary Table S3.

    1. Author Response

      eLife assessment

      The authors' finding that PARG hydrolase removal of polyADP-ribose (PAR) protein adducts generated in response to the presence of unligated Okazaki fragments is important for S-phase progression is potentially valuable, but the evidence is incomplete, and identification of relevant PARylated PARG substrates in S-phase is needed to understand the role of PARylation and dePARylation in S-phase progression. Their observation that human ovarian cancer cells with low levels of PARG are more sensitive to a PARG inhibitor, presumably due to the accumulation of high levels of protein PARylation, suggests that low PARG protein levels could serve as a criterion to select ovarian cancer patients for treatment with a PARG inhibitor drug.

      Thank you for the assessment and summary. Please see below for details as we have now addressed the deficiencies pointed out by the reviewers.

      We believe that PARP1 is one of the major relevant PARG substrates in S phase cells. Previous studies reported that PARP1 recognizes unligated Okazaki fragments and induces S phase PARylation, which recruits single-strand break repair proteins such as XRCC1 and LIG3 that act as a backup pathway for Okazaki fragment maturation (Hanzlikova et al., 2018; Kumamoto et al., 2021). In this study, we revealed that accumulation of PARP1/2-dependent S phase PARylation eventually led to cell death (Fig. 2). Furthermore, we found that chromatin-bound PARP1 as well as PARylated PARP1 increased in PARG KO cells (Fig. S4A and Fig. 4A), suggesting that PARP1 is one of the key substrates of PARG in S phase cells. Of course, PARG may have additional substrates besides PARP1 that are required for its roles in S phase progression, as PARG is known to be recruited to DNA damage sites through pADPr- and PCNA-dependent mechanisms (Mortusewicz et al., 2011). Precisely how PARG regulates S phase progression warrants further investigation.

      Reviewer #1 (Public Review):

      I have a major conceptual problem with this manuscript: How can the full deletion of a gene (PARG) sensitize a cell to further inhibition by its chemical inhibitor (PARGi) since the target protein is fully absent?

      Please see below for details about this point. Briefly, we found that PARG is an essential gene (Fig. 7). There was residual PARG activity in our PARG KO cells, although the loss of full-length PARG was confirmed by Western blotting and DNA sequencing (Fig. S9). The residual PARG activity in these cells can be further inhibited by a PARG inhibitor, which eventually leads to cell death.

      The authors state in the discussion section: "The residual PARG dePARylation activity observed in PARG KO cells likely supports cell growth, which can be further inhibited by PARGi". What does this statement mean? Is the authors' conclusion that their PARG KOs are not true KOs but partial hypomorphic knockdowns? Were the authors working with KO clones or CRISPR deletion in populations of cells?

      The reviewer is correct that our PARG KOs are not true KOs. We were working with CRISPR edited KO clones. As shown in this manuscript, we validated our KO clones by Western blotting, DNA sequencing and MMS-induced PARylation. Despite these efforts and our inability to detect full-length PARG in our KO clones, we suspect that our PARG KO cells may still express one or more active fragments of PARG due to alternative splicing and/or alternative ATG usage.

      As shown in Fig. 7, we believe that PARG is essential for proliferation. Our initial KO cell lines are not complete PARG KO cells and residual PARG activity in these cells could support cell proliferation. Unfortunately, due to lack of appropriate reagents we could not draw solid conclusions regarding the isoforms or the truncated PARG expressed in these cells (Please see Western blots below).

      Are there splice variants of PARG that were not knocked down? Are there PARP paralogues that can complement the biochemical activity of PARG in the PARG KOs? The authors do not discuss these critical issues nor engage with this problem.

      There are five reviewed or potential PARG isoforms identified in the Uniprot database. The sgRNAs used to generate the initial PARG KO cells in this manuscript target all three catalytically active isoforms (isoforms 1, 2 and 3), while isoforms 4 and 5 are considered catalytically inactive according to the Uniprot database. However, sgRNA-mediated genome editing may lead to the creation of new alternatively spliced PARG mRNAs or the use of an alternative ATG, either of which could produce catalytically active forms of PARG. Instead of searching for these putative spliced PARG RNAs, we used two independent antibodies that recognize the C-terminus of PARG for WB, as shown in Author response image 1. Unfortunately, besides full-length PARG, these antibodies also recognized several other bands; some of them were reduced or absent in PARG KO cells, while others were not. Thus, we could not draw a clear conclusion as to which functional isoform was expressed in our PARG KO cells. Nevertheless, we directly measured PARG activity in PARG KO cells (Fig. S9) and were still able to detect residual PARG activity in these cells. These data clearly indicate that residual PARG activity is present in our KO cells, but the precise nature of these truncated forms of PARG remains elusive.

      Author response image 1.

      These issues have to be dealt with upfront in the manuscript for the reader to make sense of their work.

      We thank this reviewer for his/her constructive comments and suggestions. We will include the data above and additional discussion upfront in our revised manuscript to avoid any further confusion by our readers.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Nie et al investigate the effect of PARG KO and PARG inhibition (PARGi) on pADPR, DNA damage, cell viability, and synthetic lethal interactions in HEK293A and Hela cells. Surprisingly, the authors report that PARG KO cells are sensitive to PARGi and show higher pADPR levels than PARG KO cells, which are abrogated upon deletion or inhibition of PARP1/PARP2. The authors explain the sensitivity of PARG KO to PARGi through incomplete PARG depletion and demonstrate complete loss of PARG activity when incomplete PARG KO cells are transfected with additional gRNAs in the presence of PARPi. Furthermore, the authors show that the sensitivity of PARG KO cells to PARGi is not caused by NAD depletion but by S-phase accumulation of pADPR on chromatin coming from unligated Okazaki fragments, which are recognized and bound by PARP1. Consistently, PARG KO or PARG inhibition shows synthetic lethality with Pol beta, which is required for Okazaki fragment maturation. PARG expression levels in ovarian cancer cell lines correlate negatively with their sensitivity to PARGi.

      Thank you for your nice comments. The complete loss of PARG activity was observed in PARG complete/conditional KO (cKO) cells. These cKO clones were generated using wild-type cells transfected with sgRNAs targeting the catalytic domain of PARG in the presence of PARP inhibitor.

      Strengths:

      The authors show that PARG is essential for removing ADP-ribosylation in S-phase.

      Thanks!

      Weaknesses:

      1) This begs the question as to the relevant substrates of PARG in S-phase, which could be addressed, for example, by analysing PARylated proteins associated with replication forks in PARG-depleted cells (EdU pulldown and Af1521 enrichment followed by mass spectrometry).

      We believe that PARP1 is one of the major relevant PARG substrates in S phase cells. Previous studies reported that PARP1 recognizes unligated Okazaki fragments and induces S phase PARylation, which recruits single-strand break repair proteins such as XRCC1 and LIG3 that act as a backup pathway for Okazaki fragment maturation (Hanzlikova et al., 2018; Kumamoto et al., 2021). In this study, we revealed that accumulation of PARP1/2-dependent S phase PARylation eventually led to cell death (Fig. 2). Furthermore, we found that chromatin-bound PARP1 as well as PARylated PARP1 increased in PARG KO cells (Fig. S4A and Fig. 4A), suggesting that PARP1 is one of the key substrates of PARG in S phase cells. Of course, PARG may have additional substrates besides PARP1 that are required for its roles in S phase progression, as PARG is known to be recruited to DNA damage sites through pADPr- and PCNA-dependent mechanisms (Mortusewicz et al., 2011). Precisely how PARG regulates S phase progression warrants further investigation.

      2) The results showing the generation of a full PARG KO should be moved to the beginning of the Results section, right after the first Results chapter (PARG depletion leads to drastic sensitivity to PARGi), otherwise, the reader is left to wonder how PARG KO cells can be sensitive to PARGi when there should be presumably no PARG present.

      Thank you for your suggestion! However, we would like to keep the complete PARG KO result at the end of the Results section, since this was how this project evolved. Initially, we did not know that PARG is an essential gene. Thus, we speculated that PARGi may target not only PARG but also a second target, which only becomes essential in the absence of PARG. To test this possibility, we performed FACS-based and cell survival-based whole-genome CRISPR screens (Fig. 5). However, this putative second target was not revealed by our CRISPR screening data (Fig. 5). We then tested the possibility that these cells may have residual PARG expression or activity and that only cells with very low PARG expression are sensitive to PARGi, which turned out to be the case for ovarian cancer cells. Equipped with a PARP inhibitor and sgRNAs targeting the catalytic domain of PARG, we finally generated cells with complete loss of PARG activity to prove that PARG is an essential gene (Fig. 7). This series of experiments underscores the challenge of validating any KO cell line, i.e. the identification of frame-shift mutations, absence of full-length proteins, and phenotypic changes may still not be sufficient to validate KO clones. This is an important lesson we learned and we would like to share it with the scientific community.

      To avoid further misunderstanding, we will include additional statements/comments at the end of the “PARG depletion leads to drastic sensitivity to PARGi” section and at the beginning of “CRISPR screens reveal genes responsible for regulating pADPr signaling and/or cell lethality in WT and PARG KO cells”. We hope that our revised manuscript will make this clear.

      3) Please indicate in the first figure which isoforms were targeted with gRNAs, given that there are 5 PARG isoforms. You should also highlight that the PARG antibody only recognizes the largest isoform, which is clearly absent in your PARG KO, but other isoforms may still be produced, depending on where the cleavage sites were located.

      The sgRNAs used to generate PARG KO cells in this manuscript target all three catalytically active isoforms (isoforms 1, 2 and 3), while isoforms 4 and 5 are considered catalytically inactive according to the Uniprot database. As suggested, we will modify Fig. S1D and the figure legends.

      Although the manufacturer's instructions state that the anti-PARG antibody (66564S) recognizes only isoform 1, this antibody also recognized isoforms 2 and 3, albeit weakly, in Western blots of lysates prepared from PARG cKO cells reconstituted with different PARG isoforms, as shown below. As suggested, we will add a statement in the revised manuscript and provide the Western blotting data in Author response image 2.

      Author response image 2.

      To test whether other isoforms were expressed in 293A and/or HeLa cells, we used two independent antibodies that recognize the C-terminus of PARG for WB, as shown in Author response image 3. Unfortunately, besides full-length PARG, these antibodies also recognized several other bands; some of them were reduced or absent in PARG KO cells, while others were not. Thus, we could not draw a clear conclusion as to which functional isoforms or truncated forms were expressed in our PARG KO cells.

      Author response image 3.

      4) FACS data need to be quantified. Scatter plots can be moved to Supplementary while quantification histograms with statistical analysis should be placed in the main figures.

      We agree with this reviewer that quantification of FACS data may provide straightforward results for some of our data. However, it is challenging to quantify positive S phase pADPr signaling in some panels, for example in Fig. 3A and Fig. 4C. In both panels, pADPr signaling was detected throughout the cell cycle and therefore it is difficult to determine the percentage of S phase pADPr signaling in these samples. Thus, we have decided to keep the scatter plots to demonstrate the dramatic and S phase-specific pADPr signaling in PARG KO cells treated with PARGi. We hope that these data are clear and convincing even without quantification.

      5) All colony formation assays should be quantified and sensitivity plots should be shown next to example plates.

      As suggested, we will include the sensitivity plot next to Fig. 3D. However, other colony formation assays in this study were performed with a single concentration of inhibitor and therefore we will not provide sensitivity plots for these experiments. Nevertheless, the results of these experiments are straightforward and easy to interpret.

      6) Please indicate how many times each experiment was performed independently and include statistical analysis.

      As suggested, we will add this information in the revised manuscript.

      Reviewer #3 (Public Review):

      Here the authors carried out a CRISPR/sgRNA screen with a DDR gene-targeted mini-library in HEK293A cells looking for genes whose loss increased sensitivity to treatment with the PARG inhibitor, PDD00017273 (PARGi). Surprisingly they found that PARG itself, which encodes the cellular poly(ADP-ribose) glycohydrolase (dePARylation) enzyme, was a major hit. Targeted PARG KO in 293A and HeLa cells also caused high sensitivity to PARGi. When PARG KO cells were reconstituted with catalytically-dead PARG, MMS treatment caused an increase in PARylation, not observed when cells were reconstituted with WT PARG or when the PARG KO was combined with PARP1/2 DKO, suggesting that loss of PARG leads to a strong PARP1/2-dependent increase in protein PARylation. The decrease in intracellular NAD+, the substrate for PARP-driven PARylation, observed in PARG KO cells was reversed by treatment with NMN or NAM, and this treatment partially rescued the PARG KO cell lethality. However, since NAD+ depletion with the FK866 nicotinamide phosphoribosyltransferase (NAMPT) inhibitor did not induce a similar lethality the authors concluded that NAD+ depletion/reduction was only partially responsible for the PARGi toxicity. Interestingly, PARylation was also observed in untreated PARG KO cells, specifically in S phase, without a significant rise in γH2AX signals. Using cells synchronized at G1/S by double thymidine blockade and release, they showed that entry into S phase was necessary for PARGi to induce PARylation in PARG KO cells. They found an increased association of PARP1 with a chromatin fraction in PARG KO cells independent of PARGi treatment, and suggested that PARP1 trapping on chromatin might account in part for the increased PARGi sensitivity. They also showed that prolonged PARGi treatment of PARG KO cells caused S phase accumulation of pADPr eventually leading to DNA damage, as evidenced by increased anti-γH2AX antibody signals and alkaline comet assays. Based on the use of emetine, they deduced that this response could be caused by unligated Okazaki fragments. Next, they carried out FACS-based CRISPR screens to identify genes that might be involved in cell lethality in WT and PARG KO cells, finding that loss of base excision repair (BER) and DNA repair genes led to increased PARylation and PARGi sensitivity, whereas loss of PARP1 had the opposite effects. They also found that BER pathway disruption exhibited synthetic lethality with PARGi treatment in both PARG KO cells and WT cells, and that loss of genes involved in Okazaki fragment ligation induced S phase pADPr signaling. In a panel of human ovarian cancer cell lines, PARGi sensitivity was found to correlate with low levels of PARG mRNA, and they showed that the PARGi sensitivity of cells could be reduced by PARPi treatment. Finally, they addressed the conundrum of why PARG KO cells should be sensitive to a specific PARG inhibitor if there is no PARG to inhibit and found that the PARG KO cells had significant residual PARG activity when measured in a lysate activity assay, which could be inhibited by PARGi, although the inhibited PARG activity levels remained higher than those of PARG cKO cells (see below). This led them to generate new, more complete PARG KO cells they called complete/conditional KO (cKO), whose survival required the inclusion of the olaparib PARPi in the growth medium. These PARG cKO cells exhibited extremely low levels of PARG activity in vitro, consistent with a true PARG KO phenotype.

      We thank this reviewer for his/her constructive comments and suggestions.

      The finding that human ovarian cancer cells with low levels of PARG are more sensitive to inhibition with a small molecule PARG inhibitor, presumably due to the accumulation of high levels of protein PARylation (pADPr) that are toxic to cells is quite interesting, and this could be useful in the future as a diagnostic marker for preselection of ovarian cancer patients for treatment with a PARG inhibitor drug. The finding that loss of base excision repair (BER) and DNA repair genes led to increased PARylation and PARGi sensitivity is in keeping with the conclusion that PARG activity is essential for cell fitness, because it prevents excessive protein PARylation. The observation that increased PARylation can be detected in an unperturbed S phase in PARG KO cells is also of interest. However, the functional importance of protein PARylation at the replication fork in the normal cell cycle was not fully investigated, and none of the key PARylation targets for PARG required for S phase progression were identified. Overall, there are some interesting findings in the paper, but their impact is significantly lessened by the confusing way in which the paper has been organized and written, and this needs to be rectified.

      We believe that PARP1 is one of the major relevant PARG substrates in S phase cells. Previous studies reported that PARP1 recognizes unligated Okazaki fragments and induces S phase PARylation, which recruits single-strand break repair proteins such as XRCC1 and LIG3 that act as a backup pathway for Okazaki fragment maturation (Hanzlikova et al., 2018; Kumamoto et al., 2021). In this study, we revealed that accumulation of PARP1/2-dependent S phase PARylation eventually led to cell death (Fig. 2). Furthermore, we found that chromatin-bound PARP1 as well as PARylated PARP1 increased in PARG KO cells (Fig. S4A and Fig. 4A), suggesting that PARP1 is one of the key substrates of PARG in S phase cells. Of course, PARG may have additional substrates besides PARP1 that are required for its roles in S phase progression, as PARG is known to be recruited to DNA damage sites through pADPr- and PCNA-dependent mechanisms (Mortusewicz et al., 2011). Precisely how PARG regulates S phase progression warrants further investigation.

      As suggested, we will revise our manuscript accordingly and provide additional explanation/statement upfront to avoid any misunderstandings.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      1) Utilization of known AhR ligands as controls will strengthen the interpretation of the conclusions.

      We agree with the reviewer that AhR ligands could be used as controls for delineating structure-activity relationships and cell context-specific effects. However, such studies are beyond the scope of the current manuscript. The AhR has many endogenous ligands, including several tryptophan-derived metabolites, that have been shown to elicit different responses depending on the dose and cell type. Our unpublished data show that the expression of AhR target genes such as Cyp1a1, Cyp2e1, and Tiparp was not modulated by I3A in RAW cells, which suggests that the observed effects may occur independently of the AhR.

      Reviewer #2:

      Specific comments:

      1) The title is misleading: "Microbially-derived indole-3-acetate" suggests that this article is about the production of I3A by the gut microbiota, whereas in fact this is a dietary supplementation article. The title needs to reflect this fact.

      Our title reflects the natural source of I3A in mice. We used oral supplementation to study the effects of this metabolite. Per the reviewer's suggestion, we changed the title as follows: “Oral supplementation of gut microbial metabolite indole-3-acetate alleviates diet-induced steatosis and inflammation in mice”

      2). The description of the amount of I3A in the drinking water is not properly described. The actual concentration in the drinking water should be given.

      The concentration of I3A in drinking water was as follows: WD50 = 0.5 mg/ml and WD100 = 1 mg/ml. We added this information in the revised manuscript.

      3) The serum concentration data of I3A is critical data and should be moved in Figure 1.

      We have now included serum levels of I3A as part of Figure 1.

      4) The authors should have determined the actual concentration of indole-3-acetate in serum by running a standard curve of I3A during the LC-MS analysis. Also, recovery and matrix effects should be determined. Without this information their data will be difficult to compare to other studies.

      We agree with the reviewer that quantification of I3A in serum would be useful. However, we are unable to do so due to limited sample available as well as concerns with sample integrity after long-term storage.

      5) In the data in Figure S1C, there appears to be only 2-3 mice out of nine that exhibit a difference in serum indole-3-acetate levels between the WD-50 and WD-100. Do the authors have an explanation for this small difference compared to the other endpoints assessed?

      The serum I3A measurements at week 16 are a snapshot that may not reflect tissue levels due to differences in water intake, I3A metabolism in the body, and/or elimination of I3A. The other phenotypic assays are physiological measurements that reflect the result of sustained administration of I3A.

      6) Since the Ah receptor may play a role in the results obtained CYP1A1 mRNA levels in the liver and intestinal tract should have been measured.

      We measured alterations in Cyp1a1 mRNA in the liver and no significant change was observed in the WD50 and WD100 groups relative to controls. Also, see response to reviewer 1.

      7) The main mechanistic experiment performed is shown in Figure 6 and the figure legend states that they are examining macrophages, but these are cell lines, they are macrophages models, and this should be clearly stated. The first two panels are liver data, so the title of the figure legend needs to reflect that fact.

      We agree and have changed the title of Figure 6 to “I3A modulates AMPK phosphorylation and suppresses RAW 264.7 macrophage cell inflammation in an AMPK dependent manner”.

      8) In Figure 6, 1 mM I3A is added to the cells, how is this very high concentration relevant to the concentrations observed in vivo? Does adding 1 mM acetate to the cell culture media lower the pH of the media and could this influence the results obtained? Would acetic acid yield the same results? Could treatment with an acid even explain in vivo results?

      It is difficult to match the concentration of I3A in the in vitro experiments to liver tissue concentrations. Addition of 1 mM I3A did not lower the pH of cell culture media or reduce the viability of cultured RAW 264.7 macrophages. As I3A is not known to degrade into acetic acid and indole, we do not expect acetic acid to recapitulate the effects elicited by I3A.

      Reviewer #3:

      My primary concern with the manuscript is the organization and interpretation of the data. It appears that little effort was given by the authors on interpreting the data and digesting it for the reader into a coherent package. Rather, the authors have collected a vast amount of data and organized it without much thought about what the reader would take away from it. Furthermore, it seems the authors have taken this as an opportunity to overload this manuscript with data that are superfluous to the conclusions the authors draw at the end. Based on this, I think the authors need to invest more time into distilling their complex biological data into a unifying scientific interpretation for the readers that advances our understanding of I3A. My suggestions for the authors are described below.

      1) The data lack a rationale behind how they are organized within the manuscript. For example, the authors will combine disparate biological pathways and lump data together without logic as in Figure 2. Why are inflammatory pathways and bile acid synthesis combined in a figure? What was the rationale?

      We respectfully disagree that the data are presented without rationale. Both inflammation and bile acid dysregulation are commonly observed with NAFLD and thus are presented in two separate panels of Figure 2 (A, inflammatory cytokines, and B bile acids).

      2) The authors give very little effort to performing integrative omics analysis even though multi-omics is provided. For example, the authors provide proteomic data on the fatty acid metabolism pathway, but there is no mention of this pathway within the metabolomic dataset. Vice versa, the authors provide an in-depth investigation of the metabolic changes within the tryptophan pathway, but no investigation into the proteomic changes that may underlie this phenomenon. It would be recommended that the authors invest more energy into performing more in-depth analysis of their multi-omics data presented.

      We attempted to co-analyze the proteomic and metabolomic data, but this analysis was not informative. Protein and metabolite abundances do not necessarily correlate, and the two types of omics data carry different observation biases. For example, label-free, untargeted proteomics data favor abundant proteins, whereas untargeted metabolomics data are influenced by concentration and ionization efficiency, among other factors. Therefore, we opted to analyze the two datasets independently, and then linked the findings from the two analyses using biological pathways as guides. For example, we describe changes in acyl-carnitine and discuss how this observation is consistent with changes in abundance of fatty acid metabolism enzymes.

      3) Figures 1&2 shows that low dose treatment reduces inflammation but does not alter hepatic TG levels. This is in direct disagreement with the graphical model provided by the authors (Supp. Fig 9). In the author's model, I3A is directing hepatic lipid metabolism through modulation of macrophage inflammation. This interpretation is erroneous and needs to be reevaluated by the authors. Furthermore, the tryptophan pathway and bile acid pathways are not even represented in the model, which begs the question of why that data are included in the manuscript to begin with.

      We would like to respectfully point out that Figure 1D does show a statistically significant (p < 0.05) difference in liver TG between the WD and WD100 groups. Supp. Figure S9 is meant to be a summary of the main biochemical changes elicited by I3A that we have shown in the current study (e.g., the involvement of AMPK) rather than an atlas of all the changes detected in the metabolomics and proteomic data. Specifically, we have not included the tryptophan or bile acid pathways as we do not have mechanistic information on how these changes are mediated by I3A.

      4) The authors switch from hepatocytes to macrophages without giving any rationale. The authors need to invest more time into describing a logical flow of thought when assembling the manuscript.

      We mention the rationale for investigating the effect of I3A on macrophages in the introduction (last paragraph of the section): “In vitro, both I3A and TA attenuated the expression of inflammatory cytokines (Tnfα, Il-1β and Mcp-1) in macrophages exposed to palmitate and LPS.” We also explain why we used an in vitro model, RAW cells, at the beginning of the corresponding Results section: “Since our previous study found that the metabolic effects of I3A in hepatocytes depend on the AhR, we tested if this was also the case in macrophages.” Moreover, the strong effects of I3A on liver inflammatory cytokines also motivate the macrophage experiments.

    1. Author Response

      We thank the Editors and the Reviewers for the time spent on our manuscript entitled “The CD4 transmembrane GGXXG and juxtamembrane (C/F)CV+C motifs mediate pMHCII-specific signaling independently of CD4-Lck interactions”. We appreciate the helpful feedback and the opportunity to participate in eLife’s new model for publishing.

      We are writing to provide the following provisional author responses for posting with the first version of the reviewed preprint:

      1) To address comments about the limited scope of this study and referencing of the Methods section to our prior study, we would like to note that we submitted the current study via the Research Advance mechanism. Our goal was to build upon the conclusions of our 2022 eLife publication (PMID: 35861317) and address an unresolved question from that study (as nicely summarized by Reviewer #2). In the current manuscript we present data from reductionist experiments that were designed specifically for this purpose and, as noted by the reviewers, we provide answers to the question being asked. We think that the Research Advance mechanism is an ideal opportunity to make these results available to the field given the stated purpose of such articles (for reference: “A Research Advance might use a new technique or a different experimental design to generate results that build upon the conclusions of the original research by, for example, providing new mechanistic insights or extend the pathway under investigation…”).

      a. The Methods were not duplicated in this manuscript because we referenced our prior study as per instructions for the Research Advance mechanism.

      2) The constituent residues of the motifs analyzed in this and our prior study were determined to be functionally significant in vivo through the computational reconstruction of CD4’s evolutionary history, which provided us with data from ~435 million years of natural experiments with CD4 in numerous jawed vertebrate species. We agree that having conditional knock-in mice of these CD4 mutants, and those characterized in our last study, would be useful for determining how these mutations impact T cell development, activation, differentiation, and effector function. Given the costs involved with making genetically engineered mouse model systems, the computational and experimental data we have generated in the current and prior study will help us prioritize next steps to dig deeper into the details of why the residues we are studying are under purifying selection (fail to propagate to progeny if mutated, meaning terminal). In short, only now, with the data in hand, can we prioritize mouse studies. We think it is important for the advancement of the field that we make these results available in a timely manner rather than waiting to report them together with the results of mouse models once generated and analyzed.

      3) The reductionist experimental data presented here provide us with mechanistic insights into why the residues we are studying are functionally important. We therefore think it is of value to note that 58a-b- T cell hybridomas were used in seminal work that established a link between CD4-Lck association, via motifs in the CD4 intracellular domain, and signaling output as measured by IL-2 production (Glaichenhaus, et al., 1991). Importantly, the impact of disrupting CD4-Lck interactions on proximal signaling was not interrogated until the work we describe here and in our preceding study, wherein we establish that CD4-Lck association does not regulate proximal signaling in 58a-b- T cell hybridomas. Given that this experimental system was used to help establish the dominant paradigm (i.e. the widely held view that CD4 recruits Lck to TCR-CD3 to initiate pMHCII-specific signaling), we think it is a legitimate system to directly test this model and further test core questions of CD4 function by employing more modern experimental techniques.

    1. Author Response:

      We would like to express our heartfelt gratitude for the reviewers’ scholarly and insightful reviews of our manuscript. The constructive comments and thought-provoking experimental proposals have been invaluable not only in improving the quality of this study but also in shaping the direction of future research. In revision, all comments will be addressed point-by-point, and the manuscript will be revised thoroughly. Here in this reply, we focus on the most critical issue regarding the source of noises during stability inference.

      When faced with a stack of objects, individuals are more likely to assess taller stacks as being more unstable than shorter ones (Fig. 2b & 2d). This bias persists even when comparing single objects of different heights that share the same contact area with the supporting surface. Known as “stability inference bias,” this phenomenon challenges deterministic models with a single, fixed vector for the representation of gravity’s direction (i.e., directly downward). To reconcile this bias with deterministic models, previous studies (e.g., Allen et al., 2020; Battaglia et al., 2013; Kubricht et al., 2017) have incorporated external noises such as perceptual uncertainty and external force perturbations to increase their fit to human performance, as also pointed out by Reviewer 1.

      In this study, we introduced an alternative perspective through a stochastic model in which variability is instead embedded in the representation of gravity’s direction. In this framework, gravity’s direction is not a fixed vector but a distribution of possible vectors, with the vertical direction serving as the maximum likelihood. While the distinction between deterministic and stochastic models is conceptually clear, mathematically they are equivalent. In addition, our stochastic model does not negate the role of external noises in stability inference, because gravity is seldom the sole force acting upon a moving object in the physical world, as pointed out by Reviewer 1. Together, these two factors make it challenging to ascribe the source of variability to either external or internal noises (Smith & Vul, 2013). This is the major concern raised by all three reviewers.
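
      As an illustration only, the idea of representing gravity's direction as a distribution can be sketched in a few lines of Python. This is a minimal sketch and not the model used in the study: the Gaussian form of the tilt angle, the value of sigma_deg, the cylinder geometry, and the function names sample_gravity and p_fall are all assumptions made for the example.

```python
import math
import random

def sample_gravity(sigma_deg, rng):
    """Draw one unit gravity vector (assumed form, for illustration).

    The tilt from vertical is the absolute value of a Gaussian draw, so the
    vertical direction is the most likely one; the azimuth is uniform.
    """
    theta = abs(rng.gauss(0.0, math.radians(sigma_deg)))  # tilt from vertical
    phi = rng.uniform(0.0, 2.0 * math.pi)                  # horizontal direction
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            -math.cos(theta))

def p_fall(height, radius, sigma_deg, n=20000, seed=0):
    """Fraction of sampled gravity vectors under which a cylinder topples.

    A uniform cylinder tips once the tilt of gravity exceeds
    atan(radius / (height / 2)), regardless of the azimuth.
    """
    rng = random.Random(seed)
    critical = math.atan2(radius, height / 2.0)
    falls = 0
    for _ in range(n):
        gx, gy, gz = sample_gravity(sigma_deg, rng)
        tilt = math.atan2(math.hypot(gx, gy), -gz)
        if tilt > critical:
            falls += 1
    return falls / n

# Hypothetical spread of 10 degrees; same base radius, different heights.
tall = p_fall(height=8.0, radius=1.0, sigma_deg=10.0)
short = p_fall(height=2.0, radius=1.0, sigma_deg=10.0)
```

      Under these assumptions the taller cylinder is judged likely to fall more often than the shorter one even though both are physically stable, which is the qualitative pattern of the stability inference bias.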

      To distinguish between the deterministic and stochastic models, we designed a series of experiments aimed at demonstrating that internal noises, rather than external noises such as perceptual uncertainty or external force perturbations, influence our inference about object stability. However, the supporting evidence was dispersed and at times implicit throughout the manuscript. In revision, we will thoroughly clarify the ambiguities. In this reply, we will consolidate and present the evidence comprehensively.

      1. The examination of external noises.

      1.1 External Force Perturbations. Deterministic models suggest that during object stability inference, individuals implicitly assume the presence of external forces (e.g., wind) that could destabilize stacks. While this assumption aligns with the omnipresence of such forces in natural settings, it overlooks a crucial variable: the directionality of these external forces. In psychological studies, individual differences are commonly observed, and the perceived force direction is no exception. That is, some may assume that it comes from the left, while others from the right. In essence, if external forces were to play a significant role in stability inference, one would expect the perceived force directions to exhibit non-uniform distributions (i.e., anisotropy) in the horizontal plane within individuals and to show substantial variability between individuals.

      Contrary to this expectation, our study revealed a different pattern. In the study, we specifically measured the distribution of 𝜑, the horizontal component reflecting the direction of object collapse. Our results indicated that all participants exhibited a uniform distribution of gravity’s directions in the horizontal plane (Fig. 1d right; Extended Data Fig. 2 and 3). This uniformity suggests that if external forces were a key determinant in stability inference, participants would have to assume a varying direction of external force in each trial—an assumption we consider unlikely. Instead, our RL model simulation suggests that the isotropy of 𝜑 arises from agent-environment interactions, notably in the absence of external forces (Extended Data Fig. 6).
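
      One simple way to check the isotropy of 𝜑 is a Rayleigh test of circular uniformity. The sketch below is illustrative and is not necessarily the analysis used in the study; the function name rayleigh_test and the first-order p-value approximation exp(-z) are choices made for the example.

```python
import math

def rayleigh_test(angles):
    """Rayleigh test for uniformity of angles given in radians.

    Returns the mean resultant length, the statistic z = n * R^2, and a
    first-order approximation of the p-value, exp(-z), which is reasonable
    for large samples.
    """
    n = len(angles)
    c = sum(math.cos(a) for a in angles)
    s = sum(math.sin(a) for a in angles)
    r_bar = math.hypot(c, s) / n   # 0 for isotropic data, 1 if all aligned
    z = n * r_bar ** 2
    return r_bar, z, math.exp(-z)

# Evenly spread collapse directions versus directions clustered on one side.
spread = [2.0 * math.pi * k / 60 for k in range(60)]
clustered = [0.1 + 0.01 * k for k in range(50)]
_, _, p_spread = rayleigh_test(spread)
_, _, p_clustered = rayleigh_test(clustered)
```

      A large p-value is compatible with a uniform distribution of 𝜑, whereas a directional external force assumed by a participant would be expected to produce a small one.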

      In summary, the uniform distribution of horizontal direction component, 𝜑, observed in all participants, challenges the argument for the dominant role of external forces in stability inference. We are sorry that this aspect was not explicitly emphasized in the original text, and in revision we will explain why external forces are unlikely to substantially shape our perception of object stability.

      1.2 Perceptual uncertainty. To assess the impact of perceptual uncertainty on stability inference, we examined whether the representation of gravity’s direction is cognitively impenetrable. Specifically, we posited that if noises are external (i.e., perceptual uncertainty), the inference bias should be modulated by task context; in contrast, if noises are internal, the stochastic representation of gravity’s direction will be encapsulated from the context. To test this idea, we inverted the virtual environment, making gravity appear to point upward (also see a similar idea by Reviewer 3). In this unfamiliar context, which diverges dramatically from daily experiences, one would expect heightened perceptual uncertainty, which according to deterministic models would result in a larger inference bias – manifested as an increased width of the distribution (𝜎) of gravity’s direction. Contrary to this prediction, we observed that the width of the distribution remained unchanged (Fig. 1d and 1f). Furthermore, there was a high correlation (r = 0.91) between widths in the upright and inverted conditions across participants (Extended Data Fig. 2 and 3).

      In summary, this finding suggests that the manipulation of perceptual uncertainty is unable to cognitively penetrate the representation of gravity’s direction, casting doubt on its dominant role in stability inference. We are sorry that in the original text, we did not clarify the rationale for employing the approach of cognitive impenetrability. In revision, this will be clarified.

      2. The origin of intrinsic noises in stability inference.

      In deterministic models, either external force perturbations or perceptual uncertainty is often assumed but rarely empirically tested. Indeed, these external noises are introduced primarily to account for observed biases in stability inference. In this study, we explicitly examined the possible origin of the intrinsic noises embedded in the representation of gravity’s direction. Without assumed perceptual uncertainty and external perturbation of forces, the RL model simulation showed that the distribution could evolve naturally based mainly on the agent’s experience, as it used the mismatch between the expectation and the observed state of the stack under natural gravity to update its representation of gravity’s direction (Fig. 3a). Importantly, the width of the distribution for the agent was comparable to that of human participants as measured in the psychophysics experiments (Fig. 3b). Therefore, the experience alone may be sufficient to generate stochastic representation of gravity’s direction, obviating the need for external noises.

      Taken together, these findings underscore the limitations of the combination of deterministic models and external noises in accounting for stability inference, and suggest that intrinsic noises embedded in the representation of gravity play a pivotal role in shaping our stability inference of the physical world.

      3. Thought experiments.

      Although the evidence shown above may provide valuable insights, our study does not definitively settle the debate between deterministic models and our proposed stochastic model. Specifically, our study only preliminarily investigates two sources of external noise, perceptual uncertainty and external force perturbations, leaving many other factors such as object mass and surface friction, unexplored (for studies on these factors, please see Hamrick et al., 2016). As such, the reviewers have proposed a series of thought experiments that warrant further investigation. Below, we enumerate some of them, followed by ours.

      3.1 Experiment 1. Reviewer 3 proposed a thought experiment in which participants assess stability of a single block of varying heights. The reviewer argues that a block, regardless of its height, will remain stable on a horizontal surface unless externally disturbed. This assertion is perfectly true in the physical realm. However, in the cognitive domain, both deterministic models and our stochastic model predict differently. Take an extreme example of a standing needle: while it would remain upright in the physical world without external disturbances, both deterministic and stochastic models, which account for mental inference of physical events, will predict a likelihood of it falling, aligning with our subjective feelings. This is because in both models, noises are considered in the intuitive physics engine. In deterministic models, external force perturbations, as well as perceptual uncertainty, are assumed to be omnipresent noises in probabilistic reasoning. In our stochastic model, noises are embedded in the representation of gravity’s direction. Therefore, although this thought experiment, along with other thought experiments on object mass, surface friction (proposed by Reviewer 3), and falling trajectories behind an occluder (proposed by Reviewer 1), is insightful, it cannot serve to differentiate deterministic and stochastic models.

      3.2 Experiment 2. Reviewer 2 suggested constructing a wall on one side of the virtual scene to make it improbable that participants would infer an external force perturbation emanating from that direction. In this setting, deterministic models would predict a non-uniform distribution of the horizontal component, 𝜑, skewed away from the wall. In contrast, according to our stochastic model, the distribution of 𝜑 would remain unaffected, maintaining the uniform distribution observed in previous experiments. Extending this logic, another test scenario could contrast an indoor scene with an outdoor scene. In a confined and static indoor environment, the likelihood of external force perturbations should be much lower than in a dynamic, open outdoor setting. Here, deterministic models would predict an increase in the width of the distribution, 𝜎, in the outdoor environment, whereas our model would anticipate no such change. The underlying rationale for these experiments parallels that of our previous setup (Fig. 1e), where we inverted the virtual environment and reversed the direction of gravity. Indeed, they all aim to assess the extent to which manipulations of external factors can cognitively penetrate the representation of gravity’s direction.

      3.3 Experiment 3. A noteworthy insight derived from our RL model simulation relates to variations in the number of blocks within the virtual worlds. Deterministic models would predict an enlarged bias in stability inference as the number of blocks increased, which is attributed to elevated levels of perceptual uncertainty and an expanded area susceptible to external force perturbations. However, the results from our RL model simulation contradict this prediction, revealing that an augmented number of blocks instead led to a narrowing of the width of the distribution. This decrease in width can be ascribed to the richer information that a larger number of blocks provides the agent for refining its representation of gravity’s direction. In line with this rationale, we propose a new experiment from the perspective of ecological psychology, which emphasizes that cognitive processes are shaped by our interactions with the environment. Specifically, we hypothesize that individuals raised in mountainous terrains may exhibit more accurate representations of gravity’s direction than those raised in flat terrains. This proposed experiment could not only help resolve the ongoing debate between the two models to some extent, but also encourage future studies on intuitive physics within a more ecologically valid framework.

      To conclude, both deterministic and stochastic models align closely with Bayesian principles, where stability inference is conceptualized as probabilistic reasoning. Nevertheless, the divergence between them is not trivial, as it hinges on distinct philosophical assumptions about the relationship between the inner mind and the external world. Deterministic models propose that the mind serves as a faithful reflection of the world; therefore, gravity’s direction is represented as a single, fixed vector directly downward, the same as that in the world. In these models, uncertainty for probabilistic reasoning emanates from factors external to the module of the intuitive physics engine. In contrast, our stochastic model underscores the notion that the mind is an active inference machine, continually reinterpreting inputs from the outside world; therefore, the mind gains increased adaptability, allowing for a more nuanced accounting of uncertainty in the world – factors often crucial for survival. Such active inference necessitates flexible representations; accordingly, within the model of the intuitive physics engine, variations are embedded into the representation of gravity’s direction. While resolving this philosophical debate is beyond the capacity of the present study, we contend that the field of intuitive physics offers a valuable lens through which to pry open the complex interplay between the mind and the world we live in.

      References

      • Allen, K. R., Smith, K. A., & Tenenbaum, J. B. (2020). Rapid trial-and-error learning with simulation supports flexible tool use and physical reasoning. Proceedings of the National Academy of Sciences, 117(47), 29302–29310.
      • Battaglia, P. W., Hamrick, J. B., & Tenenbaum, J. B. (2013). Simulation as an engine of physical scene understanding. Proceedings of the National Academy of Sciences, 110(45), 18327–18332.
      • Kubricht, J. R., Holyoak, K. J., & Lu, H. (2017). Intuitive physics: Current research and controversies. Trends in Cognitive Sciences, 21(10), 749–759.
      • Smith, K. A., & Vul, E. (2013). Sources of uncertainty in intuitive physics. Topics in Cognitive Science, 5(1), 185–199.

    1. Author Response:

      Reviewer #1 (Public Review):

      Summary: The authors made significant updates to Hippocampome.org including 50 new cell types.

      Strengths: The authors have been thorough in basing their views on peer-reviewed literature. They have made the data highly accessible and the user has the ability to control what is included.

      Weaknesses: There are many inconsistencies in the literature regarding cell types and how these are incorporated into hippocampome.org is not clear.

      We agree with the Reviewer that there can be inconsistencies in the literature, especially when it comes to nomenclature. This is why for Hippocampome.org v1.0 we decided to focus on the morphologies, the distributions of axons and dendrites across the layers and parcels of the hippocampal formation, rather than the names authors have applied to the neurons they are studying. We have also clarified our stance on nomenclature in our Brain Informatics manuscript that accompanied v1.1. We will revise the manuscript to make these points explicit.

      Properties are often a result of modeling and not biological data, and caveats to this approach, and other assumptions are unclear.

      The foundation for Hippocampome.org has always been the data that are published in the literature. Those include, among others, the axonal and dendritic spans in each layer and subregion, the molecular expression patterns, the total neuron count by layer and subregion, the membrane properties, firing patterns, and experimental synaptic signals and corresponding covariates. For all of those, we do not depend on how the data are modeled, although there is always some level of interpretation of the data to make them machine readable and ready for incorporation into our database. However, some of the simulation-ready parameters now also included in Hippocampome.org are indeed the result of modeling, such as the neuronal input/output functions (Izhikevich model) and the unitary synaptic values (Tsodyks-Markram model). Other simulation-ready parameters are the result of specific analysis approaches, including the connection probabilities (axonal-dendritic spatial overlaps) and the neuron type census (numerical optimization of all constraints). We plan to explicitly distinguish among these various cases in the revised manuscript.
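
      To make concrete what a modeling-derived, simulation-ready parameter set looks like, the following is a minimal sketch of the classic four-parameter Izhikevich neuron. It is an illustration under stated assumptions, not the Hippocampome.org implementation: the parameter values are the textbook "regular spiking" defaults rather than values from the knowledge base, the parameterization used on the website may differ, and the function name simulate_izhikevich is hypothetical.

```python
def simulate_izhikevich(current, a=0.02, b=0.2, c=-65.0, d=8.0,
                        t_ms=1000.0, dt=0.5):
    """Return spike times (ms) of an Izhikevich neuron for a constant input.

    v is the membrane potential (mV) and u the recovery variable:
        dv/dt = 0.04 v^2 + 5 v + 140 - u + I
        du/dt = a (b v - u)
    When v reaches 30 mV a spike is recorded, v is reset to c and u is
    increased by d. Integration uses a forward Euler step of dt ms.
    """
    v = c
    u = b * v
    spikes = []
    for i in range(int(t_ms / dt)):
        if v >= 30.0:
            spikes.append(i * dt)
            v = c
            u += d
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + current)
        u += dt * a * (b * v - u)
    return spikes

# Input/output behaviour: no input gives no spikes, a constant input does.
silent = simulate_izhikevich(0.0)
driven = simulate_izhikevich(10.0)
```

      The fitted values of a, b, c, and d are what encode a neuron type's input/output function in this kind of model, which is why such parameters depend on the modeling choices in a way that the underlying experimental measurements do not.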

      Several interneuron subtypes in the dentate gyrus do not appear to be listed, such as neurogliaform cells.

      The neuron types listed in Figure 2 of the current manuscript are only the new additions to the catalog of neuron types at Hippocampome.org v2.0. DG Neurogliaform cells were included in our original eLife manuscript, which described the deployment of v1.0 of the website. We will clarify this in the revisions.

      The nomenclature HIPROM should be distinguished or made synonymous with HIPP. Same for MOCAP and MOPP/HICAP.

      The Reviewer has referred to 5 separate neuron types in Hippocampome.org. Each neuron type has a unique distribution of axonal and dendritic invasions of the 26 layers and parcels of the hippocampal formation. For example, HIPROM cells have dendrites in the inner one-third of stratum moleculare, stratum granulosum, and hilus and axons in all four layers of the dentate gyrus in addition to axonal projections into CA3 stratum radiatum, stratum lucidum, stratum pyramidale, and stratum oriens. HIPP cells in contrast have dendrites only in the hilus and axons only in the outer two-thirds of stratum moleculare with no cross-subregional projections. Similar considerations distinguish MOPP, MOCAP, and HICAP cells in Hippocampome.org. In expanding the nomenclature to include the neuron types we first described at Hippocampome.org, we attempted to mimic the styling of the already established neuron types of the DG: HIPROM (Hilar Interneuron with PRojections to the Outer Molecular layer), HIPP (HIlar Perforant Path-associated), MOCAP (MOlecular Commissural-Associational Pathway-related axons and dendrites), MOPP (MOlecular layer Perforant Path-associated), and HICAP (HIlar Commissural-Associational Pathway-related). We intend to insert a paragraph in the revised version to clarify these issues.

      Dorsal ventral and sex differences are not mentioned.

      We thank the Reviewer for pointing this out. As a result of the dearth of literature describing differences between dorsal and ventral hippocampus when we first assembled Hippocampome.org v1.0, we made the decision to focus solely on the distributions of the axons and dendrites along the depth, or layers, of the hippocampal formation. As the amount of literature relating to the other axes of the hippocampus continues to grow, we will gradually incorporate information along the added dimensions into our knowledge base. In the revised manuscript we intend to note this, and also stress the fact that Hippocampome.org contains knowledge from a mixture of sexes, and that whenever the original papers report the animal sex, so does our knowledge base. The revised manuscript will also mention that, whenever possible (e.g. synaptic physiology parameters), values are reported separately for males and females.

      Reviewer #2 (Public Review):

      Summary and strengths: The authors have developed a helpful resource for the community regarding hippocampal cell types and their interactions from many perspectives. There have been many updates to hippocampome v1.0 to v1.12, that are nicely summarized and explained (e.g., Table 1). The content and impact are also presented (Fig. 4).

      Weaknesses: My main comment is that it is not completely clear and/or it is a bit buried as to what makes this v2.0 (rather than v1.13). The title would seem to encompass it ('... enabling data-driven spiking neural network simulations...), but in the introduction, the authors seem to emphasize "50 newly identified neuron types...". Is it the case that launching network simulations (using CARLsim) was not possible up to v1.12? I don't think so? I think that this research advance is to announce and summarize the various updates and to demonstrate how network simulations can be easily done? If so, this should and could be made more clear so that the reader does not necessarily have to go through all the previous versions to understand what is 'special' or different about v2.0. This could perhaps be achieved by situating their tool and its goals relative to other efforts (e.g., blue brain project) that are mentioned in the Discussion?

      We thank the Reviewer for their helpful suggestions. Hippocampome.org v1.12 included the final piece needed, the synaptic physiology parameter values, to start fully simulating the hippocampal formation. In the revised manuscript, we will endeavor to emphasize more the specialness of v2.0 over the various v1.X in the Abstract, Introduction, and Discussion, in part by more fully describing the differences between our work and that of other efforts, such as the Blue Brain Project.

      Reviewer #3 (Public Review):

      Summary: The authors aim to provide a multidisciplinary resource on the structural and physiological organization of the hippocampal system and make the available experimental data available for further theoretical work, providing tools to do so in a very flexible and user-friendly way. Since this is a new version of an already existing data-resource, the authors certainly reach their aim and fulfil expectations that the reader might have. The content of the database is as good as the original data, collected from the published knowledge-database, sometimes with the help of the original authors, and the overall quality depends further on how the data are curated by the team of authors and many others who helped them. That process is briefly described and more details are available in descriptions of previous versions and on the website. The data extraction, examples of how data can be used, and the part on attempts to model the hippocampus are exciting and open doors to new and exciting research opportunities.

      Strengths: Excellent description with many outlined opportunities. Nicely illustrated and inviting to explore the online database.

      Weaknesses: The figures are complex, containing a heavy information load with many abbreviations. You need some general knowledge of the system in order to grasp the enormous potential of what is provided.

      We agree with the Reviewer that we generously used abbreviations throughout our figures as a means of conserving limited space. We have attempted to balance that by providing a complete glossary of all the abbreviations used throughout the manuscript. However, we will make an effort to supply definitions of the abbreviations in the figure captions and at their first use in the manuscript, or even replacing the abbreviations altogether in key places in the figures.

    1. Author Response

      We are very thankful for the editors' and reviewers' thoughtful feedback and criticisms on our manuscript. We have carefully considered all of the comments and will provide a revised manuscript with detailed responses as soon as we can. In the meantime, we will make our best effort to conduct additional experiments to further support our conclusions. We greatly appreciate the time and consideration given to improving our work.

      Reviewer #1 (Public Review):

      Summary:

      The question at hand is whether astrocytes contribute to the mechanism of long-term synaptic potentiation (LTP) at synaptic contacts between excitatory glutamatergic neurons and inhibitory neurons (E-I synapses). This is a legitimate query considering the immense body of work that has now established synaptic plasticity (LTP, LTD and spike-timing dependent plasticity) as an astrocyte-dependent process at excitatory synapses and, by contrast, the lack of knowledge on whether and how astrocytes control IN activity. Taking direct inspiration from that same body of work, authors recapitulate a number of experiments and approaches from prior seminal studies and provide evidence that E-I synapses in the stratum radiatum of the hippocampus display NMDAR-dependent plasticity, which can be suppressed by pharmacologically hindering astrocytes physiology, preventing astrocyte Ca2+ transients or blocking endocannabinoid CB1 receptors. Under any of these conditions, LTP can still be rescued by exogenously applying D-serine, a naturally occurring co-agonist of NMDARs primarily released by astrocytes. Coincidently, authors show that the conditions used to elicit LTP also cause a transient increase in NMDAR co-agonist site occupancy. Lastly, based on some evidence that gamma-CaMKII is predominantly expressed in INs rather than excitatory neurons, authors conducted AAV-mediated IN-specific gamma-CaMKII shRNA experiments and found that this is sufficient to suppress LTP at E-I synapses. They found that this approach also impairs contextual fear learning in behaving mice. Authors conclude that astrocytes gate LTP at E-I synapses via a mechanism wherein neuronal depolarization during LTP induction elicits endocannabinoid release which drives CB1-dependent astrocyte Ca2+ activity, causing the release of the NMDAR co-agonist D-serine (required for NMDAR activation).

      Strengths:

      This is an important question and the experimental work seems to have been conducted at high standards. The electrophysiology traces are impeccable, the experiments are well powered, including the behavioral testing, and multiple controls and validations are provided throughout. The figures are clear and easy to understand. Overall, the conclusions from the study are consistent, or partially consistent, with the findings.

      We greatly appreciate you taking the time to evaluate our study thoroughly and provide such thoughtful feedback.

      Main Weaknesses:

      1) A major point of concern is the lack of proper acknowledgment of the seminal studies that were mimicked in this manuscript, notably Henneberger et al, Nature 2010, Adamsky et al, Cell 2018; and Robin et al., Neuron 2017. The entire study design is a replica of these landmark studies: it isn't built upon or inspired from them, it exactly repeats the experiments and methods performed in them, coming dangerously close to being simply a hidden attempt to plagiarize published work. The resemblance goes as far as using an identical figure display (see Fig4.D vs Fig 2D of Ref#4). The issue is that authors frame the problem, scientist logic, reasoning, technical tricks, approaches, and interpretations as their own whereas, in reality, they were taken verbatim out of previous work and applied to a (shockingly) similar problem. The probity of the present study is thus in question. Authors need to clearly acknowledge, in all relevant instances, that the work presented here recapitulates the approach, reasoning and methodology used in past seminal studies that tackled the mechanisms of astrocyte regulation of LTP.

      Thank you very much for your review and valuable comments on our manuscript. We greatly appreciate your concern regarding the proper acknowledgment of previous studies. We sincerely apologize for not adequately citing and acknowledging the seminal works in our manuscript. We take the avoidance of academic misconduct very seriously.

      For the research design, although there are some similarities between our work and other studies, our key scientific questions and technical approaches are markedly different, as evidenced by our central hypothesis and experimental methods. We did not completely replicate their research design.

      Regarding research methods, many basic techniques, such as electrophysiology and chemogenetics, are common experimental methods that are not exclusive to any one paper. Our choice of methods is based on the needs of the research, not on replicating a particular paper. We do recognize that there are similarities in our experimental methods, specifically the chemogenetic stimulation of astrocytes to induce de novo LTP, which was inspired by previous studies (Van Den Herrewegen et al. Molecular Brain (2021), Adamsky et al. Cell (2018), Nam et al. Cell reports (2019)). We were also inspired by the previous work of Henneberger et al. in Nature (2010) to investigate whether stimulation, specifically TBS (theta burst stimulation), could transiently increase NMDA receptor-mediated synaptic responses.

      The similarity between our Fig. 4D and Fig. 2D of Ref. 4 arises primarily because both studies have a similar purpose (we monitored NMDA currents in interneurons, whereas others monitored them in pyramidal cells) and use similar methods; our figure layout simply follows a regular display pattern. Additionally, we would like to draw your attention to our previous studies, specifically Shen et al., Scientific Reports (2017), Supplementary figure 4, and Shen et al., Journal of Neurochemistry (2021), Supplementary figures 8 and 9. In these studies, we also employed a regular display pattern in our figure layouts. It is important to note that while there may be similarities in the figure arrangement, each study presents distinct findings and contributes to the broader understanding of the topic. Our use of a similar way to present data does not equal plagiarism. We apologize again for any confusion caused by the lack of explicit citation and acknowledgment in our manuscript. In the revised version, we will provide clear and detailed references to all relevant studies.

      In terms of citations, we have cited the work of Henneberger et al., Nature 2010; Adamsky et al., Cell 2018; and Robin et al., Neuron 2017 in multiple places, indicating that we have learned from their research ideas and findings. We will supplement any missing citations. Overall, however, our work has distinct differences and innovations.

      Our experiments are not intended as a hidden attempt to plagiarize or simply replicate their methods. Rather, they are part of a deliberate effort to establish a comparable and reproducible experimental framework. Our study aims to validate and further explore the conclusions of these seminal studies by replicating their experiments, and to deepen our understanding of the mechanisms by which astrocytes regulate LTP at E-I synapses.

      We sincerely appreciate your review and guidance. We will carefully consider your criticism and incorporate more accurate and thorough citations in the revised version, ensuring proper respect and acknowledgment of the previous works.

      2) Relatedly, in past work, field recordings were used to monitor LTP in hippocampal slices (refs 4, 26 and others). This method captures indiscriminately all excitatory synapses where glutamate is released to cause AMPAR-dependent (and NMDAR) transmembrane flux of cations in the postsynaptic element, including E-I synapses and not just E-E synapse like the authors claim. Therefore, a strong argument can be made that there is no actual ground to differentiate the present results from past ones.

      Thank you for your thoughtful comments regarding the differentiation of our results from previous studies. We appreciate the opportunity to address this issue and provide further clarification.

      Indeed, in past studies, field recordings were commonly utilized to monitor long-term potentiation (LTP) in hippocampal slices. It is true that this method captures all flux of cations in excitatory synapses, inhibitory synapses and glia. This includes both excitatory-excitatory (E-E) and excitatory-inhibitory (E-I) synapses.

      When using the LTP recording protocol, one limitation is that the experimenter cannot determine the exact contribution of E-E and E-I currents to the recorded current. Additionally, it is not possible to know, with the same induction protocol, the specific effects on E-E synapses versus E-I synapses. It is plausible that E-E synapses could undergo LTP, while E-I synapses could undergo LTD, or vice versa.

      Thus, it becomes crucial to carefully dissect the functioning of E-I synapses and investigate how astrocytes modulate these synapses. While past field recordings have provided important insights, our selective interrogation of the astrocyte-E-I synapse interface represents a conceptual advance in delineating the nuanced modulation of distinct synaptic connections by astrocytes. We specifically focus on studying the modulation of E-I synapses by astrocytes and aim to elucidate the intricate dynamics and underlying mechanisms. By untangling the complex contributions of astrocytes to E-I synapse function and plasticity, we can unveil novel aspects of neuroglial interactions and advance our understanding of the fundamental principles governing neural network activity.

      3) There is a general lack of excitement about this study. One reason is that it replicates almost identically past work, as mentioned above. Another is that the scientific question and importance of the findings are not framed appropriately. The work is presented as an astrocyte-focused investigation, but it has very limited value to the astrocyte field. The findings are, in all accounts, identical to those unveiled by previous work especially because E-I synapses are, in fact, excitatory synapses. Where this study does bring value, however, is to the field of interneurons, but it would need to be reframed to shift the emphasis from astrocytes to E-I connections. Authors would need to elevate the text by framing their work around relevant considerations, such as IN diversity, mechanisms of LTP in IN subtypes, role of E-I connections in hippocampal circuit function, information processing, behavior, spatial learning, navigation, or grid cells activity etc...

      We appreciate your insightful comments and concerns regarding the lack of excitement surrounding our study. We would like to clarify that while our study uses certain similar methodologies, for example electrophysiology, chemogenetics and pharmacology, our research aims to provide a deeper understanding of the mechanisms by which astrocytes regulate E-I synapses. We apologize if this replication aspect was not adequately highlighted in our manuscript, and we will make sure to emphasize the novel contributions of our study in the revised version.

      Regarding the framing of our study, we recognize the importance of interneurons and the role of E-I connections in hippocampal circuit function, information processing, behavior, spatial learning, navigation, and other relevant aspects. However, the scientific question and scope of the study are to explore whether and how astrocytes modulate E-I synapses. We believe that this study brings value to the field of astrocyte-neuron interaction. Of course, this study also brings value to the field of interneurons. Perhaps the lack of excitement among audiences stems from the fact that the mechanisms by which astrocytes modulate E-I and E-E synapses are the same.

      4) A clear weakness of the study is that it fails to consider the molecular and functional diversity of interneurons in the stratum radiatum and provides no insights or considerations related to it. Authors provide no information on what type of IN were patched, or the location of their cell body in the s.r., effectively treating all patched IN as a homogeneous ensemble of cells - which they are not. Relatedly, the study is extremely evasive on the importance of the results in the context of inhibitory interneurons. This renders the significance of the insights highly uncertain and dampens both the impact of the study and the excitement it generates. Hippocampal interneurons are very diverse in molecular identity, sub-anatomical location, morphology, projections, connectivity and functional importance. Some experts go as far as recognizing 29 subtypes in the CA1, including 9 in the stratum radiatum alone (based on the location of their soma). However, this is neither addressed nor acknowledged by the authors, with the exception of a statement (line 659) where they claim to have "focused on a subpopulation of interneurons in the stratum radiatum" without providing any precision or evidence to corroborate this assertion. This diversity, alone, could explain why not all cells showed LTP, or why the mechanisms authors describe in the radiatum do not seem to be at play in the oriens. Hence, carefully considering the diversity of INs in the present work is necessary. It would refine and augment the conclusions of the paper. Instead of a sub-region specificity, the study might fuel the notion of an IN subtype specificity of LTP mechanisms, which is more useful to the field.

      Thank you very much for your review and valuable comments on our study. We agree with the point you raised regarding a clear weakness in our study, specifically the lack of consideration of the diversity of interneurons in the stratum radiatum.

      As the reviewer notes, there are many subtypes of interneurons in hippocampal region CA1 that likely contribute in distinct ways to circuit function. Unfortunately, we did not gather information on the specific molecular or morphological identity of the interneurons we recorded from. This is a limitation of our study. We will add a discussion of this issue as a caveat, and highlight it as an opportunity for future work to dissect how astrocyte-regulated long-term potentiation in interneurons may differ across interneuron subpopulations. Thank you once again for your insightful comments.

      5) Authors take several shortcuts. Some of the conclusions are a leap from the experiments and are only acceptable due to the close analogy with very similar investigations conducted in the past that provided identical results. For instance, the present study provides no evidence of any sort that D-serine is involved - rather, it provides evidence that the pathway at hand contributes to increasing the occupancy of the co-agonist binding site of NMDARs. Considering the absence of work demonstrating that D-serine is the endogenous co-agonist of NMDARs at E-I synapses, most of the authors claims on D-serine are unfounded. This would necessitate using tools such as the canonical D-serine scavengers DAAS or DsDA, serine racemase KO mice etc. Similarly, authors provide no compelling evidence that endocannabinoid CB1 receptors involved in this pathway are located on astrocytes

      Thank you for your insightful comments on our study. We appreciate your attention to detail and your concerns regarding our conclusions. We agree that further evidence is needed to establish the involvement of D-serine as the endogenous co-agonist of NMDARs at E-I synapses. We will take into consideration your suggestion of using tools such as D-serine scavengers to provide clearer evidence.

      Regarding the involvement of endocannabinoid CB1 receptors on astrocytes in this pathway, we provide evidence that astrocytic calcium signaling could be blocked by the CB1 receptor antagonist AM251, as shown in figure 3. However, we agree that further research is necessary to accurately identify the localization of CB1 receptors. We will note this limitation in our discussion and emphasize the need for additional studies to explore the precise location of CB1 receptors. In addition, we will endeavor to perform immunohistochemistry to identify the exact location of CB1 receptors in astrocytes.

      Thank you once again for your valuable feedback. We will carefully address these concerns and make appropriate revisions to ensure the clarity and accuracy of our findings.

      6) An important caveat in this study is the protocol employed to induce LTP, which includes steps of sustained depolarization of the patched IN to -10mV. Neuronal depolarization is known to induce endocannabinoids production. In several instances, this was shown to 'activate' astrocytes and elicit the release of astrocyte-derived transmitters at nearby synapses. This implies that the endocannabinoid-dependent pathway described in the study is, most likely, artificially engaged by the protocol itself. Hence, the present work only provides evidence that an astrocyte-dependent, CB1-D-serine-pathway can be artificially called upon with this specific LTP protocol, but does not convincingly demonstrate that it is naturally occurring or necessary for plasticity at E-I synapses. Authors would need to thoroughly address this caveat by replicating some of their key findings (AM251, calcium-clamp, D-serine and CaMKII shRNA) using a protocol that does not entail the artificial depolarization of the patched interneuron.

      Thank you for raising this important point. We agree that the sustained depolarization protocol we used to induce LTP could potentially engage endocannabinoid signaling and astrocyte activation. However, our observation that preventing astrocyte Ca2+ transients or blocking endocannabinoid CB1 receptors prevented the induction of LTP by this depolarization protocol suggests that this astrocyte-endocannabinoid-dependent pathway is necessary.

      Importantly, synaptic depolarization of neurons can occur naturally during learning and memory. Though ‘artificial’ here, our protocol may mimic aspects of natural activity patterns that engage ‘endocannabinoid release’ and astrocyte involvement in plasticity.

      Another limitation of our study is that we currently cannot conclusively determine the source of the CB1. We cannot distinguish whether the CB1 originates from neurons or astrocytes based on our current experiments. We will explicitly acknowledge this caveat in the discussion, noting that further experiments are needed to clarify the cellular origin of the CB1. Thank you for drawing our attention to this critical issue - we will refine the manuscript accordingly to more comprehensively and accurately present the study conclusions and limitations. Your feedback helps improve the rigor of our research.

      7) Reading and understanding are hindered by a rather vast array of issues with the text itself. It needs thorough editing for typos, misnomers, meaning-altering errors in syntax, and a number of issues with English.

      Thank you very much for your review and feedback on our text. We highly appreciate your comments and take them seriously. We will carefully address the issues you mentioned and thoroughly edit the text to eliminate any typos, misnomers, syntax errors that may alter the meaning, and other English-related issues. We truly value your input and appreciate your patience as we work on these improvements.

      Reviewer #2 (Public Review):

      Summary:

      This work explores the implication of astrocytes in the regulation of long-term potentiation of excitatory synapses onto inhibitory neurons in CA1 hippocampus. They found that astrocytes of a sub-region of CA1 regulate this plasticity through their activation of endocannabinoids that lead to the release of the NMDA receptor co-agonist, D-serine.

      Strengths:

      The experiments are well considered and conceptualized, and use appropriate tools to explore the role of astrocytes in the tripartite synapse. The results highlight a novel role of astrocytes in an important aspect of the synaptic regulation of the hippocampal circuit. There are extensive levels of analysis for each experimental group of evidence.

      Thank you for your positive feedback on our study. We appreciate your recognition of the careful consideration and conceptualization of our experiments, as well as the use of appropriate tools to investigate the role of astrocytes in the tripartite synapse. We are pleased to hear that the results have highlighted a novel role of astrocytes in an important aspect of synaptic regulation in the hippocampal circuit.

      Thank you for taking the time to review our work and for providing such positive feedback. We will continue to improve and refine our study based on your valuable comments.

      Weaknesses:

      The authors underscore and use an oversimplified view of the heterogeneity of interneuron populations and their selective roles in the hippocampal network. Also, there is an uneven level of astrocyte-selective tools used in the different experiments, which creates an uneven strength of arguments and conclusions regarding the role of glial cells. Finally, the wording used by the authors often leads to some confusion or a sense of overinterpretation.

      We appreciate the reviewer raising these important points about the characterization of interneuron and astrocyte populations in our study. We agree that oversimplifying or overlooking cellular heterogeneity could undermine the conclusions. In the revised manuscript, we will:

      1) Add more detailed discussion of interneuron diversity. We will note this as an area for further study.

      2) Review the wording used when describing results and conclusions, ensuring we avoid overstating interpretations of the data.

      Thank you again for the thoughtful feedback.

    1. Author Response

      We thank the reviewers for truly valuable advice and comments. We have made multiple corrections and revisions to the original pre-print accordingly. Here we address 2 major points.

      1) Regarding the genetic association of the common COL11A1 variant rs3753841 (p.(Pro1335Leu)), we do not propose that it is the sole risk variant contributing to the association signal we detected and have clarified this in the manuscript. We concluded that it was worthy of functional testing for reasons described here. Although there were several common variants in the discovery GWAS within and around COL11A1, none were significantly associated with AIS and none were in linkage disequilibrium (R2>0.6) with the top SNP rs3753841. We next reviewed rare (MAF<=0.01) coding variants within the COL11A1 LD region of the associated SNP (rs3753841) in 625 available exomes representing 46% of the 1,358 cases from the discovery cohort. The LD block was defined using Haploview based on the 1KG_CEU population. Within the ~41 kb LD region (chr1:103365089-103406616, GRCh37) we found three rare missense mutations in 6 unrelated individuals (Author response table 1). Two of them (NM_080629.2: c.G4093A:p.A1365T; NM_080629.2:c.G3394A:p.G1132S), from two individuals, are predicted to be deleterious based on CADD and GERP scores and are plausible AIS risk candidates. At this rate we could expect to find only 4-5 individuals with linked rare coding variants in the total cohort of 1,358, which collectively are unlikely to explain the overall association signal we detected. Of course, there also could be deep intronic variants contributing to the association that we would not detect by our methods. However, given this scenario, the relatively high predicted deleteriousness of rs3753841 (CADD=25.7; GERP=5.75), and its occurrence in a Gly-X-Y triplet repeat, we hypothesized that this variant itself could be a risk allele worthy of further investigation.
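
      The "4-5 individuals" estimate above can be reproduced with a short calculation. The sketch below is illustrative only: it assumes the estimate is a simple linear extrapolation of the carrier rate observed in the sequenced subset to the full discovery cohort, and the variable names are ours rather than anything used in the original analysis.

```python
# Figures quoted in the response above.
total_cases = 1358        # cases in the discovery cohort
exomes_reviewed = 625     # cases with available exomes
carriers_observed = 2     # individuals with a predicted-deleterious rare variant

# Share of the discovery cohort that was exome sequenced.
fraction_sequenced = exomes_reviewed / total_cases

# Assumption: carriers occur at the same rate in the unsequenced cases.
carrier_rate = carriers_observed / exomes_reviewed
expected_carriers = carrier_rate * total_cases

print(f"fraction sequenced: {fraction_sequenced:.0%}")
print(f"expected carriers in full cohort: {expected_carriers:.1f}")
```

      Under this assumption the extrapolated count falls between 4 and 5, in line with the range stated in the text.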

      Author response table 1.

      We also appreciate the reviewer’s suggestion to perform a rare variant burden analysis of COL11A1. We conducted a pilot gene-based analysis in 4534 European ancestry exomes, including 797 of our own AIS cases and 3737 controls, and tested the burden of rare variants in COL11A1. The SKATO P value was not significant (COL11A1_P=0.18), but this could be due to lack of power and/or background from rare benign variants that could be screened out using the functional testing we have developed.

      2) Regarding functional testing, by knockdown/knockout cell culture experiments, we showed for the first time that Col11a1 negatively regulates Mmp3 expression in cartilage chondrocytes, an AIS-relevant tissue. We then tested the effect of overexpressing the human wt or variant COL11A1 by lentiviral transduction in SV40-transformed chondrocyte cultures. We deleted endogenous mouse Col11a1 by Cre recombination to remove the background of its strong suppressive effects on Mmp3 expression. We acknowledge that Col11a1 missense mutations could confer gain of function or dominant negative effects that would not be revealed in this assay. However, as indicated in our original manuscript, spinal deformity is described in the cho/cho mouse, a Col11a1 loss of function mutant. We also note the recent publication by Rebello et al. showing that missense mutations in Col11a2 associated with congenital scoliosis fail to rescue a vertebral malformation phenotype in a zebrafish col11a2 KO line. Although the connection between AIS and vertebral malformations is not altogether clear, we surmise that loss of the components of collagen type XI disrupts spinal development. In vivo experiments in vertebrate model systems are needed to fully establish the consequences and genetic mechanisms by which COL11A1 variants contribute to an AIS phenotype.

    1. Author Response

      The following is the authors’ response to the current reviews.

      We thank both reviewers for their detailed and positive assessment of our work.

      To Reviewer #2, we have now explicated the pattern -- (QXQXQX>3)4 where X>3 denotes any length of three or more residues of any composition -- in the first paragraph of the discussion.

      To Reviewer #3, we have made slight modifications to the text in the “Q zippers poison themselves” results section, to attempt to further clarify the mechanism of self-poisoning.

      Briefly, the reviewer questions if an alternative model -- where inhibition involves non-structured rather than Q-zipper containing oligomers -- better explains the data. We provided two lines of evidence that we believe exclude this alternative model. First, we point out in the first paragraph of the “Q zippers poison themselves” section that the cells that unexpectedly lack amyloid in the high concentration regime have negligible levels of AmFRET, indicating that the inhibitory oligomers themselves occur at low concentrations regardless of the total concentration, and are therefore limited by a kinetic barrier. Second, we point out in the third paragraph of the section that the severity of amyloid inhibition with respect to concentration has a sequence dependence that matches the expectation of converging phase boundaries for crystal polymorphs -- specifically, inhibition is most severe for sequences that have a local Q density just high enough to form a Q zipper on both sides of each strand. Inhibition is relaxed for sequences having more or fewer Qs than that threshold. In contrast, disordered oligomerization is not expected to have such a dependence on the precise pattern of Qs and Ns.


      The following is the authors’ response to the original reviews.

      We are pleased that the editors find our study valuable. We find that the reviewers’ criticisms largely arise from misunderstandings inherent to the conceptually challenging nature of the topic, rather than fundamental flaws, as we will elaborate here. We are grateful for the opportunity afforded by eLife to engage reviewers in what we intend to be a constructive public dialogue.

      Response to Reviewer 1

      This review is highly critical but lacks specifics. The reviewer’s criticisms reflect a position that seems to dismiss a critical role for (or perhaps even the existence of) conformational ordering in polyQ amyloid, which is untenable.

      The reviewer states that our objective to characterize the amyloid nucleus “rests on the assertion that polyQ forms amyloid structures to the exclusion of all other forms of solids”. We do not fully agree with this assertion because our findings show that detectable aggregation is rate-limited by conformational ordering, as evident by 1) its discontinuous relationship to concentration, 2) its acceleration by a conformational template, and 3) its strict dependence on very specific sequence features that are consistent with amyloid structure but not disordered aggregation.

      We strongly disagree with the reviewer’s subjective statement that we have not critically assessed our findings and that they do not stand up to scrutiny. This statement seems to rest on the perceived contradiction of our findings with that of Crick et al. 2013. Contrary to the reviewer’s assessment, we argue here that the conclusions of Crick et al. do more to support than to refute our findings. Briefly, Crick et al. investigated the aggregation of synthetic Q30 and Q40 peptides in vitro, wherein fibrils assembled from high concentrations of peptide were demonstrated to have saturating concentrations in the low micromolar range. As explained below, this finding of a saturating concentration does not refute our results. More relevant to the present work are their findings that “oligomers” accumulated over an hours-long timespan in solutions that are subsaturated with respect to fibrils, and these oligomers themselves have (nanomolar) critical concentrations. The authors postulated that the oligomers result from liquid–liquid demixing of intrinsically disordered polyglutamine. However, phase separation by a peptide is expected to fix its concentration in both the solute and condensed phases, and, because disordered phase separation is faster than amyloid formation, the postulated explanation removes the driving force for any amyloid phase with a critical solubility greater than that of the oligomers. In place of this interpretation that truly does appear to -- in the reviewer’s words -- “contradict basic physical principles of how homopolymers self-assemble”, we interpret these oligomers as evidence of Q zipper-containing self-poisoned multimers, rounded as an inherent consequence of self-poisoning (Ungar et al., 2005), and plausibly akin to semicrystalline spherulites that have been observed in other polymer crystal and amyloid-forming systems (Crist and Schultz, 2016; Vetri and Foderà, 2015). 
      Importantly, the physical parameters governing the transition between amyloid spherulites and fibrils have been characterized in the case of insulin (Smith et al. 2012), where it was found that spherulites form at lower protein concentrations than fibrils. This mirrors the observation by Crick et al. that fibrils have a higher solubility limit than the spherical oligomers.

      Further rebuttal to the perceived incompatibility of monomeric nucleation with the existence of a critical concentration for amyloid

      We appreciate that the concept of a monomeric nucleus can superficially appear inconsistent with the fact that crystalline solids such as polyQ amyloid have a saturating concentration, but this is only true if one neglects that polyQ amyloids are polymer crystals with intramolecular ordering. The perceived discrepancy is perhaps most easily dispelled by the fact that folded proteins can form crystals in their folded state. These crystals have critical concentrations, and the protein subunits within them each have intramolecular crystalline order (in the form of secondary structure). When placed in a subsaturated solution, the protein crystals dissolve into the constituent monomers, and yet those monomers still retain intramolecular order. Our present findings for polyQ are conceptually no different.

      To further extrapolate this simple example to polyQ, one can also draw on the now well-established phenomenon of secondary nucleation, whereby transient interactions of soluble species with ordered species lead to their own ordering (Törnquist et al., 2018). Transience is important here because it implies that intramolecular ordering can in principle propagate even in solutions that are subsaturated with respect to bulk crystallization. This is possible in the present case because the pairing of sufficiently short beta strands (equivalent to “stems” in the polymer crystal literature) will be more stable intramolecularly than intermolecularly, due to the reduced entropic penalty of the former. Our elucidation that Q zipper ordering can occur with shorter strands intramolecularly than intermolecularly (Fig. S4C-D) demonstrates this fact. It is also evident from published descriptions of single molecule “crystals” formed in sufficiently dilute solutions of sufficiently long polymers (Hong et al., 2015; Keller, 1957; Lauritzen and Hoffman, 1960).

      In suggesting that a saturating concentration for amyloid rules out monomeric nucleation, the reviewer assumes that the Q zipper-containing monomer must be stable relative to the disordered ensemble. This is not inherent to our claim. The monomeric nucleating structure need not be more stable than the disordered state, and monomers may very well be disordered at equilibrium at low concentrations. To be clear, our claim requires that the Q zipper-containing monomer is both on pathway to amyloid and less stable than all subsequent species that are on pathway to amyloid. The former requirement is supported by our extensive mutational analysis. The latter requirement is supported by our atomistic simulations showing the Q zipper-containing monomer is stabilized by dimerization (included in our 2021 preprint). Hence, requisite ordering in the nucleating monomer is stabilized by intermolecular interactions. We provide in Author response image 1 an illustration to clarify what we believe to be the discrepancy between our claim and the reviewer’s interpretation.

      Author response image 1.

      That the rate-limiting fluctuation for a crystalline phase can occur in a monomer can also be understood as a consequence of Ostwald’s rule of stages, which describes the general tendency of supersaturated solutes, including amyloid forming proteins (Chakraborty et al., 2023), to populate metastable phases en route to more stable phases (De Yoreo, 2022; Schmelzer and Abyzov, 2017). Our findings with polyQ are consistent with a general mechanism for Ostwald’s rule wherein the relative stabilities of competing polymorphs differ with the number of subunits (De Yoreo, 2022; Navrotsky, 2004). As illustrated in Fig. 6 of Navrotsky, a polymorph that is relatively stable at small particle sizes tends to give way to a polymorph that -- while initially unstable -- becomes more stable as the particles grow. The former is analogous to our early stage Q zipper composed of two short sheets with an intramolecular interface, while the latter is analogous to the later stage Q zipper composed of longer sheets with an intermolecular interface. Subunit addition stabilizes the latter more than the former, hence the initial Q zipper that is stabilized more by intra- than intermolecular interactions will mature with growth to one that is stabilized more by intermolecular interactions.

      We have added a new figure (Fig. 6) to the manuscript to illustrate qualitative features of the amyloid pathway we have deduced for polyQ.

      Rebuttal to the perceived necessity of in vitro experiments

      The overarching concern of this reviewer and reviewing editor is whether in-cell assays can inform on sequence-intrinsic properties. We understand this concern. We believe however that the relative merit of in-cell assays is largely a matter of perspective. The truly sequence-intrinsic behavior of polyQ, i.e. in a vacuum, is less informative than the “sequence-intrinsic” behaviors of interest that emerge in the presence of extraneous molecules from the appropriate biological context. In vitro experiments typically include a tiny number of these -- water, ions, and sometimes a crowding agent meant to approximate everything else. Obviously missing are the myriad quinary interactions with other proteins that collectively round out the physiological solvent. The question is what experimental context best approximates that of a living human neuron under which the pathological sequence-dependent properties of polyQ manifest. We submit that a living yeast cell comes closer to that ideal than does buffer in a test tube.

      The reviewer’s statement that our findings must be validated in vitro ignores the fact -- stressed in our introduction -- that decades of in vitro work have not yet generated definitive evidence for or against any specific nucleus model. In addition to the above, one major problem concerns the large sizes of in vitro systems that obscure the effects of primary nucleation. For example, a typical in vitro experimental volume of e.g. 1.5 ml is over one billion-fold larger than the femtoliter volume of a cell. This means that any nucleation-limited kinetics of relevant amyloid formation are lost, and any alternative amyloid polymorphs that have a kinetic growth advantage -- even if they nucleate at only a fraction of the rate of relevant amyloid -- will tend to dominate the system (Buell, 2017). Novel approaches are clearly needed to address these problems. We present such an approach, stretch it to the limit (as the reviewer notes) across multiple complementary experiments, and arrive at a novel finding that is fully and uniquely consistent with all of our own data as well as the collective prior literature.
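      The volume comparison above can be checked with a short calculation. The following is only an illustrative sketch: the 1.5 ml in vitro volume comes from the text, whereas the cell volumes of 1, 50 and 100 femtoliters are assumed values chosen to span a plausible range and are not measurements from this study.

```python
# Back-of-the-envelope comparison of an in vitro reaction volume with a
# cell-sized volume. The cell volumes used below are illustrative assumptions.

ML_IN_LITERS = 1e-3    # 1 milliliter expressed in liters
FL_IN_LITERS = 1e-15   # 1 femtoliter expressed in liters


def fold_difference(in_vitro_ml: float, cell_fl: float) -> float:
    """Return how many times larger the in vitro volume is than the cell volume."""
    return (in_vitro_ml * ML_IN_LITERS) / (cell_fl * FL_IN_LITERS)


if __name__ == "__main__":
    in_vitro_ml = 1.5  # volume quoted in the text
    for cell_fl in (1.0, 50.0, 100.0):  # assumed cell volumes
        ratio = fold_difference(in_vitro_ml, cell_fl)
        print(f"{in_vitro_ml} ml vs {cell_fl:g} fl: {ratio:.2e}-fold")
```

      Under any of these assumed cell volumes the ratio exceeds one billion, in line with the statement above.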

      That the preceding considerations are collectively essential to understand relevant amyloid behavior is evident from recent cryoEM studies showing that in vitro-generated amyloid structures generally differ from those in patients (Arseni et al., 2022; Bansal et al., 2021; Radamaker et al., 2021; Schmidt et al., 2019; Schweighauser et al., 2020; Yang et al., 2022). This is highly relevant to the present discourse because each amyloid structure is thought to emanate from a different nucleating structure. This means that in vitro experiments have broadly missed the mark in terms of the relevant thermodynamic parameters that govern disease onset and progression. Note that the rules laid out via our studies are not only consistent with structural features of polyQ amyloid in cells, but also (as described in the discussion) explain why the endogenous structure of a physiologically relevant Q zipper amyloid differs from that of polyQ.

      A recent collaboration between the Morimoto and Knowles groups (Sinnige et al.) investigated the kinetics of aggregation by Q40-YFP expressed in C. elegans body wall muscle cells, using quantitative approaches that have been well established for in vitro amyloid-forming systems of the type favored by the reviewer. They calculate a reaction order of just 1.6, slightly higher than what would be expected for a monomeric nucleus but nevertheless fully consistent with our own conclusions when one accounts for the following two aspects of their approach. First, the polyQ tract in their construct is flanked by short poly-Histidine tracts on both sides. These charges very likely disfavor monomeric nucleation because all possible configurations of a four-stranded bundle position the beginning and end of the Q tract in close proximity, and Q40 is only just long enough to achieve monomeric nucleation in the absence of such destabilization. Second, the protein is fused to YFP, a weak homodimer (Landgraf et al., 2012; Snapp et al., 2003). With these two considerations, our model -- which was generated from polyQ tracts lacking flanking charges or an oligomeric fusion -- predicts that amyloid nucleation by their construct will occur more frequently as a dimer than a monomer. Indeed, their observed reaction order of 1.6 supports a predominantly dimeric nucleus. Like us and others, Sinnige et al. did not observe phase separation prior to amyloid formation. This is important because it not only argues against nucleation occurring in a condensate, it also suggests that the reaction order they calculated has not been limited by the concentration-buffering effect of phase separation.

      While we agree that our conclusions rest heavily on DAmFRET data (for good reason), we do provide supporting evidence from molecular dynamics simulations, SDD-AGE, and microscopy.

      To summarize, given the extreme limitations of in vitro experiments in this field, the breadth of our current study, and supporting findings from another lab using rigorous quantitative approaches, we feel that our claims are justified without in vitro data.

      Rebuttals to other critiques

      We do not deny that flanking domains can modulate the kinetics and stability of polyQ amyloid. However, as stated and referenced in the introduction, they do not appear to change the core structure. We have also added a paragraph concerning flanking domains to the discussion, and acknowledged that “the extent to which our findings will translate in these different contexts remains to be determined.” Nevertheless, that the intrinsic behavior of the polyQ tract itself is central to pathology is evident from the fact that the nine pathologic polyQ proteins have similar length thresholds despite different functions, flanking domains, interaction partners, and expression levels.

      The reviewer states that we found nucleation potential to require 60 Qs in a row. Our data are collectively consistent with nucleation occurring at and above approximately 36 Qs, a point repeated in the paper. The reviewer may be referring to our statement, “Sixty residues proved to be the optimum length to observe both the pre- and post-nucleated states of polyQ in single experiments”. The purpose of this statement is simply to describe the practical consideration that led us to use 60 Qs for the bulk of our assays. We do appreciate that the fraction of AmFRET-positive cells is very low for lengths just above the threshold, especially Q40. These fractions are nevertheless highly significant (p = 0.004 in [PIN+] cells, one-tailed t-test), and we have modified the figure and text to clarify this.

      The reviewer characterizes self-poisoning as the hallmark of crystallization from polymer melts, which would be problematic for our conclusions if self-poisoning were limited to this non-physiological context. In fact the term was first used to describe crystallization from solution (Organ et al., 1989), wherein the phenomenon is more pronounced (Ungar et al., 2005).

      Response to Reviewer 2

      We thank the reviewer for their detailed and helpful critique.

      The reviewer correctly notes that the majority of our manipulations were conducted with 60-residue long tracts (which corresponds to disease onset in early adulthood), and this length facilitates intramolecular nucleation. However, we also analyzed a length series of polyQ spanning the pathological threshold, as well as a synthetic sequence designed explicitly to test the model nucleus structure with a tract shorter than the pathological threshold, and both experiments corroborate our findings.

      The reviewer mentions “several caveats” that come with our result, but their subsequent elaboration suggests they are to be interpreted more as considerations than caveats. We agree that increasing sequence complexity will tend to increase homogeneity, but this is exactly the motivation of our approach. We explicitly set out to determine the minimal complexity sequence sufficient to specify the nucleating conformation, which we ultimately identified in terms of secondary and tertiary structure. We do not specify which parts of a long polyQ tract correspond to which parts of the structure, because, as the reviewer points out, they can occur at many places. Hence, depending on the length of the polyQ tract, the nucleus we describe may have any length of sequence connecting the strand elements. We do not think that the effects of N-residue placement can be interpreted as a confounding influence on hairpin position because the striking even-odd pattern we observe implicates the sides of beta strands rather than the lengths. Moreover, we observe this pattern regardless of the residue used (Gly, Ser, Ala, and His in addition to Asn).

      We thank the reviewer for noting the novelty and plausibility of the self-poisoning connection. We would like to elaborate on our finding that self-poisoning inhibits nucleation (in addition to elongation), as this will be confusing to many readers. While self-poisoning is claimed to inhibit primary nucleation in the polymer crystal literature (Ungar et al., 2005; Zhang et al., 2018), the semantics of “nucleation” in this context warrants clarification. Technically, the same structure can be considered a nucleus in one context but not in another. The Q zipper monomer, even if it is rate-limiting for amyloid formation at low concentrations (and is therefore the “nucleus”), is not necessarily rate-limiting when self-poisoned at high concentrations. Whether it comprises the nucleus in this case depends on the rates of Q zipper formation relative to subunit addition to the poisoned state. If the latter happens slower than Q zipper formation de novo, it can be said that self-poisoning inhibits nucleation, regardless of whether the Q zipper formed. We suspect this to be the mechanism by which preemptive oligomerization blocks nucleation in the case of polyQ, though other mechanisms may be possible.

      We believe the revised text also now incorporates the remaining suggestions of this reviewer, with two exceptions. 1) We retain the phrase “hidden pattern”, because we believe our data argue for a nucleus whose formation requires that Qs occur in a pattern that we now elaborate as (QXQXQX>3)4 where X>3 denotes any length of three or more residues of any composition. In amyloids formed from long polyQ molecules, the nucleus will involve any subset of 12 Qs that match this pattern. 2) We decided not to re-order the manuscript to discuss self-poisoning after establishing the monomer nucleus (even though we agree that doing so would improve the logical flow) because the interpretation of the data with respect to self-poisoning helps to establish critical strand lengths, and self-poisoning creates an anomaly in the DAmFRET data that is difficult to ignore. We add text clarifying that high local concentrations “effectively shifts the rate-limiting step to the growth of a higher order relatively-disordered species”.
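      The pattern (QXQXQX>3)4 can be expressed as a regular expression for illustration. This is a minimal sketch of a literal reading of the pattern, not a tool used in the study: it assumes X is any single residue, that X>3 is a spacer of three or more residues, and that a spacer also follows the fourth strand because the pattern is written that way. The names `NUCLEUS_PATTERN` and `contains_nucleus_pattern` are hypothetical, and a match says nothing about whether a sequence actually nucleates.

```python
import re

# Literal reading of (QXQXQX>3)4, with hypothetical names:
#   strand: Q at alternating positions (Q X Q X Q), X = any single residue
#   spacer: X>3, taken here as three or more residues of any composition
STRAND = r"Q[A-Z]Q[A-Z]Q"
SPACER = r"[A-Z]{3,}"
NUCLEUS_PATTERN = re.compile(rf"(?:{STRAND}{SPACER}){{4}}")


def contains_nucleus_pattern(sequence: str) -> bool:
    """Return True if the sequence contains four strand+spacer repeats."""
    return NUCLEUS_PATTERN.search(sequence.upper()) is not None


if __name__ == "__main__":
    examples = {
        "Q60": "Q" * 60,                      # pure polyQ, X can itself be Q
        "QNQNQ+GGG x4": "QNQNQGGG" * 4,       # Qs only at alternating positions
        "QNN x20": "QNN" * 20,                # Qs never two positions apart
    }
    for name, seq in examples.items():
        print(name, contains_nucleus_pattern(seq))
```

      Because X may itself be Q, any sufficiently long pure polyQ tract satisfies this reading of the pattern at many positions, which agrees with the statement that the nucleus can involve any subset of 12 Qs that match it.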

      Response to Reviewer 3

      We thank the reviewer for their helpful comments.

      We opted to retain Figures 1A and B because we think they are important for comprehending the subject and objectives of the study. We modified the former to make it clearer. We have also elaborated on DAmFRET as it is a relatively new approach that may be unfamiliar to many readers. Beyond this, we refer the reviewer and readers to our cited prior work describing the theory and interpretation of DAmFRET. Note that the y-axes of DAmFRET plots are not raw FRET but rather “AmFRET”, a ratio of FRET to total expression level. As explained thoroughly in our cited prior work, the discontinuity of AmFRET with expression level indicates that the high-AmFRET population formed via a disorder-to-order transition. When the query protein is predicted to be intrinsically disordered, the discontinuous transition to high AmFRET invariably (among hundreds of proteins tested in prior published and unpublished work) signifies amyloid formation as corroborated by SDD-AGE and tinctorial assays.
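      To make the AmFRET definition concrete, the ratio can be sketched as follows. This is a simplified illustration under stated assumptions, not the published analysis pipeline: each cell is represented by a hypothetical pair of numbers (FRET signal, expression level), and the threshold used to call a cell AmFRET-positive is an arbitrary illustrative value.

```python
# Simplified illustration of AmFRET as a per-cell ratio of FRET signal to
# total expression level. Input values and the threshold are hypothetical.


def amfret(fret_signal: float, expression_level: float) -> float:
    """Return the ratio of FRET signal to total expression level for one cell."""
    if expression_level <= 0:
        raise ValueError("expression level must be positive")
    return fret_signal / expression_level


def fraction_positive(cells, threshold: float) -> float:
    """Return the fraction of cells whose AmFRET exceeds the given threshold."""
    values = [amfret(fret, expression) for fret, expression in cells]
    return sum(value > threshold for value in values) / len(values)


if __name__ == "__main__":
    # (FRET signal, expression level) per cell, made-up numbers
    cells = [(0.0, 100.0), (5.0, 100.0), (80.0, 100.0), (90.0, 100.0)]
    print(fraction_positive(cells, threshold=0.5))
```

      Normalizing by expression level is what allows cells with very different amounts of protein to be compared on the same axis.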

      When performed using standard flow cytometry as in the present study, every AmFRET measurement corresponds to a cell-wide average, and hence does not directly inform on the distribution of the protein between different stoichiometric species. As there is only one fluorophore per protein molecule, monomeric nuclei have no signal. DAmFRET can distinguish cells expressing monomers from stable dimers from higher order oligomers (see e.g. Venkatesan et al. 2019), and we are therefore quite confident that AmFRET values of zero correspond to cells in which a vast majority of the respective protein is not in homo-oligomeric species (i.e. is monomeric or in hetero-complexes with endogenous proteins). The exact value of AmFRET, even for species with the same stoichiometry, will depend both on the effect of their respective geometries on the proximity of mEos3.1 fluorophores, and on the fraction of protein molecules in the species. Hence, we only attempt to interpret the plateau values of AmFRET (where the fraction of protein in an assembled state approaches unity) as directly informing on structure, as we did in Fig. S3A.

      We believe that AmFRET decreases with longer polyQ because the mass fraction of fluorophore decreases in the aggregate, simply because the extra polypeptide takes up volume in the aggregate.

      Yes, the fraction of positive cells in a discontinuous DAmFRET plot does increase with time. However, given the more laborious data collection and derivation of nucleation kinetics in a system with ongoing translation, especially across hundreds of experiments with other variables, ours is a snapshot measurement to approximately derive the relative contributions of intra- and intermolecular fluctuations to the nucleation barrier, rather than the barrier’s magnitude.

      We have revised the tautological statement by removing “non-amyloid containing”.

      Concerning the correlation of our data with the pathological length threshold -- as we state in the first results section, “Our data recapitulated the pathologic threshold -- Q lengths 35 and shorter lacked AmFRET, indicating a failure to aggregate or even appreciably oligomerize, while Q lengths 40 and longer did acquire AmFRET in a length and concentration-dependent manner”. Hence, most of our experiments were conducted with 60Q not because it resembles the pathological threshold, but rather because it was most convenient for DAmFRET experiments.

      Self-poisoning is a widely observed and heavily studied phenomenon in polymer crystal physics, though it seems not yet to have entered the lexicon of amyloid biologists. We were new to this concept before it emerged as an extremely parsimonious explanation for our results. As described in the text, two pieces of evidence exclude the alternative mechanism suggested by the reviewer -- that non-structured oligomers form and subsequently engage and inhibit the template. Specifically, 1) inhibition occurs without any detectable FRET, even at high total protein concentration, indicating the species do not form in a concentration-dependent manner that would be expected of disordered oligomers; and 2) inhibition itself has strict sequence requirements that match those of Q zippers. Hence our data collectively suggest that inhibition is a consequence of the deposition of partially ordered molecules onto the templating surface.

      We have softened the subheading and text of the relevant section in the discussion to more clearly indicate the speculative nature of our statements concerning the possible role of self-poisoned oligomers in toxicity.

      We stand by our statement 'that kinetically arrested aggregates emerge from the same nucleating event responsible for amyloid formation', as this follows directly from self-poisoning.

      Regarding the arguments for lateral and axial growth, we agree that the data are indirect. However, that polyQ forms lamellar amyloids both in vitro and in vivo is now established, so we do not feel it necessary to rigorously show that here. Nevertheless, we need to include this section primarily because it introduces the fact that ordering in polyQ amyloid occurs in the lateral as well as axial dimensions, and the onset of lateral ordering (lamellar growth) explains the very different behaviors of QU and QB sequences apparent on the DAmFRET plots. Ultimately, the two dimensions of growth are important to understand self-poisoning and maturation of the short nucleating zipper to amyloid.

      References

      Arseni D, Hasegawa M, Murzin AG, Kametani F, Arai M, Yoshida M, Ryskeldi-Falcon B. 2022. Structure of pathological TDP-43 filaments from ALS with FTLD. Nature 601:139–143. doi:10.1038/s41586-021-04199-3

      Bansal A, Schmidt M, Rennegarbe M, Haupt C, Liberta F, Stecher S, Puscalau-Girtu I, Biedermann A, Fändrich M. 2021. AA amyloid fibrils from diseased tissue are structurally different from in vitro formed SAA fibrils. Nat Commun 12:1013. doi:10.1038/s41467-021-21129-z

      Buell AK. 2017. The Nucleation of Protein Aggregates - From Crystals to Amyloid Fibrils. Int Rev Cell Mol Biol 329:187–226. doi:10.1016/bs.ircmb.2016.08.014

      Chakraborty D, Straub JE, Thirumalai D. 2023. Energy landscapes of Aβ monomers are sculpted in accordance with Ostwald’s rule of stages. Sci Adv 9:eadd6921. doi:10.1126/sciadv.add6921

      Crist B, Schultz JM. 2016. Polymer spherulites: A critical review. Prog Polym Sci 56:1–63. doi:10.1016/j.progpolymsci.2015.11.006

      De Yoreo JJ. 2022. Casting a bright light on Ostwald’s rule of stages. Proc Natl Acad Sci USA 119. doi:10.1073/pnas.2121661119

      Hong Y, Yuan S, Li Z, Ke Y, Nozaki K, Miyoshi T. 2015. Three-Dimensional Conformation of Folded Polymers in Single Crystals. Phys Rev Lett 115:168301. doi:10.1103/PhysRevLett.115.168301

      Keller A. 1957. A note on single crystals in polymers: Evidence for a folded chain configuration. Philosophical Magazine 2:1171–1175. doi:10.1080/14786435708242746

      Landgraf D, Okumus B, Chien P, Baker TA, Paulsson J. 2012. Segregation of molecules at cell division reveals native protein localization. Nat Methods 9:480–482. doi:10.1038/nmeth.1955

      Lauritzen JI, Hoffman JD. 1960. Theory of Formation of Polymer Crystals with Folded Chains in Dilute Solution. J Res Natl Bur Stand A Phys Chem 64A:73–102. doi:10.6028/jres.064A.007

      Navrotsky A. 2004. Energetic clues to pathways to biomineralization: precursors, clusters, and nanoparticles. Proc Natl Acad Sci USA 101:12096–12101. doi:10.1073/pnas.0404778101

      Ohhashi Y, Ito K, Toyama BH, Weissman JS, Tanaka M. 2010. Differences in prion strain conformations result from non-native interactions in a nucleus. Nat Chem Biol 6:225–230. doi:10.1038/nchembio.306

      Organ SJ, Ungar G, Keller A. 1989. Rate minimum in solution crystallization of long paraffins. Macromolecules 22:1995–2000. doi:10.1021/ma00194a078

      Radamaker L, Baur J, Huhn S, Haupt C, Hegenbart U, Schönland S, Bansal A, Schmidt M, Fändrich M. 2021. Cryo-EM reveals structural breaks in a patient-derived amyloid fibril from systemic AL amyloidosis. Nat Commun 12:875. doi:10.1038/s41467-021-21126-2

      Sahoo B, Singer D, Kodali R, Zuchner T, Wetzel R. 2014. Aggregation behavior of chemically synthesized, full-length huntingtin exon1. Biochemistry 53:3897–3907. doi:10.1021/bi500300c

      Schmelzer JWP, Abyzov AS. 2017. How do crystals nucleate and grow: Ostwald’s rule of stages and beyond. In: Šesták J, Hubík P, Mareš JJ, editors. Thermal Physics and Thermal Analysis, Hot Topics in Thermal Analysis and Calorimetry. Cham: Springer International Publishing. pp. 195–211. doi:10.1007/978-3-319-45899-1_9

      Schmidt M, Wiese S, Adak V, Engler J, Agarwal S, Fritz G, Westermark P, Zacharias M, Fändrich M. 2019. Cryo-EM structure of a transthyretin-derived amyloid fibril from a patient with hereditary ATTR amyloidosis. Nat Commun 10:5008. doi:10.1038/s41467-019-13038-z

      Schweighauser M, Shi Y, Tarutani A, Kametani F, Murzin AG, Ghetti B, Matsubara T, Tomita T, Ando T, Hasegawa K, Murayama S, Yoshida M, Hasegawa M, Scheres SHW, Goedert M. 2020. Structures of α-synuclein filaments from multiple system atrophy. Nature 585:464–469. doi:10.1038/s41586-020-2317-6

      Snapp EL, Hegde RS, Francolini M, Lombardo F, Colombo S, Pedrazzini E, Borgese N, Lippincott-Schwartz J. 2003. Formation of stacked ER cisternae by low affinity protein interactions. J Cell Biol 163:257–269. doi:10.1083/jcb.200306020

      Törnquist M, Michaels TCT, Sanagavarapu K, Yang X, Meisl G, Cohen SIA, Knowles TPJ, Linse S. 2018. Secondary nucleation in amyloid formation. Chem Commun 54:8667–8684. doi:10.1039/c8cc02204f

      Ungar G, Putra EGR, de Silva DSM, Shcherbina MA, Waddon AJ. 2005. The Effect of Self-Poisoning on Crystal Morphology and Growth Rates. In: Allegra G, editor. Interphases and Mesophases in Polymer Crystallization I, Advances in Polymer Science. Berlin, Heidelberg: Springer Berlin Heidelberg. pp. 45–87. doi:10.1007/b107232

      Vetri V, Foderà V. 2015. The route to protein aggregate superstructures: Particulates and amyloid-like spherulites. FEBS Lett 589:2448–2463. doi:10.1016/j.febslet.2015.07.006

      Wild EJ, Boggio R, Langbehn D, Robertson N, Haider S, Miller JRC, Zetterberg H, Leavitt BR, Kuhn R, Tabrizi SJ, Macdonald D, Weiss A. 2015. Quantification of mutant huntingtin protein in cerebrospinal fluid from Huntington’s disease patients. The Journal of Clinical Investigation.

      Yang Y, Arseni D, Zhang W, Huang M, Lövestam S, Schweighauser M, Kotecha A, Murzin AG, Peak-Chew SY, Macdonald J, Lavenir I, Garringer HJ, Gelpi E, Newell KL, Kovacs GG, Vidal R, Ghetti B, Ryskeldi-Falcon B, Scheres SHW, Goedert M. 2022. Cryo-EM structures of amyloid-β 42 filaments from human brains. Science 375:167–172. doi:10.1126/science.abm7285

      Zhang X, Zhang W, Wagener KB, Boz E, Alamo RG. 2018. Effect of Self-Poisoning on Crystallization Kinetics of Dimorphic Precision Polyethylenes with Bromine. Macromolecules 51:1386–1397. doi:10.1021/acs.macromol.7b02745

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The current manuscript provides a timely contribution to the ongoing discussion about the mechanism of the apical sodium/bile acid transporters (ASBTs). Recent structures of the mammalian ASBT transporters exhibited a substrate binding mode with few interactions with the core domain (classically associated with substrate binding), prompting an unusual proposal for the transport mechanism. Early structures of ASBT homologues from bacteria also exhibit unusual substrate binding in which the core substrate binding domain is less engaged than expected. Due to the ongoing questions of how substrate binding and mechanism are linked in these transporters, the authors set out to deepen our understanding of a model ASBT homolog from the bacterium N. meningitidis (ASBT-NM).

      The premise of the current paper is that the bacterial ASBT homologs are probably not physiological bile acid transporters, and that structural elucidation of a natively transported substrate might provide better mechanistic information. In the current manuscript, the authors revisit the first BASS homologue to be structurally characterized, ASBT-NM. Based on bacteriological assays in the literature, the authors identify the coenzyme A precursor pantoate as a more likely substrate for ASBT-NM than taurocholate, the substrate in the original structure. A structure of ASBT-NM with pantoate exhibits interesting differences in structure. The structures are complemented with MD simulations, and the authors propose that the structures are consistent with a classical elevator transport mechanism.

      The structural experiments are generally solid, although showing omit maps would bolster the identification of the substrate binding site.

      We have added an omit map in Fig S2.

      One shortcoming is that, although pantoate binding is observed, the authors do not show transport of this substrate, undercutting the argument that the pantoate structure represents binding of a "better" or more native substrate. Mechanistic proposals, like the proposed role of T112 in unlocking the transporter, would be much better supported by transport data.

      As we were unable to source radiolabelled pantoate at a reasonable cost, we decided to focus on binding studies, relying on the fact that pantoate/pyruvate uptake has been shown in other BASS transporters. While we agree that transport needs to be substantiated, our crystallographic and molecular dynamics studies combined provide a picture of sodium ions stabilising the substrate binding site to enable the binding of the substrate, which in turn induces further conformational changes. Such changes would be consistent with a mechanism of sodium-driven transport with clear coupling of the sodium ions to substrate translocation. We are not saying this is a “better” substrate but rather that a substrate binding like this would be able to elicit the conformational changes necessary for transport – something that has been missing from previous studies.

      Reviewer #2 (Public Review):

      The manuscript starts with a demonstration of pantoate binding to ASBTnm using a thermostability assay and ITC, and follows with structure determinations of ASBTnm with or without pantoate. The structure of ASBTnm in the presence of pantoate pinpoints the binding site of pantoate to the "crossover" region formed by the partially unwound helices TM4 and TM9. Binding of pantoate induces modest movements of side chain and backbone atoms at the crossover region that are consistent with providing coordination of the substrate. The structures also show movement of TM1 that opens the substrate binding site to the cytosol and mobility of loops between the TMs. MD simulations of the ASBT structure embedded in a lipid bilayer suggest a stabilizing effect of the two sodium ions that are known to co-transport with the substrate. A binding study on pantoate analogs further demonstrates the specificity of pantoate as a substrate.

      The weaknesses of the manuscript include the lack of a transport assay for pantoate and the lack of a demonstration that the observed conformational changes in TM1 and the loops are relevant to the binding or transport of pantoate.

      We agree that the manuscript would have been bolstered by transport data (see response to reviewer 1). The take-home message from the movement of TM1 and the loops is that they are flexible. It is probably unlikely that TM1 moves like this during the transport cycle and we have avoided overplaying the significance of this movement. Instead, we have focussed on the conformational changes in the pantoate binding site. We have made an additional movie concentrating on the binding site and not including TM1.

      Overall, the structural, functional and computational studies are solid and rigorous, and the conclusions are well justified. In addition, the authors discussed the significance of the current study in a broader perspective relevant to recent structures of mammalian BASS members.

      Reviewer #3 (Public Review):

      The manuscript describes new ligand-bound structures within the larger bile acid sodium symporter family (BASS). This is the primary advance in the manuscript, together with molecular simulations describing how sodium and the bile acids sit in the structure when thermalized. What I think is fairly clear is that the ligands are more stable when the sodiums are present, with a marked reduction in RMSD over the course of repeated trajectories. This would be consistent with a transport model where sodium ions bind first, and then the bile acid binds, followed by a conformational change to another state where the ligands unbind.

      While the authors mention that BASS transporters are thought to undergo an elevator transport mechanism, this is not tested here. In my reading, all the crystal structures describe the same conformational state, and the simulations do not make an attempt to induce a transition on accessible simulation timescales. Instead, there is a morph between two states where different substrates are bound, which induces a conformational change that looks unrelated to the transport cycle.

      To make our conclusions clearer we have added another movie showing a morph between the structure without substrate (instead of using the structure with taurocholate, which we were using as a representative of the unbound structure) and that with pantoate and have omitted the panel domain including TM1. While both of these structures are inward-facing, there are significant conformational changes within TM4 that we have described in the article.

      Instead, the focus is on what kinds of substrates bind to this transporter, interrogating this with isothermal calorimetry together with mutations. With a Kd in the micromolar range, even the best binder, pantoate, actually isn't a particularly tight binder in the pharmaceutical sense. For a transporter, tight binding is not actually desirable, since the substrate needs to be able to leave after conformational change places it in a position accessible to the other side.

      As the referee points out, the Kd that we observe would be consistent with those for substrates of other transporters.
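
      To put a micromolar Kd in perspective, the standard binding free energy and the fractional occupancy at a given ligand concentration follow directly from Kd. The sketch below is illustrative only: the Kd values and the ligand concentration are hypothetical placeholders (not the measured ITC values), and a simple 1:1 binding model at an assumed temperature of 298.15 K is used.

```python
import math

R = 8.314462618   # gas constant, J/(mol*K)
T = 298.15        # assumed temperature, K

def binding_free_energy_kj(kd_molar: float) -> float:
    """Standard binding free energy in kJ/mol for a 1:1 complex: dG = RT ln(Kd / 1 M)."""
    return R * T * math.log(kd_molar) / 1000.0

def fraction_bound(ligand_molar: float, kd_molar: float) -> float:
    """Fraction of protein occupied, assuming free ligand is in large excess."""
    return ligand_molar / (kd_molar + ligand_molar)

# Hypothetical Kd values (nanomolar, micromolar, millimolar) for comparison only
for kd in (1e-9, 1e-6, 1e-3):
    print(f"Kd = {kd:.0e} M  dG = {binding_free_energy_kj(kd):6.1f} kJ/mol  "
          f"occupancy at 10 uM ligand = {fraction_bound(1e-5, kd):.3f}")
```

      A weaker (larger) Kd gives a less negative free energy and lower occupancy at the same ligand concentration, which is the trade-off a transporter needs in order to release its substrate.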

      There is one really important point that readers and authors should be aware of. In Figure 2A, the names are not consistent with the chemical structure. "-ate" denotes when a carboxylic acid is in the deprotonated form, creating a charged carboxylate. What is drawn is pantoic acid, ketopantoic acid, and pantothenic acid. Less importantly, the wedges and hashes for the methyl group are arguably not appropriate, since the carbon they are attached to is not a chiral center. For the crystallization, this makes no difference, since at near-neutral pH the carboxylic acid will spontaneously deprotonate, and the carboxylate form will be the most common. However, if the structures in Figure 2A were used for classical molecular simulation, that would be a big problem, since now that would be modeling the much rarer neutral form rather than the charged state. I am reasonably sure based on Figure 5 that the MD correctly modeled the deprotonated form with a carboxylate, but that is inconsistent with Figure 2A. Otherwise, the structure and simulation analysis falls into the mainstream of modern structural biology work.

      We have corrected the inconsistency of the protonation state in the naming of the molecular structures. Thank you for pointing this out – though the names represented the predominant form in solution, the more aesthetically pleasing protonated form got the better of us in our representations. The correct form was used in the MD.
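
      The reviewer's point about the predominant protonation state can be made quantitative with the Henderson-Hasselbalch relation. The sketch below is a minimal illustration: the pKa of 4.0 is an assumed, typical order of magnitude for a carboxylic acid and not a measured value for pantoic acid, and the function name is hypothetical.

```python
def fraction_deprotonated(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch: fraction of a monoprotic acid present as its conjugate base."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))

ASSUMED_PKA = 4.0  # hypothetical value, for illustration only

for ph in (4.0, 7.0):
    print(f"pH {ph}: fraction deprotonated = {fraction_deprotonated(ph, ASSUMED_PKA):.4f}")
```

      Under this assumption, more than 99.9% of the molecules carry a carboxylate at pH 7, which is why the charged form is the appropriate one to parameterize for simulation.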

      Reviewer #1 (Recommendations For The Authors):

      1) Omit maps (Fo-Fc) should be shown for pantoate and for the sodiums in the structure.

      This has been added to supplementary Figure 2.

      2) Line 86 - could you briefly describe the alternative mechanism proposed for the mammalian NTPCs?

      We have added an extra line to describe this deviation from the classical alternating access model.

      3) Line 124 - where is the lipid like molecule, and does it interact with either the kinked helix or the substrate? A supplemental figure would be helpful.

      The lipid like molecule lies between the substrate and the kinked helix, but doesn’t interact strongly with either. It would appear that the lipid would bind in the crevice rather than causing the crevice. We add Author response image 1 here but have not added it to the supplementary figures. The maps and PDB file are available for download.

      Author response image 1.

      The 2mFo-DFc density is at 1σ, the mFo-DFc density is at 2.5σ.

      4) I notice that the apo and pantoate structures are crystallized in different space groups. How does this compare to the original TCH structure? Is there any chance that crystal packing is altering the TM1 geometry or loop 1?

      We cannot rule out an effect of the crystallisation conditions on the movement of TM1. We have now solved a number of different structures of ASBTNM and this is the first time we observe TM1 in this conformation. As stated above, we have refrained from overplaying the significance of the movement of TM1 to transport, other than to say that some adjustments need to be made to accommodate the pantoate.

      Reviewer #2 (Recommendations For The Authors):

      Minor comments:

      Pg 3, "... with a 5-fold inverted repeat...", Should be 2-fold?

      Changed, thank you.

      Reviewer #3 (Recommendations For The Authors):

      Is there any chance that the MD simulations (even in a reduced form) could be uploaded to Zenodo or a similar repository?

      We have taken up this suggestion and added the information in the paper: MD trajectories in the GROMACS XTC format were deposited in the OSF.io repository under DOI 10.17605/OSF.IO/KFDT5 under the open CC-BY Attribution 4.0 International license. The trajectories contain all atoms and were subsampled at 5-ns intervals. GROMACS run input files (TPR format) and initial coordinate files (GRO format) together with topology files (GROMACS format) are also included.

      Watch the "Å" symbol in Figures 5, S6, S7. This looks like they were made in matplotlib, and probably used something like: "$\AA$", which puts the symbol in math mode. This makes the Å symbol in italics. Matplotlib has gotten better UTF-8 support

      Changed, thank you.
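
      One way to get an upright Å in Matplotlib labels is to place the Unicode character directly in the label string instead of going through math mode. The sketch below is illustrative: the helper names are hypothetical, the data passed in would be the user's own, and it assumes the active font contains the Å glyph (Matplotlib's default DejaVu Sans does).

```python
ANGSTROM = "\u00c5"  # LATIN CAPITAL LETTER A WITH RING ABOVE, rendered upright as plain text

def axis_label(quantity: str, unit: str = ANGSTROM) -> str:
    """Build a plain-text axis label such as 'RMSD (Å)' without math mode."""
    return f"{quantity} ({unit})"

def plot_rmsd(times_ns, rmsd_angstrom, path):
    """Plot an RMSD time series and save it; Matplotlib is imported lazily."""
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, suitable for scripts
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(times_ns, rmsd_angstrom)
    ax.set_xlabel("Time (ns)")
    ax.set_ylabel(axis_label("RMSD"))  # no '$...$', so the unit is not italicised
    fig.savefig(path)
    plt.close(fig)
```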

      Your citation for LINCS duplicates the citation for PME. I think you want the Hess 1997 paper. 10.1002/(SICI)1096-987X(199709)18:12<1463::AID-JCC4>3.0.CO;2-H

      Changed, thank you

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors performed a meta-analysis of GC concentrations and metabolic rates in birds and mammals. They found close associations for all studies showing a positive association between these two traits. As GCs have been viewed with close links to "stress," authors suggest that this overlooks the importance of metabolism and perhaps GC variation does not relate to "stress" per se but an increase in metabolism instead.

      This is an important meta-analysis: although most researchers acknowledge the link between GCs and metabolism, metabolism is often overlooked in studies. The field of conservation physiology is especially focused on GCs being a "stress" hormone, which overlooks the importance of GCs in mediating energy balance, i.e., an animal that has high GC concentrations may not be doing that poorly compared to an animal with low GC concentrations; it might just be expending more energy, e.g., caring for young. The results, with overwhelming directionality and strong effect sizes, support a positive association between these two variables.

      My main concern lies in that most of the studies come from a few labs, therefore there may be limited data to test this relationship. I would include lab as a random effect to see how strong this effect might be.

      We think this is a good point, and we ran the main models included in the manuscript including Lab as a random effect (N = 35 experiments, 21 studies, 16 labs). This did not affect the results, leading to negligible changes in the model parameters (alternative model tables are shown in Author response tables 1 and 2). In the revised version of the manuscript we mention that we tested the effect of Lab but did not keep this variable in the models (lines 183-185).

      Author response table 1.

      Meta regression model testing the association between metabolic rate (MR) effect sizes and glucocorticoid effect sizes.

      Author response table 2.

      Meta regression model (quantitative approach) testing the effect of (a) Taxa, (b) Before / after effect, (c) Experiment / control effect, (d) Use of Metabolic Rate or Heart Rate as metabolic variable and (e) Treatment type, on the association between metabolic rate (MR) and glucocorticoid effect sizes across studies.
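
      For readers unfamiliar with random effects in meta-analysis, the idea can be sketched with the simplest case: a one-level random-effects pooled estimate using the DerSimonian-Laird estimator of between-study variance. This is explicitly not the multilevel meta-regression used in the manuscript (where Lab would enter as an additional variance component); the function name and all inputs are hypothetical.

```python
def dersimonian_laird(effects, variances):
    """Return (pooled_effect, standard_error, tau2) for a one-level random-effects model."""
    k = len(effects)
    w = [1.0 / v for v in variances]                      # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Between-study variance, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]          # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = (1.0 / sum(w_re)) ** 0.5
    return pooled, se, tau2

# Hypothetical effect sizes and sampling variances, for illustration only
pooled, se, tau2 = dersimonian_laird([0.2, 0.5, 0.9], [0.04, 0.02, 0.05])
print(f"pooled = {pooled:.3f}, SE = {se:.3f}, tau^2 = {tau2:.3f}")
```

      If adding a grouping factor such as Lab leaves the pooled estimate and its uncertainty essentially unchanged, as the authors report, the variance attributable to that factor is small relative to the other components.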

      Furthermore, I would like to see a test of the directionality of the two variables. Authors suggest that changes in metabolism affect GC levels but likely changes in GC levels would affect metabolism. Why not look into studies that have altered GC levels experimentally and see the effect on metabolism? Based on the close link, authors suggest that GCs may not play a role outside of "stress" beyond the stressor's effect on metabolic rate. However, if they were to investigate manipulations of GCs on metabolic rate, the link may or may not be there, which would be interesting to look at. I firmly believe that GCs are tightly linked to metabolism; however, I also think that GCs have a range of effects outside of metabolism as well, depending on the course and strength of the stressor.

      The directionality of the two variables is indeed a question of interest – we show that changes in metabolic rate affect GCs, but does the reverse also happen? In the schematic model we propose in Box 1, we propose that the effect is uni-directional, i.e. metabolic rate affects GC-levels, but GCs have no direct effect on metabolic rate. We note that there may however be an indirect effect, in that in the absence of a GC-response to an increase in metabolic rate the organism would after some time no longer be able to fuel the metabolic rate. Because we anticipate that more readers may raise this question, we have added the following paragraph to the discussion:

      “We selected studies in which experimental treatments affected MR, leading us to conclude that the most parsimonious explanation of our finding is that GC levels were causally related to MR. Suppose however that instead we reported a correlation between MR and GCs, using for example unmanipulated individuals. The question would then be justified whether changes in GCs affected MR or vice versa. Direct effects of GCs could be studied using pharmacological manipulations. However, while many studies show that GC administration induces a cascade of effects, when the function of GCs is to facilitate a level of MR, as opposed to regulate variation in MR, we do not anticipate such manipulations to induce an increase in MR (Box 1). On the other hand, when MR is experimentally increased in conjunction with pharmacological manipulations that suppress the expected GC-increase (an experiment that to our best knowledge has not yet been done), we would predict that the increase in MR can be maintained less well compared to the same MR treatment in the absence of the pharmaceutical manipulation. This result, we would interpret to demonstrate that maintaining a particular level of MR may be dependent on GCs as facilitator, but it would be misleading to interpret this pattern to indicate that GCs regulate MR, as is sometimes proposed. Additionally, it would be informative to investigate whether energy turnover immediately before blood sampling is a predictor of GC levels, as we would predict on the basis of the interpretation of our findings. Increasing the use of devices and techniques that monitor energy expenditure or its proxies (e.g. accelerometers) may be a way to increase our understanding of the generality of the GC-MR association.”

      We based our hypotheses and search criteria on the assumption that GCs induce physiological processes to help the organism meet energetic demands. Pharmacologically induced increases in GCs would lead to physiological responses and associations that we consider not comparable to the ones reported in this work, as we base our hypotheses on natural (i.e. non-pharmacologically induced) GC and MR variation. That said, with exogenous GC administration, we may expect GC cascade effects, but not necessarily an increase in MR. Here, acknowledging that the link between GCs and metabolic rate may entail complex steps, we predict that GC administration may lead to an increase in blood glucose and may affect energy allocation at a tissue-specific level. However, such an increase may have no effect on whole-organism energy expenditure, unless energy expenditure is limited by glucose availability. We nevertheless acknowledge that it would be interesting to investigate the kind of associations between MR, GCs and other physiological variables (e.g. glucose) that appear when inducing an increase in GCs, as these would broaden our understanding of the mechanistic processes underlying these associations.

      We show that variation in GC levels was explained by variation in MR, independent of the stimulus that caused the increase in MR. We propose that the most parsimonious interpretation of our findings is that GC variation is an indicator of variation in MR, independent of the cause of variation in MR. We do not intend to prove causality when making predictions on the co-dependency of metabolic rate and GCs. In fact, our predictions do not imply that one trait necessarily affects the other per se, as this interplay is likely to be shaped by the environmental or physiological context (Box 1). Thus, the specific mechanisms underlying how changes in metabolic rate induce changes in GCs, or the other way around, need to be investigated. One step to tackle this in upcoming research would indeed be studying the effects of exogenous GCs on metabolic rate.

      In the manuscript, we clarify that GCs have a variety of cascade effects besides metabolism (Box 1). On the basis of our results, however, we suggest that many of the downstream effects of GCs may be interpreted as allocation adjustments to the metabolic level at which organisms operate (lines 235-236), but we do acknowledge that these cascade effects are complex and affect many systems besides metabolism.

      This work helps in the thinking that GCs are not the same as a "stress" hormone or labelling hormones with only one function. As hormones are naturally pleiotropic, the view of any one hormone being X is overly simplistic.

      We fully agree, but stress that we focus on how GCs are regulated, which may be less complex than its pleiotropic functions. Indeed, we consider that the many functions of GCs have potentially clouded the question as to how GCs are regulated.

      Reviewer #2 (Public Review):

      Where this study is interesting is that the authors do a meta-analysis of studies in which metabolic rate was experimentally manipulated and both this rate and glucocorticoid levels were simultaneously measured. Unsurprisingly, there are relatively few such studies and many are from the lab of Michael Romero. While the results of the analysis are compelling, they are not surprising. That said, this work is important.

      It is worth noting that in this analysis, the majority of the studies, if not all, are dealing with variation in baseline levels of glucocorticoids. That means the hormone is mostly acting metabolically at these lower levels and not as a stress response hormone as it does when levels are much higher. This difference is probably due to differences in receptors being activated. This could be discussed.

      As mentioned in Box 1, within our hypothesis framework we make no distinction between baseline and stress-induced GC-levels, and thereby in effect assume these to be points in a continuum from a metabolic perspective. Our results support this view, as our sample includes baseline- and stress-induced-range GC values, and these are not distinguishable (Fig. 3). We do however recognize that we did not return to this issue in the Discussion, while the same issue may well occur to many readers familiar with the literature. We therefore added the following paragraph to the discussion:

      “Note that in the context of our analysis we made no distinction between ‘baseline’ and ‘stress-induced’ GC-levels (Box 1). Firstly, because these concepts are not operationally well defined – baseline GC-levels are usually no better defined than ‘not stress-induced’. Secondly, when considering the facilitation of metabolic rate as primary driver of GC regulation, there does not appear a need to invoke different classes of GC-levels instead of the more parsimonious treatment as continuum. This is not to say that this also applies to the functional consequences of GC-level variation: it is well known that receptor types differ in sensitivity to GCs (Landys et al. 2006; Sapolsky et al. 2000; Romero 2004), thereby potentially generating step functions in the response to an increase in GC-levels.”

      We note further that to our best knowledge there are no standard or established thresholds that allow us to separate GC levels into “baseline” and “stress-induced”, and in any case these concentration ranges differ strongly among species and experimental set-ups (e.g. captive vs. free-living individuals). Consequently, many of the studies included in our work report what would typically be interpreted as “stress-induced” levels, and thus within the range of those reported by standardized stress protocols (e.g. levels above 20-30 ng/ml for corticosterone in bird species, Cohen et al. 2007, Jimeno et al. 2018; levels between 150-300 ng/ml in captive rats, Buwalda et al. 2012, Beerling et al. 2011; levels 2-10 times above baseline in humans, Sramek et al. 1999). We also want to note that we work with effect sizes, i.e. not GC levels, and that GC measurement units differ among studies. Mean GC values by study in the original units are shown in Table S3.

      Reviewer #1 (Recommendations For The Authors):

      L26: why is the causality in this direction? Not that I don't think that metabolic rate drives GC variation but the meta-analyses here could suggest the opposite direction as well? That GC phenotype could limit or promote metabolic activity? (In terms of the natural variation studies and not the experimental ones)

      See our detailed response above, on the directionality of the association and the hypotheses underlying our searching criteria and the paragraph on this topic added to the discussion.

      L27: again, I am not sure the meta-analyses can lead to this question. Although there is a tight link between GC and metabolic rate, there is still variation around that is unexplained.

      See our detailed response above, on the directionality of the association and the hypotheses underlying our searching criteria and the paragraph on this topic added to the discussion.

      L45: I think there is plenty of literature in the field that would say that GCs are linked to metabolism and don't define GCs as synonymous with stress. See MacDougall and others that you cite later in the paragraph: "GCs and stress are not synonymous." I think maybe shifting the strong language at the beginning might help with your argument later on.

      We do not disagree, but two considerations made us retain the ‘strong language’. Firstly, while many authors mention links between GCs and metabolic rate, as we read the literature, the quantitative importance of this link to understand GC variation is underestimated in our view. Secondly, the literature is rife with articles that clearly do not consider metabolic rate variation as a driver of the GC variation they observe.

      Box 1: on the diagram the link between GCs and learning is problematic as there are plenty of studies that show a negative effect on learning with GC exposure. It usually depends on the time course of GCs and learning outcomes.

      We agree with the referee´s point. Learning was deleted from the diagram to avoid confusion.

      The diagram also suggests that GCs in the blood decreases insulin. For Aves that are rather insulin insensitive, the evidence that GCs affect insulin concentrations are very limited, even in the poultry literature.

      Indeed, and we now mention in box 1 that GC effects on insulin are primarily found in mammals, and less so in birds.

      Box 1 at the end also makes a point about GCs having complex downstream effects at baseline and stress-induced levels, besides energy mobilization, but the abstract seems to indicate that there are limited effects of GCs outside of metabolism. Hence why I also advocate being careful about the wording in the abstract.

      The related abstract sentence has been rewritten to avoid this inconsistency (lines 17-18).

      L107: "being or not significant" meaning significant or not? The wording is awkward

      We reworded the sentence for clarity. We included studies reporting both significant and nonsignificant increases in metabolic rate.

      L110: why not look at whether experimental increases in GCs also induce increases in metabolic rate, i.e., the directionality of the two variables. (point 2)

      See our detailed response above, on the directionality of the association and the hypotheses underlying our searching criteria and the paragraph on this topic added to the discussion.

      The studies, although there are ~30, are overlapping in terms of labs, i.e., a lot of them came from the same lab. Did you think to include lab as a random effect to see if there are effects of one or two labs doing work that strengthened the results?

      We think this is a good point, and we ran the main models included in the manuscript including Lab as a random effect (N = 35 experiments, 21 studies, 16 labs). Including Lab as a random factor did not affect the results, leading to negligible changes in the model parameters. We provide tables with the model results in our previous response. In the text we now mention that we tested the effect of Lab but did not keep this variable in the models (lines 183-185).

      L314: I think it depends on the time course and intensity of the stressor. I firmly believe that outside of metabolic demands, high levels of GCs chronically or the inability to mount a proper stress response is indicative of pathology or something outside of metabolism.

      Whether the association between GCs and MR holds under a context of ‘chronic stress’ (i.e. understood as chronically elevated GCs) remains to be tested. We note, however, that chronically high levels of metabolic rate may potentially have pathological effects.

      Reviewer #2 (Recommendations For The Authors):

      I find the title a bit misleading. The conclusion from the study is that glucocorticoid levels can reflect metabolic rate, not that glucocorticoid levels do not indicate stress. Remember, stress can certainly affect metabolic rate.

      We see the point but note that other drivers of variation in metabolic rate also increase GCs, as we show in our analysis, and hence we propose that GC variation always indicates variation in metabolic rate, and indicates stress only when stress is the cause of the increase in metabolic rate.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are very grateful to the reviewers for their insightful and detailed analysis of our work, in particular to reviewer 2. We also would like to thank the eLife editorial team for organizing this form of public review and debate, which we believe will be of interest to the science community.

      Reviewer #1 (Public Review):

      Despite durable viral suppression by antiretroviral therapy (ART), HIV-1 persists in cellular reservoirs in vivo. The viral reservoir in circulating memory T cells has been well characterized, in part due to the ability to safely obtain blood via peripheral phlebotomy from people living with HIV-1 infection (PWH). Tissue reservoirs in PWH are more difficult to sample and are less well understood. Sun and colleagues describe isolation and genetic characterization of HIV-1 reservoirs from a variety of tissues including the central nervous system (CNS) obtained from three recently deceased individuals at autopsy. They identified clonally expanded proviruses in the CNS in all three individuals.

      Strengths of the work include the study of human tissues that are under-studied and difficult to access, and the sophisticated near-full length sequencing technique that allows for inferences about genetic intactness and clonality of proviruses. The small sample size (n=3) is a drawback. Furthermore, two individuals were on ART for just one year at the time of autopsy and had T cells compatible with AIDS, and one of these individuals had a low-level detectable viral load (Figure S1). This makes generalizability of these results to PWH who have been on ART for years or decades and have achieved durable viral suppression and immune reconstitution difficult.

      While anatomic tissue compartment and CNS region accompany these PCR results, it is unclear which cell types these viruses persist in. As the authors point out, it is possible that these reservoir cells might have been infiltrating T cells from blood present at the time of autopsy tissue sampling. Cell type identification would greatly enhance the impact of this work. Several other groups have undertaken similar studies (with similar results) using autopsy samples (links below). These studies included more individuals, but did not make use of the near-full length sequencing described here. In particular, the Last Gift cohort, based at UCSD and led by Sara Gianella and Davey Smith, has established protocols for tissue sampling during autopsy performed soon after death. https://pubmed.ncbi.nlm.nih.gov/35867351/ https://pubmed.ncbi.nlm.nih.gov/37184401/

      We agree with reviewer 1 that studies to identify specific cell types that harbor intact HIV-1 in individual tissue compartments would be very informative; our group has recently initiated such studies.

      Overall, this small, thoughtful study contributes to our understanding of the tissue distribution of persistent HIV-1, and informs the ongoing search for viral eradication.

      We thank reviewer 1 for these encouraging remarks.

      Reviewer #2 (Public Review):

      The manuscript by Sun et al. applies the powerful technology of profiling viral DNA sequences in numerous anatomical sites in autopsy samples from participants who maintained their antiviral therapy up to the time of death. The sequencing is of high quality in using end-point dilution PCR to generate individual viral genomes. There is a thoughtful discussion, although there are points that we disagree with. This is an important data set that increases the scope of how the field thinks about the latent reservoir with a new look at the potential of a reservoir within the CNS.

      We greatly appreciate the comments by reviewer 2 and would like to thank them for their detailed and very knowledgeable analysis of this paper.

      1) The participants are very different in their exposure to HIV replication and disease progression. Participant 1 appears to have been on ART for most of the time after diagnosis of infection (16 years) and died with a high CD4 T cell count. The other two participants had only one year on ART and died with relatively low CD4 T cell counts (under 200). This could lead to differences in the nature of the reservoir. In this regard, the amount of DNA per million cells appears to be about 10-fold lower across the compartments sampled for participant 1. Also, one might expect fewer intact proviruses surviving after 16 years on ART compared to only 1 year on ART. The depth of sampling may be too limited and the number of participants too few to assess if these differences are features of these participants because of their different exposures to HIV replication. On the positive side, finding similarities across these big differences in participant profiles does reinforce the generalizability of the observations.

      Many thanks for pointing this out. We also noticed that the total number of HIV-1 proviruses is smaller in our study participant 1 (who had been on ART for 16 years) than in study participants 2 and 3, who had more limited treatment durations (1-2 years). However, due to the small number of study participants, we think we cannot use these results to infer how treatment duration influences viral reservoir size in tissues.

      2) The following analysis will be limited by sampling depth but where possible it would be interesting to compare the ratio of intact to defective DNA. A sanctuary might allow greater persistence of cells with intact viral DNA even without viral replication (i.e. reduced immune surveillance). Detecting one or two intact proviruses in a tissue sample does not lend itself to a level of precision to address this question, but statistical tests could be applied to infer when there is sampling of 5 or more intact proviruses to determine if their frequency as a ratio of total DNA in different anatomical sites is similar or different. This would allow adjustment for the different amount of viral DNA in different compartments while addressing the question of the frequency of intact versus defective proviruses. One complication in this analysis is if there was clonal expansion of a cell with an intact genome which would represent a fortuitous overrepresentation intact genomes in that compartment.

      We have performed the analysis suggested by reviewer 2 and included a diagram reflecting the ratio of intact/defective proviruses as a new supplemental figure (Figure S2). Unfortunately, we do not feel comfortable drawing any real conclusions from this additional analysis; the sample sizes are simply too limited.
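      As an illustration of the kind of comparison the reviewer describes, the sketch below compares the intact/defective split between two tissues with a two-sided Fisher's exact test. The choice of test, the function names, the `min_intact` guard (our reading of the "5 or more intact proviruses" suggestion) and all counts are assumptions made for illustration; they are not taken from the manuscript or from Figure S2.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are two tissues; columns are intact and defective provirus counts.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def pmf(x: int) -> float:
        # Hypergeometric probability of x intact proviruses in tissue 1,
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = pmf(a)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    p_value = sum(
        pmf(x)
        for x in range(lowest, highest + 1)
        if pmf(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


def compare_intact_fraction(intact_a, defective_a, intact_b, defective_b, min_intact=5):
    """Return the p-value, or None if either tissue has too few intact proviruses.

    min_intact is a hypothetical guard, not a threshold used by the authors.
    """
    if intact_a < min_intact or intact_b < min_intact:
        return None
    return fisher_exact_two_sided(intact_a, defective_a, intact_b, defective_b)
```

      With counts as small as those typical of tissue sampling, such a test has little power, which is consistent with the caution expressed above.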

      3) The key point of this work is that the participants were on therapy up to the time of death ("enforcing" viral latency). The predominance of defective genomes is consistent with this assumption. Is there data from untreated infections to compare to as a signature of whether the viral DNA population was under selective pressure from therapy or not? Presumably untreated infections contain more intact DNA relative to total DNA. This would represent independent evidence that therapy was in place.

      We agree that an analysis of autopsy samples from untreated persons living with HIV-1 would be of great interest, and are actively collaborating with neuropathologists from multiple sites to obtain such samples. Yet, we are not convinced that selection pressure on reservoir cells during ART can be appropriately identified through quantitative virological assays. Rather, we feel that the selection of proviruses can be best assessed when qualitative parameters, including proviral integration sites and their position relative to host epigenetic chromatin features, are evaluated.

      4) There are several points in Figure 5 to raise about V3 loop sequences. The analysis includes a large number of "undetermined" sequences that did not have a V3 loop sequence to evaluate. We would argue it is a fair assumption that the deleted proviruses have the same distribution of X4 and R5 sequences as the ones that have a V3 sequence to evaluate. In this view it would be possible to exclude the sequences for which there is no data and just look at the ratio of X4 and R5 in the different compartments, specifically does this ratio change in a statistically significant way in different compartments? The authors use "CCR5 and non-CCR5" as the two entry phenotypes. The evidence is pretty strong that the "other" coreceptor the virus routinely uses is CXCR4, and G2P is providing the FPR for X4 viruses. Perhaps the authors are trying to create some space for other coreceptors on microglia, but we are pretty sure what they are measuring is X4 viruses, especially in this late disease state of participant 2. Finally, we have previously observed that the G2P FPR score of <2 is a strong indicator of being X4, FPR scores between 2 and 10 have a 50% chance of being X4, and FPR scores above 10 are reliably R5 (PMID27226378). In addition, we observed that X4 viruses form distinct phylogenetic lineages. The authors might consider these features of X4 viruses in the evaluation of their sequences. Specifically, it would be helpful to incorporate the FPR scores of the reported X4 viruses.

      Many thanks for these thoughts. We have now included FPR scores for all sequences and considered sequences with FPR score <2 as X4-tropic. Among 497 proviral sequences derived from all three participants, only 14 proviral sequences had FPR scores between 2 and 10 and their tropism was classified as CCR5 in the new Figure 5. We agree that viral tropism analysis of proviral sequences from the CNS would be of particular interest for study subject 2; however, most brain-derived sequences from that person had large deletions in the env region, precluding an analysis of viral tropism.
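      The classification rule stated above can be sketched as follows. The thresholds (FPR below 2 treated as X4-tropic, everything else, including the 2 to 10 range, treated as CCR5-tropic) follow the response; the function names and the example scores are hypothetical.

```python
from collections import Counter


def classify_tropism(fpr_percent: float) -> str:
    """Assign a coreceptor label from a geno2pheno false-positive rate (percent).

    FPR < 2 is labeled X4; all other scores are labeled CCR5, which includes
    the ambiguous 2-10 range, as described in the response.
    """
    if fpr_percent < 0 or fpr_percent > 100:
        raise ValueError("FPR must be a percentage between 0 and 100")
    return "X4" if fpr_percent < 2.0 else "CCR5"


def is_ambiguous(fpr_percent: float) -> bool:
    """Flag scores in the 2-10 range, where the reviewer reports mixed phenotypes."""
    return 2.0 <= fpr_percent <= 10.0


def summarize(fpr_scores):
    """Count tropism labels for a list of FPR scores from one compartment."""
    return Counter(classify_tropism(score) for score in fpr_scores)
```

      Keeping `is_ambiguous` separate from the label makes it easy to report how many sequences fall in the uncertain range without changing the primary classification.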

      5) We have puzzled over the many reports of different cell types in the CNS being infected. When we examined these cell types (both as primary cells and as iPSC-derived cells), all cells could be infected with a version of HIV that had the promiscuous VSV-G protein on the virus surface as a pseudotype. However, only macrophages and microglia could be infected using the HIV Env protein, and then only if it was the M-tropic version and not the T-tropic version (PMID35975998). RNAseq analysis was consistent with this biological readout in that only macrophages and microglia expressed CD4, neurons and astrocytes do not. From the virology point of view, astrocytes are no more infectable than neurons.

      We appreciate these comments. As described in our discussion, we agree that the role of astrocytes as target cells for HIV-1 infection is highly controversial; we look forward to future opportunities to evaluate HIV sequences in sorted astrocytes from autopsy tissues.

      6) The brain gets exposed to virus from the earliest stages of infection but this is not synonymous with viral replication. Most of the time there is virus in the CSF but it is present at 1-10% of the level of viral load in the blood and phylogenetically it looks like the virus in the blood, most consistent with trafficking T cells, some of which are infected (PMID25811757). The fact that the virus in the blood is almost always T cell-tropic in needing a high density of CD4 for entry makes it unlikely that monocytes are infected (with their low density of CD4) and thus are not the source of virus found in the CNS. It seems much more likely that infected T cells are the "Trojan Horse" carrying virus into the CNS.

      We appreciate the reviewer’s referral to Greek mythology and agree that the hypothesis of infected T cells acting as “Trojan horses” is more intuitive and better supported by available data. We have adjusted our discussion accordingly.

      7) While all participants were taking antiretroviral therapy at the time of their death, they were not all suppressed when the tissues were collected. The authors are careful not to mention "suppressive ART" in the text, which is appreciated. However, the title should be changed to also reflect this fact.

      Thanks for pointing this out. From our perspective, ART is never fully suppressive, as low-level viremia (below the detection threshold of commercial PCR assays) is detectable in almost all ART-treated persons. As such, it is not clear to us that “suppressive” necessarily implies suppression below the detection limits of commercial PCRs. We argue that ART can also be suppressive when plasma viral loads are in the range of 100 copies/ml, as they are in our study subject 3. Nevertheless, we have changed the title to avoid confusion.

      Reviewer #1 (Recommendations For The Authors):

      I encourage the authors to compare their autopsy and tissue sampling procedures to those used by The Last Gift researchers and consider including references to this ongoing study. If the authors plan to continue in this line of research, the field would greatly benefit from a collaboration that would bring together their excellent and advanced PCR technique with the larger sample size offered by The Last Gift. Lastly, is there some way to simultaneously determine cell type when NFL sequencing is performed?

      We look forward to collaborating with investigators from the Last Gift Cohort in the future and have integrated additional references in the manuscript to acknowledge their work. At the current stage of technology development, we think that sorting of infected cells based on canonical markers of defined cell populations is the preferred approach for identifying phenotypic properties of infected cells; however, expansion of the PheP-Seq assay (Sun et al., Nature 2023), may facilitate this process in the future.

      Reviewer #2 (Recommendations For The Authors):

      1) The authors have chosen to lump all R5 viruses together in terms of their entry phenotype, giving all viruses an equal chance of infecting all potentially susceptible cell types. This ignores the fact that normal HIV is selected to infect cells requiring a high density of CD4 as is found on T cells. We use the term R5 T cell-tropic to describe "normal" HIV. The ability to efficiently enter cells that have a low density of CD4, such as macrophages and microglia, involves the evolution of a distinct phenotype, termed macrophage tropism (PMID24307580, and work of others). This happens most often in the CNS where T cells are infrequent thus potentiating evolution to infect an alternative cell type. This change in entry phenotype is dramatic and, like X4 viruses, results in phylogenetically distinct lineages (PMID22007152). There are no sequence signatures for M-tropic viruses as there are for X4 viruses, but the fact that there are sequences shared between the CNS and lymphoid tissue makes it much more likely that there are T cells migrating around the body, including into the CNS, that are carrying R5 T cell-tropic virus with them, with the cells potentially clonally expanding in situ in the CNS. The persistence of a potential CNS T cell reservoir was the point we were trying to make in our recent paper (ref. 38), not only that these CSF rebound viruses were R5 viruses but they were selected for replication in T cells as seen by their dependence on a high density of CD4 for entry. This is the conclusion one would reach if clonally expanded viral sequences were shared between two lymphoid compartments. It is not necessary to ascribe properties of infection and clonal amplification to microglia cells when a more parsimonious explanation is that there are low levels of T cells in the CNS, especially in the absence of entry phenotype data showing these sequences encode an M-tropic entry phenotype. As it stands, the authors are just adding to the unproven belief that virus in the CNS must be in myeloid cells, which in this case in particular we suspect is the wrong interpretation.

      We are impressed by reviewer 2’s recent work, suggesting the viral reservoir in the CNS may primarily consist of clonally-expanded R5 T-cell tropic viruses. We have adjusted our discussion to emphasize this possibility, and to highlight that viral entry phenotyping data will be informative for better understanding viral persistence in the brain.

      2) The authors noted that the frequency of intact proviruses is highest in the lymph nodes of 2/2 participants for which they had lymph node samples, relative to the other tissues examined. They thus conclude, "Together, these results indicate that intact HIV-1 proviruses are preferentially detected in lymphoid and gastrointestinal (GI) tissues." However, an examination of Figure 2 reveals that the total HIV copy number is highest in the lymph nodes of these two people. Thus, it doesn't seem like HIV is preferentially intact in the lymph nodes as much as they sampled more provirus from that tissue and therefore were able to detect more intact proviruses.

      We have adjusted our manuscript to indicate that the highest numbers of intact HIV-1 proviruses were present in lymph nodes, both in terms of absolute numbers and after normalization to the total numbers of cells analyzed.

      3) In Figure 1A, the legend should be changed so that "PMSC" is spelled out as "premature stop codon" for ease of reading. This is done for Figure 1B.

      We have corrected this issue as suggested by the reviewer.

      4) The pie charts in Figure 5 could be better labeled for ease of interpreting. In Figure 5C, instead of just labeling it as "P2" it could be "Distribution of CXCR4-using proviruses, P2", as an example. As it stands, it is hard to know what the figure is describing without reading the text.

      We have changed this accordingly.

      5) While all participants were taking antiretroviral therapy at the time of their death, they were not all suppressed when the tissues were collected. The authors are careful not to mention "suppressive ART" in the text, which is appreciated. However, the title should be changed to also reflect this fact.

      Thanks for pointing this out. From our perspective, ART is never fully suppressive, as low-level viremia (below the detection threshold of commercial PCR assays) is detectable in almost all ART-treated persons. As such, it is not clear to us that “suppressive” necessarily implies suppression below the detection limits of commercial PCRs. We argue that ART can also be suppressive when plasma viral loads are in the range of 100 copies/ml. Nevertheless, we have changed the title to avoid confusion.

      Editorial comments:

      In addition to the reviewers' suggestions, we feel that more information is needed on how you define an intact proviral sequence: e.g., are only disruptions in essential genes considered, or also those in accessory genes? Previous studies have shown that brain-derived HIV-1 strains are usually CCR5-tropic, show high affinity for the CD4 receptor and frequently contain defective vpu genes. Some information and discussion on whether the brain-derived sequences confirm these previous findings seems of significant interest.

      As described in our previous work (e.g. Lee et al, JCI 2017; Jiang et al, Nature 2020), accessory genes are not considered in our definition of “genome intactness”; this is consistent with approaches other investigators have chosen (e.g. Hiener et al, Cell Reports 2017). Within the genome-intact sequences we identified in the CNS in our study persons, we found no evidence for deletions of vpu sequences; this has been emphasized in the revised manuscript.

    1. Author Response

      We thank the reviewers and editors for their deep, thoughtful and constructive assessment of our manuscript. We nevertheless would like to reply to the Reviewers' reports.

      Reviewer #1.

      (...) The data can be well described by three components involving a closed state and two open states O1 and O2, in which the second component O2 is the one affected by the mutations and deletions

      This statement is not completely clear to us. What we propose is that O1 is not visible in WT, only in the mutants. What would be affected is the access to O1 and the transition between O1 and O2, but not O2 itself.

      From the beginning, it becomes challenging for non-experts to grasp the structural basis of the perturbations that are introduced (ΔPASCap and E600R), because no structural data or schematic cartoons are provided to illustrate the rationale for those deletions or their potential mechanistic effects. In addition, the lack of additional structural information or illustrations, and a somewhat confusing discussion of the structural data, make it challenging for a reader to reconcile the experimental data and mathematical model with a particular structural mechanism for gating, limiting the impact of the work.

      Thank you very much for pointing this out and our apologies for the missing cartoon. It will be provided in the revised version.

      There are several concerns associated with the analysis and interpretations that are provided. First, the conductance-voltage (G-V) relations for the mutants do not seem to saturate, and the absolute open probability is not quantified for any mutant under any condition. This makes it impossible to quantitatively compare the relative amplitudes of the two components because the amplitude of the second component remains undetermined. […] This reduces confidence in the parameters associated with G-V relations, as the shape and position of both components might change significantly if longer pulses were used.

      We agree that the endpoint of activation is ill-defined in the cases where a steady-state is not reached. This does indeed hamper quantitative statements about the relative amplitude of the two components. However, while the overall shape does change, its position (voltage dependence) would not be affected by this shortcoming. The data therefore supports the claim of the “existence of mutant-specific O1 and its equal voltage dependence across mutants.”

      Further, because the mutant channel currents do not saturate at the most positive potentials and time intervals examined, the kinetic characterization based on reaching 80% of the maximum seems inappropriate, because the 100% mark is arbitrary.

      We agree that the assessment of kinetics by a t80% is not ideal. We originally refrained from exponential fits because they introduce other issues when used for processes that are not truly exponential (as is the case here). To address the concerns, we will add time constants from these fits in the revised version. Please note that in Figure 3, we do provide time constants, and they support the statement made.

      Further, the kinetics for some of the other examined mutants (e.g. those in Fig. 2A) are not shown, making it difficult to assess the extent to which the data could be affected by having been measured before full equilibration.

      This seems to be a misunderstanding. ∆2-10 kinetics is shown in Fig. 2c. ∆-eag is shown in Fig. 3. We will make sure to state this explicitly in the revised version.

      For example, I would expect that the enhanced current amplitudes from Figure 5 are only transient, ultimately reaching a smaller steady-state current magnitude that depends only on the stimulation voltage and is independent of the pre-pulse. The entire time course including the rise-time and decay is not examined experimentally. This raises concern on whether occupancy of state O1 might be overestimated under some experimental conditions if a fraction of the occupancy is only transient. The mathematical model is not utilized to examine some of these slower relaxations - this may be because the model does not reproduce these slow processes, which would represent a serious shortcoming given that the slow kinetics appear to be intrinsic to transitions around state O1.

      Thank you for thinking so deeply about the problem. We identified the same questions and did explore them using the model (Figure 8c). Your intuition is confirmed there: the slow kinetics lead to a decrease of O1 occupancy after a transient accumulation. We intend to study this experimentally as well in the revised version.

      The significance of the results with the Δ2-10.L341Split is unclear. First, structural as well as functional data has established that the coupling of the voltage sensor and pore does not entirely rely on the S4-S5 linker, and thus the Split construct could still retain coupling through other mechanisms, which is consistent with the prominent voltage dependence that is observed. If both state O1 and O2 require voltage sensor activation, it is unclear why the Split construct would affect state O1 primarily, as suggested in the manuscript, as opposed to decreasing occupancy of both open states.

      Thank you for pointing out the unclear nature of our arguments. We rephrase in the following and will do so in the revised document: If, in non-split mutants, the upward transition of S4 allows entry to O1, it is reasonable to assume that the movement is not transmitted the same way in the split and the transition into O1 is less probable. The observation that, in the split, entry into O1 requires higher depolarization and appears to be less likely, suggests that downstream of S4 (beyond position 342), there is a mechanism to convey S4 motion to the gate of the mutants.

      The figure legends and text do not describe which solutions exactly were utilized for each experiment, [...] Because no zero-current levels are shown on the current traces, it becomes very hard to determine which voltages correspond to each of the currents (see Fig. 1A).

      Will be corrected.

      … the rationale for choosing some solutions over others is not properly explained. […] The reversal potential for solutions used to measure voltage-activation curves falls right at the spot where occupancy of the first component peaks (e.g. see Figure 1B). […] It is unclear whether any artifacts could have been introduced to the mutant activation curves at voltages close to the reversal potential.

      The high potassium extracellular solution was chosen to obtain tail currents of sufficient size, warranting precise determination of the reversal potential for every individual experiment. In this way, we ensured that there were no artifacts introduced to the activation curves. Tail currents were used when closing was reasonably fast (∆PASCapL322H and E600RL322H), but otherwise, we used the amplitude at the end of the pulse to get the reversal potential.
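      To make the role of the reversal potential concrete, the sketch below converts currents to chord conductance, G = I / (V - E_rev), using a reversal potential determined for each experiment. The exclusion window around E_rev, the function name and the units are illustrative assumptions and do not describe the authors' actual analysis pipeline.

```python
def chord_conductance(voltages_mV, currents_uA, e_rev_mV, exclusion_mV=5.0):
    """Return conductance (mS) at each test voltage, or None near E_rev.

    Close to the reversal potential the driving force (V - E_rev) approaches
    zero, so the ratio I / (V - E_rev) is dominated by measurement noise.
    exclusion_mV is a hypothetical window within which no value is reported.
    """
    conductances = []
    for voltage, current in zip(voltages_mV, currents_uA):
        driving_force = voltage - e_rev_mV
        if abs(driving_force) < exclusion_mV:
            conductances.append(None)
        else:
            conductances.append(current / driving_force)
    return conductances
```

      An error in E_rev distorts G most strongly at voltages near E_rev, which is why an independent per-experiment estimate of the reversal potential matters for the shape of the G-V curve in that range.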

      One key assumption that is not well-supported by the data pertains to the difference in single-channel conductance between states O1 and O2 - no analysis or discussion is provided on whether the data could also be well described by an alternative model in which O1 and O2 have the same conductance. No additional experimental evidence is provided related to the difference in conductance, which represents a key aspect of the mathematical model utilized to interpret the data.

      We agree that the relative conductance of O1 and O2 is a key point. Our proposal mainly stems from the data presented in Fig. 4 and the amplitudes of the two components of the tail at potentials where both states are visible. We also agree that whole cell currents represent a product of occupancy and conductance and that only single channel recordings can produce unambiguous proof for the higher conductance of O1. We have embarked on a series of experiments directly addressing this in the mutants that will be reported in the revised version. Still, we did explore this issue with the model. Following the path of the least number of assumptions, we initially tested models with equal conductance for both states. None of these models was able to reproduce the shape of the tails and the prepulse-dependent increase.
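      The point that whole-cell current is a product of occupancy and conductance can be written as I = N (g1 P_O1 + g2 P_O2)(V - E_rev). The sketch below is a minimal illustration of that relation; the function name, parameter names and numerical values are hypothetical and are not parameters of the model in the manuscript.

```python
def whole_cell_current(p_o1, p_o2, g_o1, g_o2, v_mV, e_rev_mV, n_channels=1.0):
    """Whole-cell current from two open states with separate unitary conductances.

    p_o1, p_o2: occupancies of the open states O1 and O2 (between 0 and 1).
    g_o1, g_o2: single-channel conductances of O1 and O2 (arbitrary units).
    """
    if p_o1 < 0 or p_o2 < 0 or p_o1 + p_o2 > 1:
        raise ValueError("occupancies must be non-negative and sum to at most 1")
    return n_channels * (g_o1 * p_o1 + g_o2 * p_o2) * (v_mV - e_rev_mV)


# Same total open probability, redistributed from O1 to O2.
before = dict(p_o1=0.4, p_o2=0.0)
after = dict(p_o1=0.1, p_o2=0.3)

# With equal conductances the current does not change ...
equal_before = whole_cell_current(**before, g_o1=1.0, g_o2=1.0, v_mV=60.0, e_rev_mV=0.0)
equal_after = whole_cell_current(**after, g_o1=1.0, g_o2=1.0, v_mV=60.0, e_rev_mV=0.0)

# ... whereas a larger O1 conductance makes the same redistribution reduce the current.
unequal_before = whole_cell_current(**before, g_o1=2.0, g_o2=1.0, v_mV=60.0, e_rev_mV=0.0)
unequal_after = whole_cell_current(**after, g_o1=2.0, g_o2=1.0, v_mV=60.0, e_rev_mV=0.0)
```

      Because only the products g1 P_O1 and g2 P_O2 enter the macroscopic current, occupancy and conductance cannot be separated from whole-cell data alone, which is why single-channel recordings are needed for unambiguous proof.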

      The CaM experiments are potentially very interesting and could have wide physiological relevance. However, the approach utilized to activate CaM is indirect and could result in additional non-specific effects on the oocytes that could affect the results.

      Thank you for the appreciative comments about the relevance of our results. We are aware of the potential side effects of the use of thapsigargin and ionomycin, but we still used this approach as an established method to raise intracellular Ca2+. This said, we would like to point out that the effects of Ca2+ increase on channel behavior do revert with a time course that mirrors the estimated time course of Ca2+ itself (supplement 1 to figure 7), suggesting that we are monitoring a Ca2+-dependent event.

      The description of the mathematical model that is provided is difficult to follow, and some key aspects are left unclear, such as the precise states from which state O1 can be accessed, and whether there is any direct connectivity between states O1 and O2 - different portions of the text appear to give contradictory information regarding these points.

      This seems to be a misunderstanding: supplement 1 to figure 8 graphically details the model’s layout and explicitly shows the connections to the two open states. It also shows that these are not connected. We will make sure that the text states this fact more clearly. We did explore models with one open state connected to more than one other state (loops) and found that none of these models can reproduce the large range of depolarizations for which conductance is reduced as compared to lower and higher depolarization (Figure 1).

      Several rate constants other than those explicitly mentioned to represent voltage sensor activation are also assigned a voltage dependence - the mechanistic basis of that voltage dependence is unclear.

      Some fundamental properties we observed in the mutants can be explained with constant, voltage-independent rate constants into and out of both open states. Specifically, it was possible to achieve behavior very close to that displayed in Figure 8c with constant η, θ, ε, and ζ. We then attempted to also reproduce the strong prepulse-dependence (Figure 6A and B) and found that we needed additional degrees of freedom to incorporate both behaviors with one parameter set. We could either add more states, and thereby rates, or introduce voltage dependence to η and θ. With already 32 states and 10 rates, we decided to adopt the less complex model variant. We agree that this probably reduced the interpretability of the model. As a rule, a transition with a voltage-dependence of the functional form of Eq.1 corresponds to the kinetic properties of two or three transitions, where one is voltage-independent (setting the maximal rate) and one has the classical exponential shape expected from truly molecular transitions.

      We also agree that, conceptually, the transitions between the two layers (tentatively associated with a transition in the ring structure) should be voltage-independent. Interestingly, their voltage dependence is very similar to the voltage dependence of the early activation, i.e. centered at -100 and -120 mV, similar to β. We therefore attempted to replace the voltage dependence of κ and λ with a state-dependence. To this end, we introduced a parameter that modified κ and λ depending on the state’s position along the α-β axis. While it seemed possible to include all desired features in a model with state-dependent κ and λ, it proved extremely difficult to tune the parameters. Eventually, we reverted to purely voltage-dependent and not state-dependent transition rates κ and λ. Nevertheless, we believe that their voltage dependence could be replaced by some form of state-dependence, i.e. by rates κ and λ that change systematically from the left-hand side of the scheme to its right-hand side.

      Finally, a clear mechanistic explanation for the full range of effects that the ΔPASCap and E600R mutants have on channel function is lacking, as well as a detailed description of how those newly uncovered transitions would influence the activity of the WT channel.

      We agree. Ultimate mechanistic explanations will have to await data from protein structures of intermediate states and in particular the mutant-specific open state.

      …as well as a detailed description of how those newly uncovered transitions would influence the activity of the WT channel; this latter point is important when considering whether the findings in the manuscript advance our understanding of the gating mechanism of Kv10 channels in general, or are specific to the particular mutants that are studied.

      We still do not know if the transitions to O1 are identical in the mutants and WT, although our data open the path to dissecting the interplay of intracellular domains and voltage sensor. We think that the results are relevant for KCNH channels in general because we have made visible otherwise invisible states.

      It is unclear, for example, how both the mutation or the deletion at the cytoplasmic gating ring enable conduction by state O1, especially when considering the hypothesis put forward in this study that transition to O1 exclusively involves transitions by the voltage sensor and not the cytoplasmic gating ring.

      In our model, the transition to O1 is made possible by a displacement of the voltage sensor. In our view, when this occurs with a properly folded and positioned intracellular ring, permeation (access to O1) is precluded. It is precisely the distortion in the intracellular ring induced by mutation or deletion that allows access to O1.

      It is also not clearly described whether a non-conducting state with the equivalent state-connectivity as O1 can be accessed in WT channels, or if a state like O1 can only be accessed in the mutant channels. Importantly, if a non-conducting state with the same connectivity to O1 were to be accessed in WT channels, it would be expected that an alternating pulse protocol as in Fig. 4 would result in progressively decreasing currents as the occupancy of the non-conducting state equivalent to O1 is increased. Because this is not the case, it means that mutation and deletion cause additional perturbations on the gating energetics relative to WT, which are not clearly fleshed out.

      Thank you for highlighting this important question. Following the arguments in the answer to the previous comment, our experiments cannot provide proof for the existence or accessibility of O1 in WT channels. We favor the interpretation that it is not accessible, because, as you point out, this is supported by the outcome of the alternating pulse on WT (figure 4A) and the paradoxical effect of CaM activation. However, this interpretation hinges on the hypothesis that the kinetics of entry into and departure from O1 would be the same in WT channels, as it is in the mutants. Because transitions into a non-conducting O1 would be only indirectly observable in the WT channel, this assumption would be extremely difficult to test.

      Reviewer #2.

      WT EAG currents are far right shifted compared to previously published data. It is not clear whether it is the recording conditions but at 0 mV very few channels are open. Compare this with recordings reported previously of the same channel hEAG1 by Gail Robertson's lab (Zhao et. al. (2017) JGP). In that case, most of the channels are open at 0 mV. There must be at least 25 mV shift in voltage-dependence. These differences are unusually large.

      G-V curves presented in the literature show a large variability. Depending on the conditions, reported V1/2 values in Xenopus oocytes range from -43 mV (Schönherr et al., 2002 DOI: 10.1016/s0014-5793(02)02365-7) to +16 mV (Lörinczi et al, 2015 DOI: 10.1038/ncomms7672) through +4.1 mV (Lörinczi et al., 2016 DOI: 10.1074/jbc.M116.733576), or +10 mV (in the IUPHAR database). The results in the current manuscript are not significantly different from our previously published results on WT channels. In the report the reviewer is referring to, one source of the difference could be that Zhao et al. had no independent information about the reversal potential. In our experiments, we used solutions with high [K]ext. This places the reversal potential in a voltage range within measurable eag currents and thus allows direct determination of the reversal potential, together with the slow kinetics of the tails and the negative shift in the activation. We would argue that this makes the G-V curves less prone to assumptions, albeit for the price of large error bars around the reversal potential. Additionally, the presence of Mg2+ in the extracellular solutions can change the apparent V1/2 depending on the stimulation protocol.

      In most of the mutants, O2 state becomes more prevalent at potentials above +50 mV. At these potentials, endogenous voltage-dependent currents are often observed in Xenopus oocytes. The observed differences between the various mutants might simply be a function of the expression level of the channel versus endogenous currents.

      Because we were aware of the potential issue of endogenous chloride currents in oocytes, we included data recorded in chloride-free solutions. Those show comparable results, and thus we conclude that endogenous currents are not the origin of the differences between mutants. We will clarify which solutions were used in the figure legends of the revised version and also include the argument against sizable endogenous current contributions in the revision. In a separate line of experiments, we expressed some of the mutants in HEK cells. Despite small current amplitudes, we were able to replicate the findings of two components, providing oocyte-independent evidence for the existence of a second open state.

      Voltage-dependence of the kinetics of WT currents appears a bit strange. Why is the voltage-dependence saturated at 0 mV even though very few channels have activated at that point? I cannot imagine any kinetic model that can lead to such unusual voltage-dependence of kinetics.

      The fact that voltage dependence of open probability and voltage dependence of activation time constant do not align reflects the multi-state nature of the underlying gating scheme. More than one of several sequential transitions limit the overall kinetics. In this case, the apparent kinetics can reflect a different “bottleneck” transition at different voltage ranges.
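      The "bottleneck" argument can be illustrated with a deliberately simplified two-step scheme: one voltage-dependent step with an exponential rate followed by one voltage-independent step. For sequential irreversible steps the mean latency is the sum of the mean dwell times, so the apparent time constant saturates once the voltage-dependent step is no longer limiting. This toy scheme and all parameter values are assumptions for illustration only; it is not the 32-state model of the manuscript, nor its Eq. 1.

```python
from math import exp


def apparent_time_constant(v_mV, k_max_per_ms=0.05, k0_per_ms=1.0, slope_mV=25.0):
    """Mean latency (ms) through two sequential irreversible steps.

    Step 1 has rate k0 * exp(V / slope) and speeds up with depolarization.
    Step 2 has the constant rate k_max and limits the kinetics at depolarized
    voltages. All default values are hypothetical.
    """
    k_voltage_dependent = k0_per_ms * exp(v_mV / slope_mV)
    return 1.0 / k_voltage_dependent + 1.0 / k_max_per_ms


# Time constants at hyperpolarized, intermediate and depolarized voltages.
tau_by_voltage = {v: apparent_time_constant(v) for v in (-100.0, 0.0, 60.0)}
```

      In this sketch the time constant changes steeply at negative voltages and becomes nearly constant from about 0 mV upward, regardless of how many channels are open at that voltage, because open probability and kinetics are set by different transitions.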

      One of the other concerns I have is that in many cases, it is clear that the pulse is too short to measure steady-state voltage-dependence. For instance, the currents in -160 mV and -100 mV in Figure 6A and 6B are not saturated.

      While we agree that steady-state curves can simplify quantitative evaluation – especially the normalization applied in the I/Imax curves in Figure 6 – the conclusion of two components is independent of the absolute amplitude under steady state. The fact that in the raw current traces in Figure 6A, after a -160 mV prepulse, the same current amplitude is reached for two depolarizations (60 and 90 mV) but not for the intermediate depolarization, can only be explained by an I-V curve that has a minimum. Therefore, the raw data directly support the evidence of finding two components, even if the subsequent analysis is affected by insufficient test pulse durations.

      Reviewer #3

      Although very well established, the experimental conditions used in the present manuscript introduce uncertainties, weakening their conclusions and complicating the interpretation of the results. The authors performed most of their functional studies in Cl-based solutions that can become a non-trivial issue when the range of voltages explored extends to very depolarizing potentials such as +120 mV. Oocytes endogenously express Ca2+-activated Cl- channels that will rectify Cl- at very depolarizing potentials (due to an increase in the driving force) and contribute dramatically to the current's amplitude observed at the test pulse in the voltage ranges where the authors identify the second open state.

      As stated above, because we were aware of the potential issue of endogenous chloride currents in oocytes, we performed many of the experiments in chloride-free solutions. We conclude that endogenous currents are not the origin of the differences between mutants because the results were comparable regardless of the presence of chloride. We will clarify which solutions were used in the figure legends of the revised version and also include the argument against sizable endogenous current contributions in the revision. In a separate line of experiments, we expressed some of the mutants in HEK cells. Despite small current amplitudes, we were able to replicate the findings of two components, providing oocyte-independent evidence for the existence of a second open state.

      The authors propose a two-layer Markov model with two open states approximating their results. However, the results obtained with the mutants suggest an inactivated state accessible from closed states and a change in the equilibrium between the close/inactivated/open states that could also explain the observed results; therefore, other models could approximate their data.

      In the process of model development, we tested a large number of configurations. Those included models with a single open state which we connected to two closed (or inactivated) states that were not directly connected to each other and populated at different voltage ranges. In doing so, we attempted to allow access to the single open state from different regions of the “state-space”, reflecting the two voltage ranges of high conductance. However, in our hands, such a “loop” in the state-space inevitably leads to a weak separation of the two states and a weak effect of prepulse potentials. The underlying reason is that given the short activation and deactivation time constants, a single open state in a loop provides an effective short-cut, linking otherwise separated parts of the state-space. To achieve the clear separation of the two components’ voltage dependence, two open states that are not connected to each other were essential. As we wrote in response to other comments above, the ultimate proof of two different open states cannot come from modeling, but from single channel measurements.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In their manuscript, Brischigliaro et al. show that the disruption of respiratory complex assembly in Drosophila melanogaster results in the accumulation of respiratory supercomplexes. Further, they show that the change in the supercomplex abundance does not impact respiratory function, suggesting that the main role of supercomplex formation is structural. Overall, the manuscript is well written and the results and conclusion are supported. The D. melanogaster system, in which the abundance of supercomplexes can be altered through the genetic disruption of the assembly of the individual complexes, will be important for the field to discover the role of the supercomplexes. This manuscript will be of broad interest to the field of mitochondrial bioenergetics. The findings are valuable and the evidence is convincing.

      Strengths

      The system developed in which the relative levels of SCs can be varied will be extremely useful for studying SC physiology.

      The experiments are clearly described and interpreted.

      Weaknesses

      The statement in the abstract regarding low amounts of SCs in "insect tissues" needs further support or should be narrowed. I am only aware of detailed characterization of the mitochondrial SC composition from D. melanogaster, which is insufficient to make a broad statement about the large and diverse category of insects. This should be rewritten.

      Thank you for the comment. We have amended the text accordingly.

      In the introduction (line 76) and discussion (line 283), the authors reference the CoQ binding sites in CI and CIII2 being "too far apart" to allow for substrate channeling. The distance between the active sites, though significant, is insufficient to rule out substrate channeling. A stronger argument arises from the fact that the CoQ sites of both CI and CIII2 are open to the membrane and that there are no clear barriers for the free exchange of CoQ with the membrane pool.

      Thank you for the comment. We have modified both sentences accordingly.

      Line 195, the slight elevation in CI amounts referred to here does not appear to be statistically significant and therefore should not be discussed as being altered relative to the control.

      To address this point of criticism we have revisited the statistical analysis, originally done by 2-way ANOVA and post-hoc test. After giving it some thought, we now consider that this might not have been the correct way to analyze either the mitochondrial respiratory chain (MRC) activity data or the densitometric quantifications. We have now used an unpaired two-tailed Student’s t-test to compare the pairs of either KO or KD vs CTRL. The reason is that since the measurement of each individual MRC activity is actually an independent assay, it should be considered separately. The same applies to the densitometry because the absolute values of the intensity of individual CI and that within SCs largely differ. Therefore, we think that it is more correct to compare the abundance of individual CI in the WT vs. either KO or KD pairs and the abundance of the CI in SC independently using a t-test. With these new statistical analyses, the difference in the enzyme activity of CI reported in Figure 4D is now significant, which we consider better reflects our observations. Also, with these new analyses, the amounts of CI+CIII are significantly higher in the Coa8 KD (Figure S1B). Therefore, the original affirmation is correct and we have left the sentence as it was.
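
      For readers unfamiliar with the comparison described above, the following is a minimal sketch of an unpaired two-tailed Student’s t-test with pooled variance, applied to one control-versus-knockdown pair at a time. It is not the authors’ analysis script; the function name and the triplicate values are hypothetical and are not data from the manuscript. The critical value 2.776 is the standard two-tailed 5% threshold of the t distribution for 4 degrees of freedom (two groups of three samples).

```python
import math
import statistics


def student_t_unpaired(group_a, group_b):
    """Return (t, df) for an unpaired Student's t-test with pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / std_err, df


# Hypothetical triplicate activities (arbitrary units), for illustration only.
ctrl_activity = [100.0, 95.0, 105.0]
kd_activity = [80.0, 78.0, 85.0]

T_CRIT_DF4_TWO_TAILED_05 = 2.776  # two-tailed, alpha = 0.05, df = 4

t_value, dof = student_t_unpaired(ctrl_activity, kd_activity)
significant = abs(t_value) > T_CRIT_DF4_TWO_TAILED_05
print(f"t = {t_value:.2f}, df = {dof}, significant at 5%: {significant}")
```

      Each enzyme activity or densitometric measurement would be tested this way separately, which is the design choice the response argues for, in contrast to a single 2-way ANOVA across all measurements.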

      Figure 4H, the assignments of the observed larger bands seem incorrect. The largest band (currently assigned as SC I1+III2+IV1) represents too large of a shift for only the addition of CIV and the band currently assigned at SC I1+III2 appears to also contain CIV. The identity of these bands should be reevaluated and additional experiments are needed to definitively prove their identity. This uncertainty should be addressed experimentally or made more explicit in the text.

      Thank you for the comment. Taking a closer look at the images, we have to agree with the Reviewer that the assignment was incorrect. The higher band is too large indeed and the reviewer is correct that the band that we previously assigned as CI1+CIII2 does appear to contain CIV as well. Therefore, we have changed the labeling of that to CI1+CIII2+CIV1 because the stoichiometry is compatible with the apparent MW. Also, we have renamed the higher MW band to HMW-SC (high-MW SC) of uncertain nature (unknown stoichiometry) but clearly containing all three complexes I, III and IV. We amended the text (lines 219-221) plus figures 5H and S1 accordingly.

      Line 302, the authors state that the structural basis for less SC in D. melanogaster is "due to a more stable association of the NDUFA11 subunit..." However, this would not result in a less stable SC association and only explains why NDUFA11 is more stably associated with CI in the absence of CIII2. The more likely structural reason for the observation of less SC in D. melanogaster is the N-terminal truncation of Dm-NDUFB4 relative to mammalian NDUFB4. This truncation results in the loss of a major SC interaction site between CI and CIII2 in the matrix.

      Thank you for pointing this out. We have amended the text accordingly.

      Reviewer #2 (Public Review):

      Respiratory chain complexes assemble in higher-ordered structures termed supercomplexes or respirasomes. The functional significance of these assemblies is currently being investigated; two main hypotheses are being tested, namely that supercomplexes provide kinetic advantages or structural stability. Here, the authors use the fruit fly to reveal that, while the respiratory chain in the organism normally does not form higher-order assemblies, it does so under conditions when their assembly is impaired. Because the rather moderate increase in supercomplex formation does not change oxygen consumption stimulated by CI or CII substrate, the authors conclude that supercomplex formation has more a structural than a functional role. The main strength of this work is that the technical quality of the experiments is high and that the authors induced defects in respiratory chain assembly through sets of well-controlled genetic models. The obtained data are mostly descriptive using standard approaches and are very well executed. The authors claim that their experiments allow them to conclude that the role of supercomplex formation is restricted to a structural role and, hence, exclude a function directly related to electron transport efficiency. However, while the authors can show convincingly that supercomplexes form in the mutants, but not in the wild type, their main claim is not well supported by data and both the structural mechanism of supercomplex formation and their significance remain unknown. While the supercomplex formation observed only in mitochondrial mutants per se is interesting, it would be good to define structural aspects of supercomplex formation and their potential impact on the stability of the respiratory chain complexes in these mutants.

      We thank the Reviewer for the positive assessment of our work and the suggestions to improve the manuscript.

      Reviewer #1 (Recommendations For The Authors):

      The sentence on line 90, which starts "This is in contrast with..." is unclear and needs to be rewritten.

      Thank you. We have modified the sentence to make it clearer.

      Lines 153 and 155, reference is made to tissue specific expression patterns but no literature reference is provided.

      Thank you for the comment. The tissue-specific expression patterns of the different isoforms are reported in the FlyBase database. We added a link to the website in the text.

      Line 188, "...homogenates in presence of..." should read "homogenates in the presence of..."

      Thank you. Amended.

      Line 336, "...lower to the increase..." should read "...lower than the increase..."

      Thank you. Amended.

      Reviewer #2 (Recommendations For The Authors):

      • In order to unravel the molecular mechanism by which supercomplexes form in the mutant, it would be important to identify the factor mediating this. Prime candidates would be additional proteins that co-purify or co-fractionate with the respiratory chain when they assemble into supercomplexes or changes in the lipid composition of the mitochondria, where cardiolipin has been shown to stabilize supercomplex formation. The inclusion and analysis of complexome data for all mutants would be excellent, plus an MS analysis of a purified supercomplex.

      Thank you for the suggestion, with which we completely agree. We have taken a closer look at the hierarchical clustering of peptide intensities in our complexome profiling data, which clusters the proteins according to their similarity in electrophoretic migration within the complexes. We have specifically looked for proteins in which the peptide intensity changed in a similar fashion as the complex I structural subunits. Among the four candidate proteins (Uniprot IDs Q8SXY6, Q95T19, Q9W0Y6, Q9VJQ3), only Q95T19 (Serine--tRNA synthetase-like protein Slimp) is annotated as a mitochondrial protein. This protein is a Drosophila-specific paralog of the mitochondrial Serine-tRNA synthetase generated by gene duplication (PMID: 20870726), which carries out a function linking mitochondrial translation with mtDNA maintenance (PMID: 30943413). Therefore, in principle we would not consider it as a good candidate to be a ‘SC assembly factor’. The identification of factors promoting the formation of SC in Drosophila under these conditions is definitely an important point warranting future investigation.

      • The authors could define the stability of the respiratory chain complexes through metabolic pulse-chase labeling experiments. This could reveal that the role of supercomplex formation is indeed structural, improving stability.

      We agree that this would be an important piece of information to understand the phenomenon we have observed. Unfortunately, it is technically impossible to perform metabolic labeling of mitochondrial proteins in whole flies. It would be possible to perform in organello pulse-chase labelling, however our previous experience indicates that complex I does not completely assemble de novo in isolated mitochondria (PMID: 20385768).

      • The authors should analyze oxygen consumption from mitochondria isolated from larvae as in the other experiments on enzyme activities or the (high-quality) BN-PAGE, and not from whole flies that are homogenized. Moreover, they need to determine the quantities of the complexes by complementary experiments (MS, Western blotting or spectroscopy).

      Thank you for the comments. However, we believe that repeating the entire analyses with the larvae would not add significant information to the work and the main interpretation would not change, as the main claim of the paper is based on the data collected on adult flies. In addition, the band pattern of MRC complexes in the BNGE is the same in larvae and adults and therefore does not depend on the developmental stage. Regarding the quantification of the complexes, we think that the data provided by using complementary approaches such as in-gel activity assays (IGA), western blot (WB) and kinetic assays of MRC enzymatic activities, allowed us to confidently determine the amount of the individual complexes. Hence, we performed IGA assays and enzymatic activity assays (which reflect the amounts of fully assembled and functional complexes) in triplicate (independent samples). For the WB analyses, due to the scarcity of some of the antibodies available to detect the Dm MRC proteins, which were a kind gift of Dr. Edward Owusu-Ansah (Columbia University), we decided to pool the three independent samples of each group before running them through the Blue-Native gels. The densitometric curves of the WB bands (Figure S2) show the abundance of each individual MRC complex within the ‘free’ and SC forms. We prioritized the BN analyses over SDS-PAGE and WB analysis, as we consider that just measuring the steady-state levels of MRC subunits is not as informative, because it is possible that certain subunits are present in the mitochondrial membranes but not assembled into the final mature structures.

      • Can changes in Coenzyme Q levels explain the absence of a defect on electron transport? This could be determined for the mutant as well as the wild type animals.

      We agree that this would be a relevant aspect to investigate. For example, determining whether lower CoQ levels are able to maintain the same respiratory activities in the models in which higher amounts of SCs are formed, as was proposed in Shimada et al. (PMID: 29191512), would be very interesting. However, the fact that the mild KD models show no MRC enzymatic defects whatsoever (Figure 4D, Figure 5I and Figure 6I) provides the most straightforward explanation for the observed absence of respiratory defects.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Some sentences need to be clarified and some additional data and references could be added.

      1) Line 18

      SRY is the sex-determining gene

      SRY is the testis-determining gene is more accurate as described in line 44

      Modification done

      2) Line 50

      Despite losing its function in early testis determination in mice, DMRT1 retained part of this function in adulthood when it is necessary to maintain Sertoli cell identity.

      Losing its function is misleading. The authors describe firstly that Dmrt1 has no obvious function in embryonic testis development but is critical for the maintenance of Sertoli cells in adult mice. The wording "losing its function in early testis" is confusing. Do the authors mean that despite the expression of Dmrt1 in early testis development, the function of Dmrt1 seems to be restricted to adults in mice? A comparison between the testis and ovary should be more cautious since Garcia-Alonso et al (2022) have shown that the transcriptomics of supporting cells between humans and mice is partly different.

      That’s what we thought, and the sentence has been changed as follows: “Although DMRT1 is not required for testis determination in mice, it retained part of its function in adulthood when it is necessary to maintain Sertoli cell identity.” (line 51 to 53)

      3) Line 78

      XY DMRT1-/- rabbits showed early male-to-female sex reversal.

      Sex reversal indicates that there is no transient Sertoli cell differentiation that transdifferentiate into granulosa cells. This brings us to an interesting point. In the case of reprogramming, the transient Sertoli cells can produce AMH leading to the regression of the Mullerian ducts. In humans, some 9p-deleted XY patients have Mullerian duct remnants and feminized external genitalia. This finding indicates early defects in testis development.

      Is there also feminized external genitalia in XY Dmrt1−/− rabbits? Can the authors comment on the phenotype of the ducts?

      We propose to add “and complete female genitalia” at the end of the following sentence: “Secondly, thanks to our CRISPR/Cas9 genetically modified rabbit model, we demonstrated that DMRT1 was required for testis differentiation since XY DMRT1-/- rabbits showed early male-to-female sex reversal with differentiating ovaries and complete female genitalia.” (line 77 to 80)

      Indeed, from the first stage (16 dpc) at which we can predict the sex of the individual by observing its gonads during dissection, we always predict a female sex for XY DMRT1 KO fetuses. It is only genotyping that reveals an XY genotype. At birth, our rabbits are sexed by technicians from the facility and again, but now based on the external genitalia, they always phenotype these rabbits as females. In these XY KO rabbits, the supporting cells never differentiate into Sertoli cells, and ovarian differentiation occurs as early as in XX animals. Thus, these animals are fully feminized with female internal and external genitalia. Most 9p-deleted patients are not homozygous for the loss-of-function of DMRT1, and the remaining wild-type allele could explain the discrepancy between KO rabbits and humans.

      4) Line 53

      In the ovary, an equivalent to DMRT1 was observed since FOXL2 (Forkhead family box L2) is expressed in female supporting cells very early in development.

      Can the authors clarify what is the equivalent of DMRT1, is it FOXL2? DMRT1 heterozygous mutations result in XY gonad dysgenesis suggesting haploinsufficiency of DMRT1. However, to my knowledge, there is no evidence of haploinsufficiency in XX babies. Thus can we compare testis and ovarian genetics?

      We agree, the term “equivalent” is ambiguous, and we changed the sentence as follows: “In ovarian differentiation, FOXL2 (Forkhead family box L2) showed a similar function discrepancy between mice and goats as DMRT1 in the testis pathway. In the mouse, Foxl2 is expressed in female supporting cells early in development but does not appear necessary for fetal ovary differentiation. On the contrary, it is required in adult granulosa cells to maintain female-supporting cell identity.” (line 53 to 56)

      Regarding reviewer 2's question on haploinsufficiency in humans: the patient described in Murphy et al., 2015 is an XY individual with complete gonadal dysgenesis. But, it has been shown that the mutation carried by this patient leads to a dominant-negative protein, equivalent to a homozygous state (Murphy et al., 2022).

      For FOXL2 mutation in XX females, haploinsufficiency does not affect early ovarian differentiation (no sex reversal) but induces premature ovarian failure.

      We agree with the reviewer, we cannot compare testis and ovarian genetics considering two different genes.

      5) Line 55

      In mice, Foxl2 does not appear necessary for fetal ovary differentiation (Uda et al., 2004), while it is required in adult granulosa cells to maintain female-supporting cell identity (Ottolenghi et al., 2005). The reference Uhlenhaut et al (2009) reporting the phenotype of the deletion of Foxl2 in adults should be added.

      The reference has been added.

      6) Line 64

      These observations in the goat suggested that DMRT1 could retain function in SOX9 activation and, thus, in testis determination in several mammals.

      Lindeman et al (2021) have shown that DMRT1 can act as a pioneer factor to open chromatin upstream and Dmrt1 is expressed before Sry in mice (Raymond et al, 1999, Lei, Hornbaker et al, 2007). Whereas additional factors may compensate for the absence of Dmrt1, these results suggest that DMRT1 is also involved in Sox9 activation.

      Dmrt1 is indeed expressed before Sry/Sox9 in the mouse gonad. However, no binding site for DMRT1 could be observed at Sox9 enhancer 13 in mice. This does not support a role for DMRT1 in the activation of Sox9 expression in this species. Furthermore, in Lindeman et al 2021, the authors clearly state that DMRT1 acts as a pioneering factor for SOX9 only after birth. It does not appear to have this role before. One of the explanations put forward is that the state of chromatin is different during fetal development in mice: chromatin is more permissive and does not require a factor to facilitate its opening. This hypothesis is based in particular on the description of a similar chromatin profile in the precursors of XX and XY fetal supporting cells, where many common regions display an open structure (Garcia-Moreno et al., 2019). Once sex determination and differentiation are established, a sex-specific epigenome is set up in gonadal cells. Chromatin remodeling agents are then needed to regulate gene expression. We hypothesize that in non-murine mammals such as rabbits, the state of gonadal cell chromatin would be different in the fetal period, more repressed, requiring the intervention of specific factors for its opening, such as DMRT1.

      7) Figure 1

      Most of the readers might not be familiar with the developmental stages of the gonad in rabbits. A diagram of the key stages in gonad development would facilitate the understanding of the results.

      Thank you, it has been added in Figure 1.

      8) Figure 2

      Arrowheads are difficult to spot, could the authors use another color?

      Done

      9) Line 117: can the authors comment on the formation of the tunica albuginea? Do the epithelial cells acquire some specific characteristics?

      The formation of the tunica albuginea begins with the formation of loose connective tissue beneath the surface epithelium of the male gonad. The appearance of this tissue is concomitant with the loss of expression of DMRT1 in the cells of the coelomic epithelium. Our interpretation is that the contribution of the cells from the coelomic epithelium and their proliferation stop when the tunica begins to form because the structure of the tissue beneath the epithelium changes, and the cellular interactions between the epithelium and the tissue below are disrupted. By contrast, these interactions persist in the ovary until around birth for ovigerous nest formation.

      10) The first part of the results described DMRT1 expression in rabbits. With the new single-cell transcriptomic atlas of human gonads, it would be important to describe the pattern of expression in this species. This could be described in the introduction in order to know the DMRT1 expression pattern in the human gonad before that of the rabbit.

      A comment on the expression pattern of DMRT1 in human fetal gonads has been added in the discussion section: “In the human fetal testis, DMRT1 expression is co-detected with SRY in early supporting gonadal cells (ESGCs), which become Sertoli cells following the activation of SOX9 expression (Garcia-Alonso et al., 2022)” (line 222 to 224)

      11) Figure 3 supplement 3

      Dotted line: delimitation of the ovarian surface epithelium. Could the authors check that there is a dotted line?

      Done

      12) Figure 5 and Line 186

      Quantification is missing such as the % of germ cells, % of meiotic germ cells.

      Quantification is not easy to perform in rabbits because of the size and the elongated shape of the gonad. Indeed, it is difficult to be sure that both sections (one from WT, the other from KO) come from strictly the same region of the gonad and that each section is perfectly longitudinal. See also our answer to reviewer 3 (point 7) on this aspect. We are currently trying to better characterize this XX phenotype and to find a marker of the pre-leptotene/leptotene stage that works in rabbits (SYCP3 would be the best, but we encountered huge difficulties with different antibodies and even an RNAscope probe!). So, at present, the most convincing indirect evidence of this pre-meiotic blockage (in addition to HE staining at 18 dpp in the new Figure 6) is the persistence of POU5F1 (pluripotency), specifically in the germinal lineage of KO XX and XY gonads. In addition to the new figure supplement 5, we can show you in Author response image 1: (i) the gonadal section at a lower magnification, where it is evident that there is a big difference between WT and KO germ cell POU5F1 staining; and (ii) POU5F1 expression from a bulk RNA-seq performed the day after birth, at 1 dpp, where the difference is also very clear at the transcriptional level.

      Author response image 1.

      13) Line 186,

      E is missing at preleptoten

      Added

      14) Figure supplement 7.

      A magnification of the histology of the gonads is missing.

      This figure is only for showing the gonadal size, and these are the same gonads as in the new Figure 6. So, the magnification is represented in Figure 6.

      15) Discussion

      Line 201

      SOX9, well known in vertebrates,

      The references of the human DSD associated with SOX9 mutations are missing.

      Thank you, references have been added.

      16) Line 286

      One of the targets of WNT signaling is Bmp2 in the somatic cells and in turn, Zglp1, which is required for meiosis entry in the ovary as shown by Miyauchi et al (2017) and Nagaoka et al (2020). Does the level of BMP pathway vary in DMRT1 mutants?

      At 20 dpc, the expression level of BMP2 in XY and XX DMRT1 mutant gonads is similar to that of the XX control, which is lower than in the XY control (see the TPM values from our RNA-seq in Author response image 2).

      Author response image 2.

      Reviewer #2 (Recommendations For The Authors):

      Here are my minor comments:

      1) Line 106- You mention that coelomic epithelial cells only express DMRT1. Please add an arrow to highlight where you refer to.

      Done

      2) Line 112: In mice, the SLCs also express Sox9 but not Sry apart from Pax8. You mention here that the SLCs are expressing SRY and DMRT1 in addition to PAX8. Could you perhaps explain the difference? Please refer to that in the results or discussion.

      We add a new sentence at the end of this paragraph on SLCs: “As in mice, these cells will express SOX9 at the latter stages (few of them are already SOX9 positive at 15 dpc), but unlike mice, they express SRY.” (line 114 to 115)

      We already have collaborations with different labs on these SLC cells, and we will certainly come back later on this aspect, remaining slightly off-topic here.

      3) Could you please explain why did you chose to target Exon 3 of DMRT1 and not exons 1-2 which contain the DM domain? Was it to prevent damaging other DMRT proteins? Is there an important domain or function in Exon 2?

      Our choice was mainly based on technical issues (rabbit genome annotation & sgRNA design), but we also wanted to avoid targeting the DM domain due to its strong conservation with other DMRT genes. Due to the poor quality of the rabbit genome, exons 1 and 2 are not well annotated in this species. We have amplified and sequenced the region encompassing exons 1 & 2 from our rabbit line, but the software used for sgRNA design does not predict good guides in this region. The two best sgRNAs were predicted on exon 3, and we used both to obtain more mutated alleles.

      4) Your scheme in Supp Figure 4 is not so clear. It is not clear that the black box between the two guides is part of Exon 3 (labelled in blue).

      The scheme has been improved.

      5) Did you only have 1 good founder rabbit in your experiment? Why did you choose to work with a line that had duplication rather than deletion?

      Very good point! In the first version of this paper, we tried to explain the long (around 2 years) story of breeding to obtain the founder animal. Here it is:

      During the genome editing process, we generated 6 mosaic founder animals (5 males and 1 female), then crossed them with wild-type animals to isolate each mutated allele in F1 offspring, which were used afterward to establish and amplify knockout lines. Unexpectedly, we observed a very low rate of mutated allele transmission (5 of 129 F1 animals), and only one mutated allele was conserved, from the unique surviving adult F1 animal. It consists of an insertion of the deleted 47 bp DNA fragment, flanked by the cutting sites of the two RNA guides used with Cas9.

      The main hypothesis to explain this mutation event is that, in the same embryonic cell, the deletion occurred on one allele and the deleted fragment was then inserted into the other allele. Under this scheme, the embryonic cell carries a homozygous DMRT1 knockout genotype, albeit heterogeneous, with a deleted allele (del47) and the present allele (insertion of a 47 bp fragment leading to an in-sense duplication). This may explain the very low frequency of transmission since all germ cells carrying a homozygous DMRT1-/- genotype will probably not be able to enter the meiotic process, as suggested by our results on XX and XY DMRT1-/- ovaries. Finally, and under this hypothesis, the way we obtained this unique founder animal remains a mystery!

      6) Figure 4- real-time data- where does it say what is a,b,c,d of the significance? It should appear on the figure itself and not elsewhere.

      Modification done.

      7) If I understand correctly, you were able to get the rabbits born and kept to adulthood (you show in supp figure 7 their gonads). What was the external phenotype of these rabbits? Did the XY mutant gonads have the internal and external genitals of a female (oviduct, uterus, vagina etc.)?

      See our answer to Reviewer 1 on this question (point 3).

      8) Line 20: It is more correct to write 46, XY DSD rather than XY DSD

      Modification done.

      9) Line 21: you can remove the "the" after abolished

      Modification done.

      10) Line 31: consider replacing the first "and" by "as well as" since the sentence sounds strange with two "and".

      Modification done.

      11) Line 212- Please check with the eLife guidelines if they allow "data not shown" in the paper.

      This is unspecified.

      Reviewer #3 (Recommendations For The Authors):

      The following points should be addressed.

      1) The in situ's in Fig 1 and 2 are very clear. Fig 1 and Fig 2, In situ hybridisation in tissue sections, it looked like DMRT1 could be expressed in some cells where SRY mRNA is absent @ E13.5dpc and 14.5 dpc. Do you think this is real, or maybe Sry is turned off now in those cells?

      Based on the results of in situ hybridizations, DMRT1 appears to be expressed by both coelomic epithelium and genital crest medullar cells in a pattern that is actually broader than that of SRY. Moreover, in rabbits, SRY expression seems to start in the medulla of the genital ridge rather than in the surface epithelium, as described in mice (see Figure 1 at 12 and 13 dpc). Nevertheless, more detailed analyses are needed to ensure the lineage of cells expressing SRY and/or DMRT1, such as single-cell RNAseq at these key stages of sexual determination in rabbits (from 12 to 16 dpc).

      2) It is curious that SRY expression is elevated in the DMRT1 KO (Knockout) rabbit gonads. Does this suggest feedback inhibition by DMRT1, or maybe an indirect effect via Sox9 (as I believe Sox9 feeds back to down-regulate Sry in mouse, for example)?

      The maintenance of SRY expression in the DMRT1 -/- rabbit testis seems to be linked to the absence of SOX9 expression. We believe that, as in mice, SOX9 would down-regulate SRY (even if, in rabbits, SRY expression is never completely turned off).

      3) I suggest the targeting strategy and proof of DMRT1 knockout by sequencing etc. be brought out of the suppl. Data and shown as a figure in the text.

      See also our answer to reviewer 2 (point 5). Obtaining this DMRT1-mutated rabbit line required a huge effort and, of course, it constitutes the basis of the study. But given the title and the main message of the article, we are not convinced that the targeting strategy should be moved into the main text.

      4) Unless there are limitations imposed by the journal, I also feel that Suppl Fig 5 (the immunostaining) deserves to be in the paper text too. The Fig showing loss of DMRT1 by immunostaining is important.

      We include the figure supplement 5 in the main text. So, Figure 4E and figure supplement 5 have been combined into a new Figure 5.

      5) The RT-qPCR data should have the statistics clarified on the graphs (e.g., it is stated that, although Sox9 mRNA is clearly down, there is a slight increase compared to control in KO XX gonads. Is this statistically significant?). The figure legend states that the Kruskal-Wallis test is used, and significance is shown by letters. This is unclear. It would be better to use the more usual asterisks and lines to show comparisons.

      Modification done.

      6) Reference is made to DMRT1+/- rabbits having aberrant germ cell development, pointing to a dosage effect. This is interesting. Does the somatic part of the gonad look completely normal in the het knockouts?

      DMRT1 heterozygous male rabbits have a phenotype of secondary infertility with aging, and we are trying now to better characterize this phenotype. The problem is complex because, as we cannot carry out conditional KO, it remains difficult to decipher the consequence of DMRT1 haploinsufficiency in the Sertoli cells versus the germinal ones. Anyway, the somatic part is sufficiently normal to support spermatogenesis since heterozygous males are fertile at puberty and for some months thereafter.

      7) Can the authors indicate why meiotic markers were not used to explore the germ cell phenotype? It would be advantageous to use a meiotic germ cell marker to definitely show that the germ cells do not enter meiosis after DMRT1 loss. (Not just H/E staining or maintenance of POU). Example SYCP3, or STRA8 (as pre-meiotic marker) by in situ or immunostaining. Even though no germ cells were detected in adult KO gonads.

      The expression of pre-meiotic or meiotic markers is currently under study in DMRT1 -/- females. Transcriptomic data (RNA-seq) are also being analyzed. We are preparing a specific article on the role of DMRT1 in ovarian differentiation in rabbits. We felt it was important to reveal the phenotype observed in females in this first article, but we still need time to refine our description and understanding of the role of DMRT1 in the female.

      8) What future studies could be conducted? In the Discussion section, it is suggested that DMRT1 could act as a pioneering factor to allow SRY action upon Sox9. How could this be further explored?

      To explore the function of DMRT1 as a pioneering factor, it now seems necessary to characterize the epigenetic landscapes of rabbit fetal gonads expressing or not expressing DMRT1 (comparison of control and DMRT1-/- gonads). Two complementary approaches could be prioritized: the study of chromatin opening (ATAC-seq) and the analysis of the activation state of regulatory regions (CUT&Tag). The study of several histone marks, such as H3K4me3 (active promoters), H3K4me1 (primed enhancers), H3K27ac (enhancers and active promoters), and H3K27me3 (enhancers and repressed promoters), would be of great interest. However, these techniques are only relevant for gonads that can be separated from the adjacent mesonephros, which is only possible from the 16 dpc stage in rabbits. To perform a relevant analysis at earlier stages, a "single-nucleus" approach such as single-nucleus ATAC-seq or single-nucleus multi-omics combining ATAC-seq and RNA-seq could be used.

    1. Author Response

      The following is the authors’ response to the previous reviews

      Reviewer #1:

      Comment on revised manuscript: Thank you for your responses - they have addressed most of my concerns.

      We thank the reviewer again for their assistance in improving our manuscript.

      Reviewer #2:

      Additional context:

      The sex differences between the samples are interesting as effects of sex are commonly found in AAC tasks. It would be interesting to look at the main model comparison with sex included as a covariate.

      Firstly, we thank the reviewer for their re-evaluation of our manuscript.

      To the reviewer’s comment, we apologise for the lack of clarity. The analyses included in our revision were indeed based on the main logistic regression model of choice, including sex and age as covariates. We have clarified this in the manuscript as follows:

      While sex was significantly associated with choice in the hierarchical logistic regression in the discovery sample (β = 0.16 ± 0.07, p = 0.028) with males being more likely to choose the conflict option, this pattern was not evident in the replication sample (β = 0.08 ± 0.06, p = 0.173), and age was not associated with choice in either sample (p > 0.2).

      As it is difficult to include sex as a covariate in the reinforcement learning models in the classical sense as in a linear regression, we assessed sex effects on the individual parameters produced by these models instead, as follows:

      Comparing parameters across sexes via Welch’s t-tests revealed significant differences in reward sensitivity (t(289) = -2.87, p = 0.004, d = 0.34; lower in females) and consequently reward-punishment sensitivity index (t(336) = -2.03, p = 0.043, d = 0.22; lower in females, i.e. more avoidance-driven). In the replication sample, we observed the same effect on reward-punishment sensitivity index (t(626) = -2.79, p = 0.005, d = 0.22; lower in females). However, the sex difference in reward sensitivity did not replicate (p = 0.441), although we did observe a significant sex difference in punishment sensitivity in the replication sample (t(626) = 2.26, p = 0.024, d = 0.18).
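
      For readers less familiar with the test named above, the following is a minimal sketch of a Welch's t statistic with Welch-Satterthwaite degrees of freedom and a pooled-SD Cohen's d. The sample values are hypothetical and are not data from the study; the effect-size definition (pooled standard deviation) is an assumption, since the response does not state which variant was used.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's unequal-variance t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation (assumed variant)."""
    na, nb = len(a), len(b)
    pooled = math.sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled


# Hypothetical parameter estimates, for illustration only (not study data).
females = [0.8, 1.0, 1.1, 0.9, 1.2]
males = [1.3, 1.1, 1.6, 1.4, 1.2, 1.5]

t, df = welch_t(females, males)
d = cohens_d(females, males)
print(f"t({df:.1f}) = {t:.2f}, d = {d:.2f}")
```

      Because the degrees of freedom are estimated from the two sample variances, they are generally non-integer and need not equal the total sample size minus two, which is consistent with the different degrees of freedom reported across the comparisons above.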

      Could the authors double check the mean/SD of approach in each group for typos? The numbers are identical.

      Thank you for spotting this – the means were indeed similar (discovery: 0.521, replication: 0.516), but the standard deviations were marginally different (discovery: 0.140, replication: 0.148). We have amended the manuscript to reflect this, as follows:

      Across individuals, there was considerable variability in overall choice proportions (discovery sample: mean = 0.52, SD = 0.14, min/max = [0.03, 0.96]; replication sample: mean = 0.52, SD = 0.15, min/max = [0.01, 0.99]).

      Reviewer #3:

      The revised paper commendably adds important additional information and analyses to support these claims. The initial concern that not accounting for participant control over punisher intensity confounded interpretation of effects has been largely addressed in follow-up analyses and discussion.

      I commend the authors on their revisions. My initial concerns have been largely addressed. Minor suggestions below.

      We thank the reviewer again for their assistance in improving our analyses and manuscript.

      Changing the visualisation of the logistic regression model in Figure 2 to tertiles instead of quartiles seems expedient, and does not properly address the points raised by the other reviewers. The argument that non-linear trends in the extreme bins are due to less data is plausible, but unsatisfying given how reliable the pattern seems to be (across samples, with small standard error). It is possible, albeit perplexing, that the influence of punishment probability on choice is non-linear. I think the current figure with tertiles is acceptable, but I would suggest including the figures with non-linear data as a supplementary figure, for sake of transparency and reader interest.

      We agree that this is likely more complex than a simple linear effect (in the logistic space), especially given the concurrent reward probabilities which also fluctuate in the task. We also agree that the non-linear figures should be made available in the interests of transparency, and have included them in the Supplementary Materials.

      We direct interested readers to the relevant section from the figure legend as follows:

      "Figure 2. Predictors of choice in the approach-avoidance reinforcement learning task. … We show linear curves here since these effects were estimated as linear effects in the logistic regression models, however the raw data showed non-linear trends – see Supplementary Figure 15."

      We have included the non-linear figures in Supplementary Section 9.11 Effects of outcome probabilities on choice in the task: non-linear effects as Supplementary Figure 15.

      As an aside, the argument that approach-avoidance joystick tasks do not have a non-human counterpart misconstrues the translational root of these tasks, which was (at least in part) an attempt to model (successfully or not) general approach/avoidance processes measured in non-human tasks, e.g. appetitive/aversive runway tasks using rodents.

      Our aim in this manuscript was to develop a task that was closely matched to non-human counterparts in both the experimental procedure (choice over reward/punishment outcomes) and cognitive process involved (simultaneous reward/punishment learning). With this in mind, we wanted to convey that non-human and human measures of approach/avoidance processes were historically distinct in terms of the procedures (e.g. using a joystick vs navigating a runway, due to ethological differences), and that this was potentially problematic with respect to computational validity. However, at this early point in the introduction, it was unnecessary to make a strong distinction between these tasks, which as the reviewer duly notes, follow similar approach/avoidance principles and share similar experimental roots. Therefore, we have opted to omit the reference to translational similarity in the relevant text, as follows:

      In humans, on the other hand, approach-avoidance conflict has historically been measured using questionnaires such as the Behavioural Inhibition/Activation Scale (Carver and White 1994), or cognitive tasks that rely on motor/response time biases, for example by using joysticks to approach/move towards positive stimuli and avoid/move away from negative stimuli (Guitart-Masip, Huys et al. 2012, Phaf, Mohr et al. 2014, Kirlic, Young et al. 2017, Mkrtchian, Aylward et al. 2017).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This manuscript by He et al. explores the molecular basis of the different stinging behaviors of two related anemones: the freshwater Nematostella, which only stings when a food stimulus is presented with mechanical stimulation, and the saltwater Exaiptasia, which stings in response to mechanical stimuli. The authors had previously shown that Nematostella stinging is calcium-dependent and mediated by a voltage-gated calcium channel (VGCC) with very pronounced voltage-dependent inactivation, which gets removed upon hyperpolarization produced by taste receptors.

      In this manuscript, they show that the differing stinging behavior of Exaiptasia and Nematostella is near optimal, according to their ecological niche, and conforms to predictions from a Markov decision model.

      It is also shown that Exaiptasia stinging is also calcium-dependent, but the calcium channel responsible is much less inactivated at resting potential and can readily induce nematocyte discharge only in the presence of mechanical stimulation. To this end, the authors record calcium currents from Exaiptasia nematocytes and discover that the VGCCs in this anemone are not strongly inactivated and thus are easily activated by mechanical stimuli-induced depolarization, accounting for the different stinging behavior between species. The authors further explore the role of the auxiliary beta subunit in the modulation of VGCC inactivation and show that different N-terminal splice variants in Exaiptasia produce strong and weak voltage-dependent inactivation.

      The manuscript is clear and well-written and the conclusions are in general supported by the experiments and analysis. The findings are very relevant to increase our understanding of the molecular basis of non-neural behavior and its evolutionary basis. This manuscript should be of general interest to biologists as well as to more specialized fields such as ion channel biophysics and physiology.

      Some findings need to be clarified and perhaps additional experiments performed.

      1) The authors identify by sequencing that the Exaiptasia Cav is a P-type channel (cacna1a). However, the biophysical properties of the nematocyte channel are different from mammalian P-type channels. The cnidarian channel inactivation is exceedingly rapid and activation happens at relatively low voltages. These substantial differences should be mentioned and commented on.

      First, we thank Reviewer 1 for thoughtful and detail-oriented comments, as well as their shared appreciation for the molecular basis of unique behaviors. Indeed, Nematostella and rat CaV channels exhibit striking differences in inactivation (both fast and steady-state). We previously described this in Weir et al., 2020 and have added additional text to ensure that this result is clear.

      2) The currents from Nematostella in Figure 3d seem to be poorly voltage-clamped. Poor voltage-clamp is also evident in the sudden increase of conductance in Figure 3C and might contribute to incorrect estimation of voltage dependence of activation and if present in inactivation experiments, also to incorrect estimation of the inactivation voltage range. This problem should be reassessed with new data.

      Because it is necessary to use small-tipped pipettes to get recordings from small and technically challenging nematocytes, there is imperfect voltage clamp that is evident in the steep activation curves. This issue should have little effect on the inactivation curves determined with 1s pre-pulses because poor voltage control occurs transiently at the beginning of the pre-pulse. In our case, current is measured in response to a brief maximally activating pulse followed by a nearly 1s period. Thus, error should be minimal in inactivation curves if the test pulse is a maximally activating voltage. We ensured that these protocols are clearly described in the Methods to address this issue. In addition, we are confident in the described inactivation values because they are generally consistent with channel properties measured in a heterologous expression system in which we do not have this problem and see the same differences in inactivation (also see Weir et al., 2020).

      3) While co-expression of the mouse Cav channel with the beta1 isoform from Exaiptasia indeed shifts inactivation to more negative voltages, it does not recapitulate the phenotype of the more inactivated Ca-currents in nematocytes (compare Figures 4d and 5d). It should be explained if this might be due to the use of a mammalian alpha subunit. Related to this, did the authors clone the alpha subunit from Exaiptasia? Using this to characterize the effect of beta subunits on inactivation might be more accurate.

      While the cnidarian CaVβ subunits indeed shift inactivation consistent with native properties, we agree that using the Exaiptasia alpha subunit would be more accurate. We were unable to successfully clone and heterologously express this subunit, however, we did express all subunits from Nematostella and made chimeric channels in which alpha, alpha2d, or CaVβ were swapped between Nematostella and mammalian channels. These experiments demonstrated the requirement and sufficiency of the CaVβ subunit in altering inactivation (Weir et al., 2020). Furthermore, we were able to express CaVβ subunits from a variety of other cnidarians, all of which affected inactivation properties. Thus, we are confident in the conclusion that CaVβ subunits are major contributors to molecular tuning of cnidarian CaV channels. Future studies aim to incorporate describing properties of the alpha subunit from Exaiptasia and other cnidarians.

      4) The in situ shown in Figure 4b are difficult to follow for a non-expert in cnidarian anatomy. Some guidance should be provided to understand the structures. Also, for the left panels, is the larger panel the two-channel image? If so, blue would indicate co-localization of the two isoforms and there seems to be a red mark in the same nematocyte.

      We thank the reviewer for this important comment and have modified the figure to enhance visual guidance. We more clearly highlighted the nematocyte in the single and two-channel images and selected the clearest representative images. For additional reference, previous studies beautifully illustrate the unusual morphology of nematocytes, including the relative localization of the nematocyst and nucleus in the context of cnidarian tissues (Babonis and Martindale, 2017).

      Reviewer #2 (Public Review):

      This manuscript links the distinctive stinging behavior of sea anemones in different ecological niches to varying inactivation properties of voltage-gated calcium channels that are conferred by the identity of auxiliary Cavbeta subunits. Previous work from the Bellono lab established that the burrowing anemone, Nematostella vectensis, expresses a CaV channel that is strongly inactivated at rest which requires a simultaneous delivery of prey extract and touch to elicit a stinging response, reflecting a precise stinging control adapted for predation. They show here that by contrast, the anemone Exaiptasia diaphana which inhabits exposed environments, indiscriminately stings for defense even in the absence of prey chemicals, and that this is enabled by the expression of a CaVbeta splice variant that confers weak inactivation. They further use the heterologous expression of CaV channels with wild type and chimeric anemone Cavbeta subunits to infer that the variable N-termini are important determinants of Cav channel inactivation properties.

      1) The authors found that Exaiptasia nematocytes could be characterized by two distinct inactivation phenotypes: (1) nematocytes with low-voltage threshold inactivation similar to that of Nematostella (Vi1/2 = ~ -85mV); and (2) a distinct population with weak, high-voltage threshold inactivation (Vi1/2 = ~ -48mV). What were the relative fractions of low-voltage and high-voltage nematocytes? Do the low-voltage Exaiptasia nematocytes behave similarly to Nematostella nematocytes with respect to requiring both prey extract and touch to discharge?

      We thank Reviewer 2 for thoughtful comments and questions. Nematocyte patch clamp is technically challenging due to small size, large nematocyst, and, notably, the explosive discharge involved in stinging! Therefore, we only patch clamped a small number of cells. Despite this limitation, we were able to observe two distinct nematocyte populations based on physiological properties. Yet, we did not observe a correlation with morphology and cannot make broad comments on relative fractions. Because morphology was generally similar and Exaiptasia nematocytes discharge even from touch alone, it remains unclear whether the low-voltage population behaves similarly to Nematostella nematocytes that only discharge in response to chemicals and touch. Future in vivo approaches could be used to address this question.
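
      To make the two half-inactivation voltages quoted in the question above concrete, here is a small sketch using the standard Boltzmann form for steady-state inactivation. Only the two Vi1/2 values (about -85 mV and -48 mV) come from the text; the slope factor and the holding potential are hypothetical values chosen for illustration, not measurements from the manuscript.

```python
import math


def fraction_available(v_mv, v_half_mv, slope_mv=5.0):
    """Boltzmann steady-state inactivation curve.

    Returns the fraction of channels still available (not inactivated)
    after holding the membrane at v_mv. slope_mv is an assumed slope factor.
    """
    return 1.0 / (1.0 + math.exp((v_mv - v_half_mv) / slope_mv))


HOLDING_MV = -70.0  # hypothetical holding potential, assumed for illustration

low_threshold = fraction_available(HOLDING_MV, v_half_mv=-85.0)   # Nematostella-like population
high_threshold = fraction_available(HOLDING_MV, v_half_mv=-48.0)  # weakly inactivating population

print(f"available at {HOLDING_MV} mV: low-threshold {low_threshold:.2f}, "
      f"high-threshold {high_threshold:.2f}")
```

      Under these assumed values, most low-threshold channels would be inactivated at the holding potential while most high-threshold channels would remain available, which is the qualitative distinction the two populations are meant to capture.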

      2) The authors state in Fig 3 legend and in the results that Exaiptasia nematocyte voltage-gated Ca2+ currents have weak inactivation compared with Nematostella. This description is imprecise and inaccurate. Figure 3 in fact shows that Exaiptasia nematocyte voltage-gated Ca2+ currents display a faster rate of inactivation compared to Nematostella Ca2+ currents. A sub-population of Exaiptasia nematocytes does display less resting state (or steady-state) inactivation compared to Nematostella Ca2+ currents. The authors need to be more accurate and qualify what type of inactivation property they are talking about.

      We thank Reviewer 2 for this attention to detail and have defined this phrasing early in the text.

      3) In a similar vein, the authors need to be more accurate when referring to 'rat beta' used in heterologous expression experiments. It should be made explicit throughout the manuscript that the rat beta isoform used is rat beta2a. Among the distinct beta isoforms, beta2a is unique in being palmitoylated at the N-terminus which confers a characteristic slow rate of inactivation and a right-shifted voltage-dependence of steady-state inactivation consistent with the data shown in Fig. 4D. Almost all other rat beta isoforms do not have these properties.

      We used the rat CaVβ2a for comparison because it shares the highest homology with Nematostella CaVβ (Weir et al., 2020). We have now more clearly defined the rat subunit in the text and legends.

      4) The profiling of the impact of different Cnidarian Cavbeta subunits on reconstituted Ca2+ channel current waveforms is nice (Fig 5 and Fig 5S1). The N-terminus sequence of EdCaVβ2 is different from palmitoylated rat beta2a, though both have similar properties in showing slow inactivation and a right-shifted voltage-dependence of steady-state inactivation. Does EdCaVβ2 target autonomously the plasma membrane when expressed in cells? If so, this would reconcile with what was previously known and provide a rational explanation for the observed functional impact of the distinct Cavbetas.

      As far as we understand the question, our data support that Exaiptasia CaVβ2 targets the plasma membrane for a number of reasons: 1) Expressing Exaiptasia CaVβ2 produces consistent properties in comparison with other CaVβs, suggesting a homogenous population of channel complexes; 2) Distinct cnidarian-Exaiptasia CaVβ2 chimeras produce distinct and internally consistent properties; and 3) Expressing P/Q-type CaV alpha + alpha2d subunits without CaVβ in cell lines does not produce robust measurable voltage-gated currents. We further tested this in our case and found the same result: at an equivalent maximally activating step using the same protocol, we measured 458.68 ± 179.88pA average current amplitude for +Exaiptasia CaVβ2 (n = 6) and 43.03 ± 17.64pA average current amplitude for -CaVβ2 (n = 4).

      Reviewer #3 (Public Review):

      Summary:

      The present article attempts to answer both the ultimate question of why different stinging behaviours have evolved in Cnidarians with different ecological niches and shed light on the proximate question of which electro-physiological mechanisms underlie these distinct behaviours.

      Account of major methods and results:

      In the first part of the paper, the authors try to answer the ultimate question of why distinct dependencies of the sting response on internal starvation levels have evolved. The premise of the article, that Exaiptasia's nematocyte discharge is independent of the presence of prey (Artemia nauplii) as compared to Nematostella's significant dependence of the discharge on the presence of actual prey, is shown to be a robust phenomenon justified by the data in Figure 1.

      The hypothesis that defensive vs. predatory stinging leads to different nematocyte discharge behaviours is analysed in mathematical models based on the suitable framework of optimal control/decision theory. By assuming functional relations between the:

      1) cost of a full nematocyte discharge and the starvation level.

      2) probability of successful predation/avoidance on the discharge level.

      3) desirability/reward of the reached nutritional state.

      Based on these assumptions of environmental and internal influences, the optimal choice of attack intensity is calculated using Bellman's equation for this problem. The model predictions are validated using counted nematocytes on a coverslip. The scaling of normalised nematocyte discharge numbers with scaled starvation time is qualitatively comparable to what is predicted from the models. The abundance of nematocytes in the tentacles was, on the other hand, independent of the starvation state of the animals.
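
      The structure of such a model can be sketched with a toy value iteration over discrete starvation levels. This is an illustrative sketch only, not the authors' model: the functional forms for the success probability, the state reward, the flat discharge cost, the discount factor, and the state transitions are all hypothetical choices made here for illustration.

```python
import math

N_STATES = 10                          # starvation level: 0 = well fed, 9 = most starved
ACTIONS = [i / 20 for i in range(21)]  # fraction of nematocytes discharged, 0.0 to 1.0
GAMMA = 0.95                           # discount factor (assumed)
COST_FULL_DISCHARGE = 1.0              # flat cost of a full discharge (assumed)


def p_success(a):
    """Assumed saturating probability of a successful sting at discharge level a."""
    return 1.0 - math.exp(-3.0 * a)


def reward(s):
    """Assumed desirability of nutritional state s (more starved is worse)."""
    return -float(s)


def q_value(s, a, V):
    """Expected return of discharging fraction a in state s, given value estimates V."""
    fed = max(s - 1, 0)                    # success: one step less starved
    hungrier = min(s + 1, N_STATES - 1)    # failure: one step more starved
    p = p_success(a)
    return (-COST_FULL_DISCHARGE * a
            + p * (reward(fed) + GAMMA * V[fed])
            + (1.0 - p) * (reward(hungrier) + GAMMA * V[hungrier]))


def value_iteration(tol=1e-9, max_iter=10_000):
    """Iterate the Bellman optimality update until the values stop changing."""
    V = [0.0] * N_STATES
    for _ in range(max_iter):
        new_V = [max(q_value(s, a, V) for a in ACTIONS) for s in range(N_STATES)]
        converged = max(abs(x - y) for x, y in zip(new_V, V)) < tol
        V = new_V
        if converged:
            break
    policy = [max(ACTIONS, key=lambda a: q_value(s, a, V)) for s in range(N_STATES)]
    return V, policy


V, policy = value_iteration()
for s, a in enumerate(policy):
    print(f"starvation level {s}: optimal discharge fraction {a:.2f}")
```

      Swapping in a starvation-dependent cost, or a success term that models escape from a predator rather than prey capture, changes only `q_value`, which is one way to compare the predatory and defensive scenarios discussed in the review.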

      Next, the authors turn to investigate the proximate cause of the differential stinging behaviour. The authors have previously reported convincing evidence that a strongly inactivating Cav2.1 channel ortholog (nCav) is used by Nematostella to prevent stinging in the absence of prey (Weir et al. 2020). This inactivation is released by hyperpolarising sensory inputs signalling the presence of prey. In this article, it is clearly shown by blocking respective currents that Exaiptasia, too, relies on extracellular Ca2+ influx to initiate stinging. Patch clamp data of the involved currents is provided in support. However, the authors find that in addition to the nCav with a low-inactivation threshold, Exaiptasia has a splice variant with a higher inactivation threshold expressed (Figure 3D).

      The authors hypothesise that it is this high-threshold nCav channel population that amplifies any voltage depolarisation to release a sting irrespective of the presence of prey signals. They found that the β subunit that is responsible for Nematostella's unusually low inactivation threshold exists in Exaiptasia as two alternative splice isoforms. These N-terminus variants also showed the greatest variation in a phylogenetic comparison (Figure 5), rendering it a candidate target for mutations causing variation in stinging responses.

      Appraisal of methodology in support of the conclusions:

      The authors base their inference on a normative model that yields quantitative predictions which is an exciting and challenging approach. The authors take care in stating the model assumptions as well as showing that the data indeed does not contradict their model predictions. The interesting comparative nature of the modelling part of the study is complicated by slightly different cost assumptions for the two scenarios. Hence, Figure 2 needs to be carefully digested by readers.

      We thank the reviewer for their careful revision of our work and excellent comments. We simplified Figure 2 considerably to make it easier to digest. We now compare the stinging response for predation vs defense under the same exact definition of cost per nematocyte for both models. You can find examples 1 and 2 in Figure 2 and examples 3 and 4 in Supplementary Figure 3 (see response below).

      It would be even more prudent to analyse the same set of cost-of-discharge vs. starvation scenarios for both species. Specifically, for Nematostella the complete cost-of-discharge vs starvation-state curves as for Exaiptasia (Figure 2E, example 2-4) could be used. It is likely that the differential effect size of Nematostella and Exaiptasia behaviour is the strongest if only the flat cost-of-discharge vs starvation is used (Figure 2A) for Nematostella. But as a worst-case comparison the other curves, where the cost to the animal scales with starvation would be a good comparison. This could help the reader to understand when the different prediction of Nematostella's behaviour breaks down. In addition, this minor change could shed light on broader topics like common trade-offs in pursuit predation.

      The results hold even when the cost increases moderately with starvation: Figure 2 now shows results with the same cost for predatory and defensive stinging (cost defined in Figure 2A, former examples 1 and 4). Predatory stinging robustly increases with starvation and defensive stinging remains constant or decreases. Interestingly, the fit between theory and data for both anemones improves by using the increasing cost (open circles in Figure 2E right). For other choices of increasing cost functions, defensive stinging will always decrease, and even more so if the cost increases dramatically (like for the former Examples 2 and 3). In contrast, predatory stinging will switch behavior if the cost increases too much with starvation (results with former Examples 2 and 3, now in Supplementary Figure 3 and theoretical arguments in Supplementary Information). Note however that these assumptions are less realistic because they necessitate that the cost of stinging for well-fed animals is negligible with respect to the cost for starved animals. A formal proof of the asymptotic solution for predatory stinging with varying cost is beyond the scope of this work and is the subject of ongoing work where we consider implications for Markov Decision Processes in continuous state space.

      The qualitatively similar scaling of the model-derived relation between starvation and sting intensity with the counted nematocytes for different feeding pauses is evidence that feeding has indeed been optimised for the two distinct ecological niches. To prove that Exaiptasia uses a similar Ca2+ channel ortholog as well as a different splice variant, the authors employed both clean electrophysiological characterisation (Figure 3) as well as transcriptomics data (Figure 4S1).

      To strengthen the authors' hypothesis that variation in the N-termini leads to changes in Ca2+ channel inactivation and hence altered stinging, the response sequence variability of 6 Cnidaria was analysed.

      Additional context:

      Although the present article focuses on nematocytes alone, neurobiology has recently refocused on the nervous systems of more basal metazoans, which already received much attention in the works of Romanes (1885). In part, this is driven by the goal of understanding the early evolution of nervous systems. Cnidarians and ctenophores are exciting model organisms in this venture. This will hopefully be accompanied by more comparative studies like the present one. Some of the recent literature also uses computational models to understand mechanisms of motor behaviour using full-body simulations (Pallasdies et al. 2019; Wang et al. 2023), which can be thought of as complementary to the normative modelling provided by the authors.

      Comparative studies of recent Cnidarians, such as the present article, can shed light on speculative ideas on the origin of nervous systems (Jékely, Keijzer, and Godfrey-Smith 2015). During a time (the Ediacaran/Cambrian transition) that saw the genesis of complex trophic food webs with predator-prey interactions and symbioses, but also an increase in body sizes and shapes, multiple ultimate causes can be envisioned that drove the increase in behavioural complexity. The authors show that not all of it needs to be implemented in dedicated nerve cells.

      References:

      Jékely, Gáspár, Fred Keijzer, and Peter Godfrey-Smith. 2015. "An Option Space for Early Neural Evolution." Philosophical Transactions of the Royal Society B: Biological Sciences 370 (December): 20150181. https://doi.org/10.1098/rstb.2015.0181.

      Pallasdies, Fabian, Sven Goedeke, Wilhelm Braun, and Raoul-Martin Memmesheimer. 2019. "From Single Neurons to Behavior in the Jellyfish Aurelia Aurita." eLife 8 (December). https://doi.org/10.7554/elife.50084.

      Romanes, G. J. 1885. Jelly-Fish, Star-Fish and Sea-Urchins: Being a Research on Primitive Nervous Systems. Appleton.

      Wang, Hengji, Joshua Swore, Shashank Sharma, John R. Szymanski, Rafael Yuste, Thomas L. Daniel, Michael Regnier, Martha M. Bosma, and Adrienne L. Fairhall. 2023. "A Complete Biomechanical Model of Hydra Contractile Behaviors, from Neural Drive to Muscle to Movement." Proceedings of the National Academy of Sciences 120 (March). https://doi.org/10.1073/pnas.2210439120.

      Weir, Keiko, Christophe Dupre, Lena van Giesen, Amy S-Y Lee, and Nicholas W Bellono. 2020. "A Molecular Filter for the Cnidarian Stinging Response." eLife 9 (May). https://doi.org/10.7554/elife.57578.

      We appreciate the excellent suggestion to further discuss non-neuronal adaptations in the context of studying the evolution of behavior. We have added additional text to the Discussion to cover this interesting field.

    1. Author Response

      The following is the authors’ response to the original reviews.

      First and foremost, we would like to thank all the editors and reviewers for their thoughtful and thorough evaluations of our manuscript. We greatly appreciate their assessment of the novelty and strength of this study and have revised the manuscript according to their recommendations. Below are our detailed responses and revisions based on the reviewer recommendations.

      Reviewer #1 (Recommendations For The Authors):

      1) It is unclear the rationale for choosing the P35-42 adolescent window for stimulating the mesofrontal dopamine system.

      The dopaminergic innervation in the mesofrontal circuit exhibits a protracted maturation from P21 to P56 (Kalsbeek, Voorn et al. 1988, Niwa, Kamiya et al. 2010, Naneix, Marchand et al. 2012, Hoops and Flores 2017). P35-42 is in the center of this period and captures the mid-adolescent stage in rodents (Spear 2000). We have previously shown that increasing dopamine neuron activity by wheel running or optogenetic stimulation during this period, but not adulthood, can induce formation of mesofrontal dopaminergic boutons and enhance mesofrontal circuit activity in wild-type mice (Mastwal, Ye et al. 2014). We therefore chose the P35-P42 adolescent window to stimulate the mesofrontal dopamine circuit and test the long-term effect of this intervention on the frontal circuit and memory-guided decision-making deficits in mutant mice. We have detailed this rationale in the revised manuscript when we first introduced this intervention.

      2). Please provide a justification for choosing optical recording of M2 neuronal activity instead of the prelimbic prefrontal cortex, which is known to show the highest levels of dopamine terminals.

      While the prelimbic area has the highest level of dopamine terminals among frontal cortical regions, a robust presence of dopaminergic terminals and dopamine release in the M2 frontal cortex has been well documented (Berger, Gaspar et al. 1991, Mastwal, Ye et al. 2014, Aransay, Rodriguez-Lopez et al. 2015, Patriarchi, Cho et al. 2018). The M2 cortex plays an important role in action planning, generating the earliest neural signals among frontal cortical regions that are related to upcoming choice during spatial navigation (Sul, Kim et al. 2010, Sul, Jo et al. 2011). Our chemogenetic inactivation experiments (Supplementary Fig 1) have further confirmed the involvement of M2 in the memory-guided Y-maze navigation task used in this study. Technically, M2 has the advantage of being more amenable to optical recording of neuronal activity without the tissue damage caused by implanting a lens, which would be necessary for deeper areas such as the prelimbic cortex. We have provided this justification in the revised manuscript.

      3). What was the rationale for using the 3-day chemogenetic stimulation paradigm?

      Our previous work in wild-type adolescent mice showed that a single optogenetic stimulation session or a 2-hr wheel running session is sufficient to induce bouton formation in mesofrontal dopaminergic axons (Mastwal, Ye et al. 2014). In this study, we sought to rescue existing structural and functional deficits in the mesofrontal dopaminergic circuits due to genetic mutations. Because previous studies suggested that an optimal level of dopamine is important for normal cognitive function (Arnsten, Cai et al. 1994, Robbins 2000, Floresco 2013), we elected to do multiple stimulation sessions to boost the potential rescue effects. We tested both a 3-day and a 3-week stimulation paradigm, and found that the 3-day, but not the 3-week paradigm led to robust functional improvement (Fig. 5). These results indicate that moderate but not excessive stimulation of dopamine neurons can provide functional improvement of a deficient mesofrontal circuit. We have revised our text to clarify the rationale for these experiments.

      4). A major maturational event occurring in the prefrontal cortex is the gain of local GABAergic transmission, which is crucial for sustaining proper levels of performance in Y-maze tasks. I am wondering if the authors have any thoughts about what is really happening at the postsynaptic level following adolescent dopamine stimulation.

      The developmental increases in dopaminergic innervation to the frontal cortex and local GABAergic transmission are likely synergistic processes, which both contribute to the maturation of high-order cognitive functions supported by the frontal cortex (Caballero and Tseng 2016, Larsen and Luna 2018). Previous electrophysiological studies have suggested that dopamine can act on five different receptors expressed in both excitatory and inhibitory postsynaptic neurons (Seamans and Yang 2004, Tseng and O'Donnell 2007, O'Donnell 2010). At the network level, dopaminergic signaling can increase the signal-to-noise ratio and temporal synchrony of neural activity during cognitive tasks (Rolls, Loh et al. 2008, Vander Weele, Siciliano et al. 2018, Lohani, Martig et al. 2019). As the frontal GABAergic inhibitory network undergoes major functional remodeling during adolescence (Caballero and Tseng 2016), adolescent stimulation of dopamine neurons may interact with this maturational process to promote a network configuration conducive to synchronous and high signal-to-noise neural computation (Porter, Rizzo et al. 1999, Murty, Calabro et al. 2016, Mukherjee, Carvalho et al. 2019). The microcircuit mechanisms underlying the changes induced by adolescent dopamine stimulation, particularly in the GABAergic inhibitory neurons, will be an exciting direction for future research. We have extended our discussion about these points in the revised manuscript.

      5). A change in the density of dopamine boutons is unlikely to be limited to the M2 region in Arc-/- mice. The authors should provide some data illustrating that similar changes are widespread across the medial prefrontal cortex, and that the optical recording in the M2 region was preferred for technical limitations and to avoid damaging areas in the frontal cortex.

      As discussed above, this study focused on the M2 region of the frontal cortex because it is functionally required for memory-guided Y-maze navigation, generates behavioral choice-related neural signals during spatial navigation, and is optically most accessible. The medial prefrontal regions (anterior cingulate, prelimbic and infralimbic) ventral to M2 also receive dense dopaminergic innervation and can act in concert with M2 in decision making (Sul, Kim et al. 2010, Sul, Jo et al. 2011, Barthas and Kwan 2017). As dopaminergic innervations to the frontal cortical regions progress in a ventral-to-dorsal direction during development (Kalsbeek, Voorn et al. 1988, Hoops and Flores 2017), how the changes induced by adolescent dopamine stimulation may proceed spatiotemporally across different frontal subregions requires more extensive investigation in the future. We have added this discussion into the revised manuscript.

      Reviewer #2 (Public Review):

      The manuscript by Mastwal and colleagues explores how transient adolescent stimulation of ventral midbrain neurons that project to the frontal cortex may help to improve performance on certain memory tasks. The manuscript provides an interesting set of observations that DREADD-based activation over only 3 days during adolescence provides a fast-acting and long-lasting improvement in performance on Y-maze spontaneous alternation as well as aspects of neuronal function as assessed using in vivo imaging methods. While interesting, there are several weaknesses. First and foremost, it is not clear that the effects the authors are observing are mediated by dopamine. It has been clearly documented that the DAT-Cre line provides a better representation of midbrain dopamine cells in the mouse, particularly near the midline of the ventral midbrain (Lammel et al., Neuron 2015). This is precisely where the cells that project to the frontal cortex are located. Therefore, the selection of TH-Cre is problematic. It is very likely that the authors are labeling a substantial number of non-dopaminergic cells.

      We agree with Reviewer 2 that the DAT-Cre line can provide specific labeling of midbrain dopamine neurons, particularly those projecting to the striatum, as discussed in the cited study (Lammel, Steinberg et al. 2015). DAT transports the extracellularly released dopamine back into presynaptic terminals, but it is not essential for dopamine synthesis and release (Sulzer, Cragg et al. 2016). Mesocortical dopamine neurons in the ventral tegmental area (VTA) express very little DAT (Sesack, Hawrylak et al. 1998, Lammel, Hetzel et al. 2008, Li, Qi et al. 2013), which limits the use of the DAT-Cre line to target these neurons (Lammel, Steinberg et al. 2015). Because mesocortical dopamine neurons have strong expression of TH, a key enzyme involved in dopamine synthesis, TH-Cre lines have been extensively used to study the mesocortical pathway (Lammel, Lim et al. 2012, Gunaydin, Grosenick et al. 2014, Ellwood, Patel et al. 2017, Vander Weele, Siciliano et al. 2018, Lohani, Martig et al. 2019). We provide more details below about our rationales for using TH-Cre rather than DAT-Cre mice in our study and the revisions we made in response to the reviewer’s specific recommendations.

      Reviewer #2 (Recommendations For The Authors):

      1). The authors should rigorously demonstrate that there is a reasonable midbrain DA projection to the coordinates that they are assessing and that their effects are due to DA release from these cells. It is not clear that there is a VTA dopaminergic projection to M2 - it does not appear for example in the Allen Mouse Brain Connectivity Atlas (https://connectivity.brain-map.org/projection/experiment/siv/160540751?imageId=160541123&imageType=TWO_PHOTON,SEGMENTATION&initImage=TWO_PHOTON&x=17321&y=15284&z=3). Though there is a projection to the mPFC, at the coordinates the authors report, there does not appear to be any signal from DAT-Cre mice. However, there is much more signal when expression is not restricted to dopamine cells (https://connectivity.brain-map.org/projection/experiment/siv/165975096?imageId=165975158&imageType=TWO_PHOTON,SEGMENTATION&initImage=TWO_PHOTON&x=17950&y=11504&z=3). The argument that these cells may express less TH is not relevant for this particular issue. Therefore, it is possible that the vast majority of observed effects are not in fact mediated by dopamine but another neurotransmitter such as glutamate. While the experiment using SCH23390 does suggest DA receptors may be involved, this result in isolation doesn't alleviate this caveat - there can be, for example, DA release from NE cells (e.g., Takeuchi et al., Nature 2016). While this does not entirely invalidate the authors' results, as their effects of stimulation of ventral midbrain cells to the forebrain don't necessarily have to occur via dopamine - the mechanism by which this is occurring needs to be clear.

      While the prelimbic area has the highest level of dopaminergic terminals among frontal cortical regions, a robust presence of midbrain dopaminergic projections and dopamine release in the M2 frontal cortex has been well established by immunostaining, viral labeling, single-cell axon-tracing, and in vivo imaging of recently developed dopamine biosensors (Berger, Gaspar et al. 1991, Mastwal, Ye et al. 2014, Aransay, Rodriguez-Lopez et al. 2015, Ye, Mastwal et al. 2017, Patriarchi, Cho et al. 2018). It has also been reported repeatedly that mesocortical dopamine neurons in the VTA express very little DAT, which is different from mesostriatal dopamine neurons (Sesack, Hawrylak et al. 1998, Lammel, Hetzel et al. 2008, Li, Qi et al. 2013). This limitation in the use of the DAT-Cre line to target mesocortical dopamine neurons has been acknowledged in previous studies (Lammel, Steinberg et al. 2015) and is consistent with the reviewer’s observation of DAT-Cre labeling in the Allen Mouse Brain Connectivity Atlas. Additionally, and interestingly, recent extensive evaluation of the DAT-Cre line reported ectopic labeling of multiple non-dopaminergic neuronal populations (Soden, Miller et al. 2016, Stagkourakis, Spigolon et al. 2018, Papathanou, Dumas et al. 2019). Our own evaluation of the DAT-Cre line’s utility for cortical imaging also revealed sparse axonal labeling and sporadic ectopic labeling of cortical cell somas. We have included representative DAT-Cre images in Author response image 1 to highlight the limitations of this line in the study of the dopaminergic mesocortical circuit.

      Author response image 1.

      Example images from DAT-Cre/Ai14 mice. The leftmost panel shows little axonal labeling in Layer 5/6 of M2. The center panel shows sparse axonal labeling in Layer 1/2 of M2, but also ectopic labeling of cell somas. The right panel shows a lack of labeling in Layer 1/2 of prelimbic cortex as well. Scale bars: 50 µm.

      We as well as others have confirmed that TH immunoreactivity in the frontal cortex can label dopaminergic axons originating from the VTA, and ablation of VTA dopaminergic neurons removes this labeling (Niwa, Jaaro-Peled et al. 2013, Ye, Mastwal et al. 2017). Because mesocortical dopamine neurons have much stronger TH expression than DAT expression (Sesack, Hawrylak et al. 1998, Lammel, Hetzel et al. 2008, Li, Qi et al. 2013, Lammel, Steinberg et al. 2015), TH-Cre lines have been frequently used to label these neurons and study the mesocortical pathway (Lammel, Lim et al. 2012, Gunaydin, Grosenick et al. 2014, Ellwood, Patel et al. 2017, Vander Weele, Siciliano et al. 2018, Lohani, Martig et al. 2019). While TH-Cre expression itself is not restricted to dopaminergic neurons, we targeted our viral injections to the VTA and optogenetic stimulation to the cortical dopaminergic projection target area in M2 (Patriarchi, Cho et al. 2018) to specifically modulate mesofrontal dopaminergic axons. In addition, we tested the effects of a D1 antagonist in our manipulations. Although we targeted dopamine neurons in our adolescent stimulation, the final behavioral outcome likely includes contributions from co-released neurotransmitters such as glutamate and non-dopaminergic neurons via network effects (Morales and Margolis 2017, Lohani, Martig et al. 2019), which will be interesting directions for future research. We have revised our results and discussion sections to highlight our rationales for using the TH-Cre line and the open mechanistic questions for future studies.

      2) SSFOs don't increase excitability like DREADDs, but rather, cause long-lasting hyperactivity through continuous passage of cations. What the actual firing properties are of these cells over a long period of time is not clear.

      We did not measure the precise firing patterns of the dopaminergic neurons targeted by SSFOs but evaluated the effects of SSFO activation on the frontal cortex. Similar to our DREADD-Gq mediated activity changes in the mesofrontal circuit, we found increased frontal cortical activity after light stimulation of frontal dopamine axons in our SSFO-treated animals (Fig 6a-c, S6e). While quantitatively the firing patterns of DREADD-Gq and SSFO activated dopaminergic neurons likely differ, qualitatively both of these manipulations lead to increased mesofrontal circuit activity and improvements in cognitive behaviors. In our previous work with wild-type adolescent mice, both wheel running and a single 10-min session of phasic optogenetic stimulation of the VTA resulted in dopaminergic bouton outgrowth in the frontal cortex (Mastwal, Ye et al. 2014). Taken together, these results suggest that adolescent dopaminergic mesofrontal projections are highly responsive to neural activity changes and a variety of adolescent stimulation paradigms are sufficient to elicit lasting changes in this circuit. We have added this discussion of the limitations and implications of our study into the revised manuscript.

      3) It is not clear what the increase in boutons means, given that DA release is thought to largely occur via non-synaptic release.

      Although many dopamine boutons are not associated with defined postsynaptic structures, these axonal boutons and the active zones they contain are the major release sites for dopamine (Goldman-Rakic, Leranth et al. 1989, Arbuthnott and Wickens 2007, Sulzer, Cragg et al. 2016, Liu, Goel et al. 2021). Past studies have established a consistent association between increased dopaminergic innervation in the frontal cortex and an increase in dopamine levels (Niwa, Kamiya et al. 2010, Naneix, Marchand et al. 2012). Our previous work also found that increasing dopaminergic boutons through adolescent VTA stimulation led to prolonged frontal local field potential responses with high-frequency oscillations (Mastwal, Ye et al. 2014), which is characteristic of increased dopaminergic signaling (Lewis and O'Donnell 2000, Gireesh and Plenz 2008, Wood, Kim et al. 2012, Lohani, Martig et al. 2019). Importantly, in our quantification of the structural changes in this study, we evaluated boutons which were labeled with synaptophysin, a molecular marker indicating the presence of synaptic vesicle release machinery (Li, Tasic et al. 2010, Oh, Harris et al. 2014). Thus, our study, taken in the context of the previous work, suggests that the increased number of boutons signifies an increase in dopaminergic signaling within the mesofrontal circuit. We have added this discussion into the revised manuscript.

      4) The use of Arc and DISC mutants as models of schizophrenia is perhaps a bit overstated - while deficits in prefrontal innervation certainly occur, there are many differences between these models and the human disease states. Language should be toned down accordingly, particularly in the introduction.

      We strived to avoid overstating the extent to which the mouse lines are models for specific diseases, but we can appreciate that this may not have been clear in our original writing. We have adjusted our language to better distinguish between the utility of the animal models for the purposes of our study and their relationship to specific human disease states. Particularly in the introduction, we stated that: “Genetic disruptions of several genes involved in synaptic functions related to psychiatric disorders, such as Arc and DISC1, lead to hypoactive mesofrontal dopaminergic input in mice (Niwa, Kamiya et al. 2010, Niwa, Jaaro-Peled et al. 2013, Fromer, Pocklington et al. 2014, Purcell, Moran et al. 2014, Wen, Nguyen et al. 2014, Manago, Mereu et al. 2016). Although there are many differences between these mouse lines and specific human disease states, these mice offer opportunities to test whether genetic deficits in frontal cortex function can be reversed through circuit interventions.”

      5) Some experiments are missing proper controls, e.g., Figure 3g-i, where a WT mouse should be used as a positive control.

      The goal of this experimental design (Fig 3g-i) was to evaluate the potential effects of chemogenetic VTA stimulation in the Arc-/- mice. We used Arc-/- mice with mCherry injections to control for the potential effects of CNO administration. While WT mice could be used to determine if adolescent VTA stimulation would lead to long-lasting enhancement of VTA-to-cortical transmission, this wouldn’t necessarily be a positive control for our experiments, but rather a separate line of inquiry. As dopamine’s effects often display an inverted-U dose-response curve (Vijayraghavan, Wang et al. 2007, Floresco 2013), evaluating the effects of adolescent VTA stimulation in the absence of underlying dopamine deficiency could be an interesting future research direction. We have added this discussion into the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      1) Did the SSFO stimulation of the TH+ axons in PFC during adolescence lead to the same long-term change in DA bouton number the authors saw with DREADDs?

      We did not examine the degree of bouton growth in the SSFO cohort, which is a limitation of this study. Accurate quantification of dopamine boutons requires the co-injection of another AAV vector encoding Synaptophysin-GFP to label the boutons. Because we used light to directly stimulate SSFO-labeled dopaminergic axons in the frontal cortex, we were concerned that co-injecting another AAV vector may dilute SSFO-labeling of axons and reduce the efficacy of optogenetic stimulation. Given the behavioral benefits we observed, we would expect an increase in bouton density after optogenetic stimulation. A systematic optimization of viral co-labeling and optogenetic stimulation protocols will facilitate examination of the impact of SSFO stimulation at the structural level in future studies. We have added a discussion of the limitation of this study in the revised manuscript.

      2) The DISC1 section is far less detailed than the Arc section, and it was not completely clear to me that the mechanisms of dysfunction and rescue were the same in these mice compared with the Arc mice. For example, there was no mention of DA bouton density or the patterned firing of the PFC neurons at the time of decision making.

      The initial motivation of this study was to test if adolescent dopamine stimulation can rescue the deficits in the mesofrontal dopaminergic circuit and cognitive function of Arc-/- mice, which were identified in our previous studies (Manago, Mereu et al. 2016). We first conducted multiple levels of analyses including viral tracing, in vivo calcium imaging, and behavioral tests to establish the coherent impacts of adolescent dopamine neuron stimulation on circuits and behaviors. We then examined a range of stimulation protocols to assess the efficacy requirements for cognitive improvement, which is our primary goal. Finally, we included DISC1 mice in our study to test if adolescent dopamine stimulation can also reverse the cognitive deficit in another genetic model for mesofrontal dopamine deficiency. By demonstrating a similar cognitive rescue effect of adolescent VTA stimulation in an independent mouse model, this study provides a foundation for future research to compare the detailed cellular mechanisms that underlie the functional rescue in different genetic models. We have added the discussion of the scope and limitation of this study to the revised manuscript.

      References

      Aransay, A., C. Rodriguez-Lopez, M. Garcia-Amado, F. Clasca and L. Prensa (2015). "Long-range projection neurons of the mouse ventral tegmental area: a single-cell axon tracing analysis." Front Neuroanat 9: 59.

      Arbuthnott, G. W. and J. Wickens (2007). "Space, time and dopamine." Trends Neurosci 30(2): 62-69.

      Arnsten, A. F., J. X. Cai, B. L. Murphy and P. S. Goldman-Rakic (1994). "Dopamine D1 receptor mechanisms in the cognitive performance of young adult and aged monkeys." Psychopharmacology (Berl) 116(2): 143-151.

      Barthas, F. and A. C. Kwan (2017). "Secondary motor cortex: where ‘sensory’ meets ‘motor’ in the rodent frontal cortex." Trends Neurosci 40(3): 181-193.

      Berger, B., P. Gaspar and C. Verney (1991). "Dopaminergic innervation of the cerebral cortex: unexpected differences between rodents and primates." Trends Neurosci 14(1): 21-27.

      Caballero, A. and K. Y. Tseng (2016). "GABAergic Function as a Limiting Factor for Prefrontal Maturation during Adolescence." Trends Neurosci 39(7): 441-448.

      Ellwood, I. T., T. Patel, V. Wadia, A. T. Lee, A. T. Liptak, K. J. Bender and V. S. Sohal (2017). "Tonic or Phasic Stimulation of Dopaminergic Projections to Prefrontal Cortex Causes Mice to Maintain or Deviate from Previously Learned Behavioral Strategies." J Neurosci 37(35): 8315-8329.

      Floresco, S. B. (2013). "Prefrontal dopamine and behavioral flexibility: shifting from an "inverted-U" toward a family of functions." Front Neurosci 7: 62.

      Fromer, M., A. J. Pocklington, D. H. Kavanagh, H. J. Williams, S. Dwyer, P. Gormley, L. Georgieva, E. Rees, P. Palta, D. M. Ruderfer, N. Carrera, I. Humphreys, J. S. Johnson, P. Roussos, D. D. Barker, E. Banks, V. Milanova, S. G. Grant, E. Hannon, S. A. Rose, K. Chambert, M. Mahajan, E. M. Scolnick, J. L. Moran, G. Kirov, A. Palotie, S. A. McCarroll, P. Holmans, P. Sklar, M. J. Owen, S. M. Purcell and M. C. O'Donovan (2014). "De novo mutations in schizophrenia implicate synaptic networks." Nature 506(7487): 179-184.

      Gireesh, E. D. and D. Plenz (2008). "Neuronal avalanches organize as nested theta- and beta/gamma-oscillations during development of cortical layer 2/3." Proc Natl Acad Sci U S A 105(21): 7576-7581.

      Goldman-Rakic, P. S., C. Leranth, S. M. Williams, N. Mons and M. Geffard (1989). "Dopamine synaptic complex with pyramidal neurons in primate cerebral cortex." Proc Natl Acad Sci U S A 86(22): 9015-9019.

      Gunaydin, L. A., L. Grosenick, J. C. Finkelstein, I. V. Kauvar, L. E. Fenno, A. Adhikari, S. Lammel, J. J. Mirzabekov, R. D. Airan, K. A. Zalocusky, K. M. Tye, P. Anikeeva, R. C. Malenka and K. Deisseroth (2014). "Natural neural projection dynamics underlying social behavior." Cell 157(7): 1535-1551.

      Hoops, D. and C. Flores (2017). "Making Dopamine Connections in Adolescence." Trends Neurosci 40(12): 709-719.

      Kalsbeek, A., P. Voorn, R. M. Buijs, C. W. Pool and H. B. Uylings (1988). "Development of the dopaminergic innervation in the prefrontal cortex of the rat." J Comp Neurol 269(1): 58-72.

      Lammel, S., A. Hetzel, O. Haeckel, I. Jones, B. Liss and J. Roeper (2008). "Unique properties of mesoprefrontal neurons within a dual mesocorticolimbic dopamine system." Neuron 57(5): 760-773.

      Lammel, S., B. K. Lim, C. Ran, K. W. Huang, M. J. Betley, K. M. Tye, K. Deisseroth and R. C. Malenka (2012). "Input-specific control of reward and aversion in the ventral tegmental area." Nature 491(7423): 212-217.

      Lammel, S., E. E. Steinberg, C. Foldy, N. R. Wall, K. Beier, L. Luo and R. C. Malenka (2015). "Diversity of transgenic mouse models for selective targeting of midbrain dopamine neurons." Neuron 85(2): 429-438.

      Larsen, B. and B. Luna (2018). "Adolescence as a neurobiological critical period for the development of higher-order cognition." Neurosci Biobehav Rev 94: 179-195.

      Lewis, B. L. and P. O'Donnell (2000). "Ventral tegmental area afferents to the prefrontal cortex maintain membrane potential 'up' states in pyramidal neurons via D(1) dopamine receptors." Cereb Cortex 10(12): 1168-1175.

      Li, L., B. Tasic, K. D. Micheva, V. M. Ivanov, M. L. Spletter, S. J. Smith and L. Luo (2010). "Visualizing the distribution of synapses from individual neurons in the mouse brain." PLoS One 5(7): e11503.

      Li, X., J. Qi, T. Yamaguchi, H. L. Wang and M. Morales (2013). "Heterogeneous composition of dopamine neurons of the rat A10 region: molecular evidence for diverse signaling properties." Brain Struct Funct 218(5): 1159-1176.

      Liu, C., P. Goel and P. S. Kaeser (2021). "Spatial and temporal scales of dopamine transmission." Nat Rev Neurosci 22(6): 345-358.

      Lohani, S., A. K. Martig, K. Deisseroth, I. B. Witten and B. Moghaddam (2019). "Dopamine Modulation of Prefrontal Cortex Activity Is Manifold and Operates at Multiple Temporal and Spatial Scales." Cell Rep 27(1): 99-114 e116.

      Manago, F., M. Mereu, S. Mastwal, R. Mastrogiacomo, D. Scheggia, M. Emanuele, M. A. De Luca, D. R. Weinberger, K. H. Wang and F. Papaleo (2016). "Genetic Disruption of Arc/Arg3.1 in Mice Causes Alterations in Dopamine and Neurobehavioral Phenotypes Related to Schizophrenia." Cell Rep 16(8): 2116-2128.

      Mastwal, S., Y. Ye, M. Ren, D. V. Jimenez, K. Martinowich, C. R. Gerfen and K. H. Wang (2014). "Phasic dopamine neuron activity elicits unique mesofrontal plasticity in adolescence." J Neurosci 34(29): 9484-9496.

      Morales, M. and E. B. Margolis (2017). "Ventral tegmental area: cellular heterogeneity, connectivity and behaviour." Nat Rev Neurosci 18(2): 73-85.

      Mukherjee, A., F. Carvalho, S. Eliez and P. Caroni (2019). "Long-Lasting Rescue of Network and Cognitive Dysfunction in a Genetic Schizophrenia Model." Cell 178(6): 1387-1402 e1314.

      Murty, V. P., F. Calabro and B. Luna (2016). "The role of experience in adolescent cognitive development: Integration of executive, memory, and mesolimbic systems." Neurosci Biobehav Rev 70: 46-58.

      Naneix, F., A. R. Marchand, G. Di Scala, J. R. Pape and E. Coutureau (2012). "Parallel maturation of goal-directed behavior and dopaminergic systems during adolescence." J Neurosci 32(46): 16223-16232.

      Niwa, M., H. Jaaro-Peled, S. Tankou, S. Seshadri, T. Hikida, Y. Matsumoto, N. G. Cascella, S. Kano, N. Ozaki, T. Nabeshima and A. Sawa (2013). "Adolescent stress-induced epigenetic control of dopaminergic neurons via glucocorticoids." Science 339(6117): 335-339.

      Niwa, M., A. Kamiya, R. Murai, K. Kubo, A. J. Gruber, K. Tomita, L. Lu, S. Tomisato, H. Jaaro-Peled, S. Seshadri, H. Hiyama, B. Huang, K. Kohda, Y. Noda, P. O'Donnell, K. Nakajima, A. Sawa and T. Nabeshima (2010). "Knockdown of DISC1 by in utero gene transfer disturbs postnatal dopaminergic maturation in the frontal cortex and leads to adult behavioral deficits." Neuron 65(4): 480-489.

      O'Donnell, P. (2010). "Adolescent maturation of cortical dopamine." Neurotox Res 18(3-4): 306-312.

      Oh, S. W., J. A. Harris, L. Ng, B. Winslow, N. Cain, S. Mihalas, Q. Wang, C. Lau, L. Kuan, A. M. Henry, M. T. Mortrud, B. Ouellette, T. N. Nguyen, S. A. Sorensen, C. R. Slaughterbeck, W. Wakeman, Y. Li, D. Feng, A. Ho, E. Nicholas, K. E. Hirokawa, P. Bohn, K. M. Joines, H. Peng, M. J. Hawrylycz, J. W. Phillips, J. G. Hohmann, P. Wohnoutka, C. R. Gerfen, C. Koch, A. Bernard, C. Dang, A. R. Jones and H. Zeng (2014). "A mesoscale connectome of the mouse brain." Nature 508(7495): 207-214.

      Papathanou, M., S. Dumas, H. Pettersson, L. Olson and A. Wallen-Mackenzie (2019). "Off-Target Effects in Transgenic Mice: Characterization of Dopamine Transporter (DAT)-Cre Transgenic Mouse Lines Exposes Multiple Non-Dopaminergic Neuronal Clusters Available for Selective Targeting within Limbic Neurocircuitry." eNeuro 6(5).

      Patriarchi, T., J. R. Cho, K. Merten, M. W. Howe, A. Marley, W. H. Xiong, R. W. Folk, G. J. Broussard, R. Liang, M. J. Jang, H. Zhong, D. Dombeck, M. von Zastrow, A. Nimmerjahn, V. Gradinaru, J. T. Williams and L. Tian (2018). "Ultrafast neuronal imaging of dopamine dynamics with designed genetically encoded sensors." Science 360(6396): 1420-+.

      Porter, L. L., E. Rizzo and J. P. Hornung (1999). "Dopamine affects parvalbumin expression during cortical development in vitro." J Neurosci 19(20): 8990-9003.

      Purcell, S. M., J. L. Moran, M. Fromer, D. Ruderfer, N. Solovieff, P. Roussos, C. O'Dushlaine, K. Chambert, S. E. Bergen, A. Kahler, L. Duncan, E. Stahl, G. Genovese, E. Fernandez, M. O. Collins, N. H. Komiyama, J. S. Choudhary, P. K. Magnusson, E. Banks, K. Shakir, K. Garimella, T. Fennell, M. DePristo, S. G. Grant, S. J. Haggarty, S. Gabriel, E. M. Scolnick, E. S. Lander, C. M. Hultman, P. F. Sullivan, S. A. McCarroll and P. Sklar (2014). "A polygenic burden of rare disruptive mutations in schizophrenia." Nature 506(7487): 185-190.

      Robbins, T. W. (2000). "Chemical neuromodulation of frontal-executive functions in humans and other animals." Exp Brain Res 133(1): 130-138.

      Rolls, E. T., M. Loh, G. Deco and G. Winterer (2008). "Computational models of schizophrenia and dopamine modulation in the prefrontal cortex." Nat Rev Neurosci 9(9): 696-709.

      Seamans, J. K. and C. R. Yang (2004). "The principal features and mechanisms of dopamine modulation in the prefrontal cortex." Prog Neurobiol 74(1): 1-58.

      Sesack, S. R., V. A. Hawrylak, C. Matus, M. A. Guido and A. I. Levey (1998). "Dopamine axon varicosities in the prelimbic division of the rat prefrontal cortex exhibit sparse immunoreactivity for the dopamine transporter." J Neurosci 18(7): 2697-2708.

      Soden, M. E., S. M. Miller, L. M. Burgeno, P. E. M. Phillips, T. S. Hnasko and L. S. Zweifel (2016). "Genetic Isolation of Hypothalamic Neurons that Regulate Context-Specific Male Social Behavior." Cell Rep 16(2): 304-313.

      Spear, L. (2000). "Modeling adolescent development and alcohol use in animals." Alcohol Res Health 24(2): 115-123.

      Stagkourakis, S., G. Spigolon, P. Williams, J. Protzmann, G. Fisone and C. Broberger (2018). "A neural network for intermale aggression to establish social hierarchy." Nat Neurosci 21(6): 834-842.

      Sul, J. H., S. Jo, D. Lee and M. W. Jung (2011). "Role of rodent secondary motor cortex in value-based action selection." Nat Neurosci 14(9): 1202-1208.

      Sul, J. H., H. Kim, N. Huh, D. Lee and M. W. Jung (2010). "Distinct roles of rodent orbitofrontal and medial prefrontal cortex in decision making." Neuron 66(3): 449-460.

      Sulzer, D., S. J. Cragg and M. E. Rice (2016). "Striatal dopamine neurotransmission: regulation of release and uptake." Basal Ganglia 6(3): 123-148.

      Tseng, K. Y. and P. O'Donnell (2007). "Dopamine modulation of prefrontal cortical interneurons changes during adolescence." Cereb Cortex 17(5): 1235-1240.

      Vander Weele, C. M., C. A. Siciliano, G. A. Matthews, P. Namburi, E. M. Izadmehr, I. C. Espinel, E. H. Nieh, E. H. S. Schut, N. Padilla-Coreano, A. Burgos-Robles, C. J. Chang, E. Y. Kimchi, A. Beyeler, R. Wichmann, C. P. Wildes and K. M. Tye (2018). "Dopamine enhances signal-to-noise ratio in cortical-brainstem encoding of aversive stimuli." Nature 563(7731): 397-401.

      Vijayraghavan, S., M. Wang, S. G. Birnbaum, G. V. Williams and A. F. Arnsten (2007). "Inverted-U dopamine D1 receptor actions on prefrontal neurons engaged in working memory." Nat Neurosci 10(3): 376-384.

      Wen, Z., H. N. Nguyen, Z. Guo, M. A. Lalli, X. Wang, Y. Su, N. S. Kim, K. J. Yoon, J. Shin, C. Zhang, G. Makri, D. Nauen, H. Yu, E. Guzman, C. H. Chiang, N. Yoritomo, K. Kaibuchi, J. Zou, K. M. Christian, L. Cheng, C. A. Ross, R. L. Margolis, G. Chen, K. S. Kosik, H. Song and G. L. Ming (2014). "Synaptic dysregulation in a human iPS cell model of mental disorders." Nature 515(7527): 414-418.

      Wood, J., Y. Kim and B. Moghaddam (2012). "Disruption of prefrontal cortex large scale neuronal activity by different classes of psychotomimetic drugs." J Neurosci 32(9): 3022-3031.

      Ye, Y., S. Mastwal, V. Y. Cao, M. Ren, Q. Liu, W. Zhang, A. G. Elkahloun and K. H. Wang (2017). "Dopamine is Required for Activity-Dependent Amplification of Arc mRNA in Developing Postnatal Frontal Cortex." Cereb Cortex 27(7): 3600-3608.

    1. Author Response

      We thank the editors for their care in handling our manuscript. We also thank the reviewers, especially reviewer 2, for their thorough comments. We will work to address their concerns in a revised version and provide some initial comments below.

      A major concern of two reviewers was that odour profiles were not quantified rigorously. We acknowledge that our study does not achieve the level of quantitative rigour standard in most chemical ecology work. We plan to conduct a few additional analyses to help address this shortcoming. We will also adjust the text to clarify the semi-quantitative nature of the data.

      Reviewers also suggested using several different analytical approaches (e.g., different column, different sorbent) to broaden the type and number of detectable compounds. The reviewers rightly point out that such choices strongly affect which compounds we are likely to sample. No single approach is comprehensive, and ours is no exception. We will work to ensure that the appropriate caveats are included prominently in the text.

      However, we believe this concern in fact underscores a special strength of our study: analysing the odour of a large number of species in a single study using the same analytical approach, so that the inherent biases of different approaches do not complicate cross-species comparisons. We are aware of very few such large-scale studies in any system and welcome suggestions from reviewers or readers of any we might have overlooked.

      In general, we believe many of the reviewers’ methodological concerns reflect standards in the field of chemical ecology established for studies that aim to describe the odour of one or a few species as comprehensively as possible with a high level of quantitative rigour. This was not our goal, and we will temper our language in the revised paper to make that clear. Instead, we aimed to sample as broadly as possible across species to gain insight into the general statistics of a large 'odour landscape' or 'odour space' — an endeavour that, to our knowledge, is less common in the chemical ecology literature. In doing so, we prioritized breadth over depth. We believe the resulting dataset provides solid evidence for our major conclusions, though we will revisit our analyses and conduct a small number of additional experiments to further substantiate our claims.

    1. Author Response:

      We thank the reviewers and editors for their constructive and encouraging feedback on our manuscript. We have carefully studied the reviewer comments and found that we agree with almost all of them; we will implement these suggestions and prepare a revised submission. In particular, we will aim to address the reviewers’ valid concerns regarding metagenomic detection limits via a high-sensitivity re-analysis of the data based on metagenomic read mapping, orthogonal to our current analyses based on read mapping to mOTU single copy marker genes. Moreover, we will revise the manuscript text for clarity and streamline the phrasing on some observations and claims. We are confident that our work will improve as a result and look forward to future feedback and interactions.

      Sincerely, for the authors,

      Sebastian Schmidt & Peer Bork

    1. Author Response

      Joint Public review

      The manuscript by Mitra and coworkers analyses the functional role of Orai in the excitability of central dopaminergic neurons in Drosophila. The authors show that a dominant-negative mutant of Orai (OraiE180A) significantly alters the gene expression profile of flight-promoting dopaminergic neurons (fpDANs). Among the affected genes, OraiE180A attenuates the expression of Set2 and enhances that of E(z), shifting the level of epigenetic signatures that modulate gene expression. The present results also demonstrate that Set2 expression via Orai involves the transcription factor Trl. The Orai-Trl-Set2 pathway modulates the expression of VGCC, which, in turn, are involved in dopamine release. The topic investigated is interesting and timely and the study is carefully performed and technically sound; however, there are several major concerns that need to be addressed:

      1) In Figure S2E, STIM is overexpressed in the absence of Set2 and this leads to rescue. It is presumed that STIM overexpression causes excess SOCE, yet this is rarely the case. Perhaps the bigger concern, however, is how excess SOCE might overcome the loss of SET2 if SET2 mediates SOCE-induced development of flight. These data are more consistent with something other than SET2 mediating this function.

      Our statement that STIM overexpression overcomes deficits in SOCE is based on the following published work:

      1. Studies of SOCE in wildtype cultured larval Drosophila neurons demonstrated that overexpression of STIM raised SOCE to the same extent as co-expression of STIM and Orai in the WT background (Chakraborty et al., 2016; Figure 1D).

      2. Both Carbachol-induced IP3-mediated Ca2+ release and SOCE (measured by Ca2+ add back after Thapsigargin-induced store depletion) were rescued in primary cultures of IP3R hypomorphic mutant (itprku) Drosophila neurons by overexpression of STIM (Agrawal et al., 2010; Figure 8A-G).

      3. Deb et al., 2016 (Supplementary Figure 2h,i) reaffirmed that overexpression of STIM significantly improves SOCE after Thapsigargin-induced passive store-depletion in Drosophila neurons expressing IP3RRNAi.

      4. Consistent with the cellular rescue of SOCE, defects in flight initiation and physiology observed in the heteroallelic IP3R hypomorphic background (itprku) could be rescued by overexpression of STIM (Agrawal et al., 2010; Figure 3A-E) as well as Orai (Venkiteswaran and Hasan, 2009; Figure 3).

      5. In Figure S2E, we show that flight deficits arising from THD’>Set2RNAi are rescued upon overexpression of STIM (i.e. THD’>Set2RNAi; STIMOE). Here and in another recent publication (Mitra et al., 2021) we show that neurons expressing Set2RNAi exhibit reduced expression of the IP3R and reduced ER-Ca2+ release, presumably leading to reduced SOCE. As mentioned above, we have consistently found that STIM overexpression raises both IP3-mediated Ca2+ release and SOCE in Drosophila neurons.

      In this study, we propose that Ca2+ release through the IP3R followed by SOCE are part of a positive feedback loop driving expression of Set2 which in turn upregulates expression of mAChR and IP3R (Figure 3F) to regulate dopaminergic neuron function. Our observation that loss of Set2 (THD’>Set2RNAi) can be rescued by STIM overexpression is consistent with this model because:

      1. Loss of Set2 (THD’>Set2RNAi) results in downregulation of several genes including mAChR and IP3R leading to decreased SOCE.

      2. As evident from our previous studies, increased STIM expression in the Set2RNAi background (THD’>Set2RNAi; STIMOE) is expected to enhance SOCE, which we predict would rescue Set2 expression, leading to rescue of other Set2-dependent downstream functions such as flight (Figure 2D).

      2) In Figure 3, data is provided linking SET2 expression and Cch-induced Ca2+ responses. The presentation of these data is confusing. In addition, the results may be a simple side effect of SET2-dependent expression of IP3R. Given that this article is about SOCE, why isn't SOCE shown here? More generally, there are no measurements of SOCE in this entire article. Measuring SOCE (not what is measured in response to Cch) could help eliminate some of this confusion.

      We will re-write this section in the revised version for better clarity and explain how Set2-dependent IP3R expression is an important component of Orai-mediated Ca2+ entry in fpDANs. Here, we propose that IP3-mediated Ca2+ release and SOCE, through Orai, are together part of a positive feedback loop driving transcription of Set2 which in turn upregulates mAChR and IP3R expression (Figure 3F). We hypothesized that the observed loss of CCh-induced Ca2+ response in the Set2RNAi background (Figure 3B-D; THD’>Set2RNAi) results from decreased itpr and mAChR expression and verified this in Figure 3E. This is further validated by the rescue of CCh-induced Ca2+ response and itpr/mAChR expression in the OraiE180A background upon Set2 overexpression (Figure 3B-E; THD’>OraiE180A; Set2OE). We were constrained to measure CCh-induced Ca2+ responses in OraiE180A expressing neurons for the following reasons:

      1. SOCE measurements through Tg-mediated store Ca2+ release followed by Ca2+ add back require a 0 Ca2+ environment that can only be achieved in culture. The Drosophila brain is bathed in hemolymph, which contains Ca2+, and there are no methods to readily deplete Ca2+ from the tissue to create a 0 Ca2+ environment without also affecting the health of the neurons.

      2. Cultures of the subset of dopaminergic neurons (THD’) we have focused on in this study were not feasible due to the small number of neurons being studied from the total number of dopaminergic neurons in the brain (~35/400). In previous studies we have shown that SOCE post-Tg induced store depletion is abrogated in cultured dopaminergic neurons from Drosophila upon expression of OraiE180A (Pathak et al., 2015).

      Furthermore, Carbachol-induced IP3-mediated Ca2+ release is tightly coupled to SOCE in Drosophila neurons (Venkiteswaran and Hasan, 2009) and Ca2+ release from the IP3R is physiologically relevant for flight behavior in THD’ neurons (Sharma and Hasan, 2020).

      3) A significant gap in the study relates to the conclusion that trl is a SOCE-regulated transcription factor. This conclusion is entirely based on genetic analysis of STIMKO heterozygous flies in which a copy of the trl13C hypomorph allele is introduced. While these results suggest a genetic interaction between the expression of the two genes, the evidence that expression translates into a functional interaction that places trl immediately downstream of SOCE is not rigorous or convincing. All that can be said is that the double mutant shows a defect in flight which could arise from an interruption of the circuit. Further, it is not clear whether the trl13C hypomorph is only introduced during the critical 72-96 hour time window when the OraiE180A phenotype shows up. The same applies to the over-expression of Set2 and the other genes. If the expression is not temporally controlled, then the phenotype could be due to the blockade of an entirely different aspect of flight neuron function.

      The idea that Trl functions downstream of Orai-mediated Ca2+ entry in THD’ neurons is based on the following genetic evidence:

      1. In Figure 4D, we show evidence of genetic interaction between trl-STIM and trl-Set2. The rescue of trl13c/STIMKO with STIM overexpression in THD’ neurons indicates that excess SOCE (driven by STIMOE) may activate the residual Trl (there exists a WT Trl copy in this genetic background) to rescue THD’ flight function. This is further supported by the rescue of trl/STIMKO with Set2 overexpression in THD’ neurons, which is consistent with the feedback loop model proposed in Figure 5C, where we propose that reduced SOCE leads to reduced ‘activated’ Trl and thus reduced Set2 expression, and the latter is rescued by Set2OE. The manner in which SOCE ‘activates’ Trl is the subject of ongoing investigations.

      2. The trl hypomorphic alleles (including trl13C) exist as genetic mutants and they affect Trl function in all tissues throughout development. While we concede that these mutant alleles would affect multiple functions at other stages of development, which may impinge on the phenotypes noted in Figure S4B, we have used a targeted RNAi approach to validate Trl function specifically in the THD’ neurons (Figure 4C).

      3. Overexpression mediated rescues (including Set2) were not induced only during the critical 72-96 hrs APF developmental window. Having established that Orai function drives critical gene expression during this window (Figure 1), it is reasonable to assume that Set2 rescue of loss of flight in OraiE180A occurs in the same time window where flight is disrupted.

      4) In Figure 4, data is shown that SOCE compensates for the loss of Trl, the presumed mediator of SOCE-dependent flight. The fact that flight deficits are rescued by raising SOCE in the absence of Trl is very inconsistent with this conclusion.

      We apologise for this confusion and will clarify in the revision. trl13c is a recessive allele of Trl and should be written as such throughout the text and in the figures (i.e., trl13c and NOT Trl13c). In all cases of Trl mutant rescue by STIMOE and Set2OE there exists residual Trl that can be activated by excess SOCE, thus leading to the rescue. This is true for trl13c/STIMKO, where each mutant is present as a heterozygote (the complete genotype of this strain is STIMKO/+; trl13c/+; this will be corrected in the revision). Similarly, for TrlRNAi we expect reduced levels (but not complete loss) of Trl. Thus the SOCE rescue of loss of Trl occurs in conditions where Trl levels are reduced but NOT absent. Homozygous trl null mutants are lethal.

      5) In Figure 5 (A-C), data is provided that Trl transcripts are unaffected by loss of SOCE and that overexpression cannot rescue flightlessness. From this, the authors conclude that this gene "must" be calcium responsive. While that is one possibility, it is also possible that these genes are not functionally linked.

      The idea that Trl is functionally linked to SOCE is based on the following evidence:

      1. In Figure 4C we show that flight defects caused by partial loss of Trl (THD’>TrlRNAi) were rescued by STIM overexpression (THD’>TrlRNAi; STIMOE). As mentioned above we have found that STIM overexpression raises SOCE.

      2. Heteroalleles of the trl13C hypomorph exhibit a strong genetic interaction with a single copy of the null allele of STIMKO, as shown by the flight deficit of trl13c/+; STIMKO/+ (trl13C/STIMKO) flies (Figure 4D). The genotypes will be corrected in the revision.

      3. Flight defects in trl13C/STIMKO flies could be rescued by STIM overexpression in the THD’ neurons (trl13C/STIMKO; THD’>STIMOE).

      4. In Figure 4E, we show that partial loss of Trl in THD’ neurons (THD’>TrlRNAi) leads to decreased expression of the Ca2+ responsive genes mAChR, itpr, and Set2, indicating that Trl is a constituent of the SOCE-driven transcriptional feedback loop (Figure 5C).

      Since we could not detect a well-defined Ca2+ binding domain in Trl, we hypothesize that it could be activated by a Ca2+ dependent post-translational modification. Phosphoproteome analysis of Trl demonstrated that it does indeed undergo phosphorylation at a Threonine residue (T237; Zhai et al., 2008), which lies within a potential site for CaMKII. Independently, CaMKII has been identified as a binding partner of Trl from a Trl interactome study (Lomaev et al., 2018). Past work from our group (Ravi et al., 2018) identified a role for CaMKII in THD’ neurons in the context of flight. We are currently testing if CaMKII functions downstream of SOCE in THD’ neurons to mediate flight and will update this information in the next version of the manuscript.

      6) There is no characterization of SOCE in fpDANs from flies expressing native Orai or the dominant negative OraiE180A mutant. While the authors refer to previous studies, as the manuscript is essentially based on Orai function thapsigargin-induced SOCE should be tested using the Ca2+ add-back protocol in order to assess the release of Ca2+ from the ER in response to thapsigargin as well as the subsequent SOCE.

      The fpDANs consist of 16-19 neurons in each hemisphere (PPL1 are 10-12 and PPM3 are 6-7 cells; Pathak et al., 2015). Measuring SOCE from these neurons in vivo is not possible due to the presence of abundant extracellular Ca2+ in the brain. Given their sparse number, it proved technically challenging to isolate the fpDANs in culture to perform SOCE measurements using the Ca2+ add back protocol. For these reasons, we have relied upon using Carbachol to elicit IP3-mediated Ca2+ release and SOCE as a proxy for in vivo SOCE. In previous studies we have shown that Carbachol treatment of cultured Drosophila neurons elicits IP3-mediated Ca2+ release and SOCE (Agrawal et al., 2010; Figure 8). Moreover, expression of OraiE180A completely blocks SOCE as measured in primary cultures of dopaminergic neurons (Pathak et al., 2015; Figure 1E). Hence we have not repeated SOCE measurements from all dopaminergic neurons in this work. In the revised version we will explicitly state this weakness of our study and the reasons for it.
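
      As a quick consistency check on the cell numbers quoted in this response (a sketch that assumes the per-hemisphere counts simply double across the two hemispheres, and takes the approximate total of 400 dopaminergic neurons per brain cited above at face value):

```latex
\begin{align*}
N_{\text{hemisphere}} &= N_{\text{PPL1}} + N_{\text{PPM3}} = (10\text{--}12) + (6\text{--}7) = 16\text{--}19,\\
N_{\text{brain}} &= 2\,N_{\text{hemisphere}} = 32\text{--}38 \approx 35,\\
\frac{N_{\text{brain}}}{N_{\text{DAN, total}}} &\approx \frac{35}{400} \approx 9\%.
\end{align*}
```

      On these numbers the fpDANs make up fewer than one in ten dopaminergic neurons in the brain, in line with the figure of ~35 cells per brain used elsewhere in this response.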

      7) In the experiments performed to rescue flight duration in Set2RNAi individuals the authors overexpress STIM and attribute the effect to "Excess STIM presumably drives higher SOCE sufficient to rescue flight bout durations caused by deficient Set2 levels.". This should be experimentally tested as the STIM:Orai stoichiometry has been demonstrated as essential for SOCE.

      The assumption that STIM overexpression drives higher SOCE is based upon previously published work from Drosophila neurons (Agrawal et al., 2010; Chakraborty et al., 2016; Deb et al., 2016), which demonstrates that excess WT STIM overcomes IP3R deficiencies (RNAi or hypomorphic mutants) to rescue SOCE. We agree that STIM-Orai stoichiometry is essential for SOCE, and propose that the rescue backgrounds possess sufficient WT Orai, which is recruited by the excess STIM to mediate the rescue. We will reference the earlier work to validate our use of STIMOE for rescue of SOCE.

      Here, we propose that Set2 is part of a positive feedback loop driving transcription of mAChR and IP3R (Figure 3F). In keeping with this hypothesis, we posit that the phenotypes observed in the Set2RNAi background (Figure 2D) result from decreased itpr and mAChR expression (validated in Figure 3E). This is further validated by the Set2 overexpression mediated rescue of OraiE180A (Figure 2D) and rescue of itpr/mAChR expression in the OraiE180A background (Figure 3B-E; THD’>OraiE180A; Set2OE).

      8) The authors show that overexpression of OraiE180A results in Stim downregulation at the mRNA level. What about the protein level? And more important, how does OraiE180A downregulate Stim expression? Does it promote Stim degradation? Does it inhibit Stim expression?

      We hypothesize that changes in STIM mRNA observed in the THD’ > OraiE180A neurons stems from an overall reduction in IP3-mediated Ca2+ release and SOCE due to loss of Trl-Set2 driven gene expression detailed in our transcriptional feedback loop model (Figure 5C). We will attempt to explain this aspect more clearly in the next version of the manuscript. While we agree that measuring levels of STIM protein would be helpful, estimation of protein levels from a limited number of neurons (~35 cells per brain) is technically challenging. The STIM antibody does not work well in immunohistochemistry. In the absence of any experimental evidence we cannot comment on how expression of OraiE180A might affect STIM protein turnover.

      9) Lines 271-273, the authors state "whereas overexpression of a transgene encoding Set2 in THD' neurons either with loss of SOCE (OraiE180A) or with knockdown of the IP3R (itprRNAi), lead to significant rescue of the Ca2+ response". This is attributed to a positive effect of Set2 expression on IP3R expression and the authors show a positive correlation between these two parameters; however, there is no demonstration that Set2 expression can rescue IP3R expression in cells where the IP3R is knocked down (itprRNAi). This should be further demonstrated.

      The rescue of IP3R expression by Set2 overexpression in itprRNAi was demonstrated in a different set of Drosophila neurons in an earlier study (Mitra et al., 2021) and has not been repeated specifically in THD’ neurons. Similar to the previous study, here we tested CCh-stimulated Ca2+ responses of THD’ neurons with itprRNAi and itprRNAi; Set2OE (Fig S3), which are indeed rescued by Set2OE.

      10) The data presented in Figure 3E should be functionally demonstrated by analyzing the ability of CCh to release Ca2+ from the intracellular stores in the absence of extracellular Ca2+.

      CCh-mediated Ca2+ release from the intracellular stores in the absence of extracellular Ca2+ has been described in primary cultures of Drosophila neurons in previously published work (Venkiteswaran and Hasan, 2009; Agrawal et al., 2010). This work focuses on a set of 16-19 dopaminergic neurons in a hemisphere of the Drosophila central brain. It is technically challenging to generate a 0 Ca2+ environment in vivo, which is essential for measuring store Ca2+ release. Given their small numbers, primary culture of these neurons is not readily feasible.

      11) The conclusion that SOCE regulates the neuronal excitability threshold is based entirely on either partial behavioral rescue of flight, or measurements of KCl-induced Ca2+ rises monitored by GCaMP6m in DAN neurons. The threshold for neuronal excitability is a precise parameter based on rheobase measurements of action potentials in current-clamp. Measurements of slow calcium signals using a slow dye such as GCaMP6m should not be equated with neuronal excitability. What is measured is a loss of the calcium response in high K depolarization experiments, which occurs due to the loss of expression of Cav channels. Hence, the use of this term is not accurate and will confuse readers. The use of terms referring to neuronal excitability needs to be changed throughout the manuscript. As such, the conclusions regarding neuronal excitability should be strongly tempered and the data reinterpreted as there are no true measurements of neuronal excitability in the manuscript. All that can be said is that expression of certain ion channel genes is suppressed. Since both Na+ channels and K+ channel expression is down-regulated, it is hard to say precisely how membrane excitability is altered without action potential analysis.

      The claim that SOCE influences neuronal excitability is based on the following observations:

      1. Interruption of the transcriptional feedback loop involving SOCE, Trl, and Set2 through loss of any of its constituents, results in the downregulation of VGCCs (Figure 5G, 6H), which are essential components of action potentials.

      2. OraiE180A mediated loss of SOCE in THD’ neurons abrogates the KCl-evoked depolarization response (Figure 6B, C) measured using GCaMP6m. We verified that this response requires VGCC function using pharmacological inhibition of L-type VGCCs (Figure 6E, F).

      3. SOCE-deficient THD’ neurons, which were presumably compromised in their ability to evoke action potentials, could be rescued to undergo KCl-evoked depolarisation by expression of NachBac, which lowers the depolarization threshold (Figure 7C, D), or through optogenetic stimulation using CsChrimson (Figure 7F).

      We agree that ‘neuronal excitability threshold’ is a precise electrophysiological parameter that has not been directly investigated here by measurement of action potentials. Therefore, references to neuronal excitability will be tempered throughout the revised manuscript and be replaced with a more generic reference to ‘neuronal activity’. In this context we propose to include further evidence supporting reduced excitability of THD’ neurons upon loss of SOCE in the revision.

      One of the key functional outcomes of activity during critical developmental periods, such as the 72-96 hrs APF window identified in this study, is remodelling of neuronal morphology, so we investigated this in our context. Neuronal activity can drive changes in neurite complexity and axonal arborization (Depetris-Chauvin et al., 2011), especially during critical developmental periods (Sachse et al., 2007). To understand if Orai-mediated Ca2+ entry and downstream gene expression through Set2 affect this activity-driven parameter, we investigated the morphology of fpDANs, and specifically measured the complexity of presynaptic terminals within the γ2α’1 lobe of the MB using super-resolution microscopy. We found striking changes in the neurite volume upon expression of OraiE180A, which could be rescued either by restoring Set2 (OraiE180A; Set2OE) or by inducing hyperactivity through NachBac expression (OraiE180A; NachBacOE). These data will be included in the revised manuscript.

      12) Related, since trl does not contain any molecular domains that could be regulated by Ca2+ signaling, it is unclear whether trl is directly regulated by SOCE or the regulation is highly indirect. Reporter assays evaluating trl activation upon Ca2+ rises would provide much stronger and more direct evidence for the conclusion that trl is a SOCE-regulated TF. As such the evidence is entirely based on RNAi downregulation of trl which indicates that trl is essential but has no bearing on exactly what point of the signaling cascade it is involved.

      We agree that luciferase Trl reporters would provide a direct method to test SOCE-mediated activation. Future investigations will be targeted in this direction. Regarding possible mechanisms of Trl activation: since we could not detect a well-defined Ca2+ binding domain in Trl, we hypothesize that it may be activated through phosphorylation by a Ca2+ sensitive kinase. Phosphoproteome analysis of Trl indicates that it does indeed undergo phosphorylation at a Threonine residue (T237; Zhai et al., 2008), which may be mediated by the Ca2+ sensitive kinase CaMKII, based on binding partners identified in the Trl interactome (Lomaev et al., 2018). Past work (Ravi et al., 2018) has indeed demonstrated a requirement for CaMKII in THD’ neurons for flight. We are currently testing whether CaMKII functions downstream of SOCE in these neurons to mediate flight, and will be updating this information in the next version of the manuscript.

      13) Are NFAT levels altered in the Orai1 loss of function mutant? If not, this should be explicitly stated. It would seem based on previous literature that some gene regulation may be related to the downregulation of this established Ca2+-dependent transcription factor. Same for NFkb.

      As mentioned in the text (lines 307-309), Drosophila NFAT lacks a calcineurin binding site and is therefore not sensitive to Ca2+ (Keyser et al., 2007). In the past we tested if knockdown of NF-kB in dopaminergic neurons gave a flight phenotype and did not observe any measurable deficit. From the RNAseq data we find a slight downregulation of NFAT (0.49 fold, p value = 0.048) and NF-kB (0.26 fold, p value = 0.258), the significance of which is unclear at this point. We did not find any consensus binding sites for these two factors in the regulatory regions of downregulated genes from THD’ neurons.

      14) Does over-expression of Set2 restore ion channel expression especially those of the VGCCs? This would provide rigorous, direct evidence that SOCE-mediated regulation of VGCCs through Set2 controls voltage-gated calcium channel signaling.

      Set2 overexpression in the OraiE180A background indeed restores the expression of VGCC genes (Figure 6H).

      15) All 6 representative panels from Figure 3B are duplicated in Figure 4G. Likewise, 2 representative panels from Figure 5H are duplicated in Figure 6D. Although these panels all represent the results from control experiments, the relevant experiments were likely not conducted at the same time and under the same conditions. Thus, control images from other experiments should not be used simply because they correspond to controls. This situation should be clarified.

      We regret the confusion caused by the same representative images for the control experiments. These will be replaced by new representative images for Figure 5H in the next updated version of the manuscript.

      16) The figures are unusually busy and difficult to follow. In part this is because they usually have many panels (Fig. 1: A-I; Fig. 2, A-J, etc) but also because the arrangement of the panels is not consistent: sometimes the following panel is found to the right, other times it is below. It would help the reader to make the order of the panels consistent, and, if possible, reduce the number of panels and/or move some of the panels to new figures (eLife does not limit the number of display items).

      The image panels will be rearranged for ease of reading in the next updated version of the manuscript.

      17) As a final recommendation, the reviewers suggest that the authors a- Reword the text that refers to membrane excitability since membrane excitability was not directly measured here. b-Explain why STIM1 rescues the partial loss of flight in Set2 RNAi flies (Fig. S2E); and c- Explain how/why trl is calcium regulated and test using luciferase (or other) reporter assays whether Orai activation leads to trl activation.

      a. Textual references to membrane excitability will be appropriately modified.

      b. We have provided a detailed explanation for how STIM overexpression might rescue the phenotypes caused by Set2RNAi in Point 1. In short, these phenotypes depend upon IP3R mediated Ca2+ entry driving a transcriptional feedback loop. We relied upon past reports that STIM overexpression upregulates IP3R-mediated Ca2+ release and SOCE in Drosophila itpr mutant neurons (Agrawal et al., 2010; Chakraborty et al, 2016; Deb et al, 2016). We therefore propose that STIM overexpression in the Set2RNAi background rescues IP3R mediated Ca2+ release followed by SOCE, which drives enhanced Set2 transcription, counteracting the effects of the RNAi. We will explain this more clearly with past references in the next revision.

      c. We have provided a detailed response to this comment in Point 12. Briefly, we agree that building luciferase reporters for Trl could be an ideal strategy to test for its responsiveness to SOCE and needs to be done in the future. As an alternate strategy, we have looked at data from existing studies of interacting partners of Trl (Lomaev et al., 2017) and identified CaMKII, which is Ca2+ responsive (Braun and Schulman, 1995; Yasuda et al., 2022) and thus might activate Trl through a phosphorylation-switch-like mechanism. Moreover, a previous publication identified a requirement for CaMKII in THD’ neurons for Drosophila flight (Ravi et al., 2018). We are testing the ability of a dominant active version of CaMKII to rescue THD’>E180A flight deficits and will include this information in the next version of the manuscript.

      References

      1. Agrawal N, Venkiteswaran G, Sadaf S, Padmanabhan N, Banerjee S, Hasan G. Inositol 1,4,5-Trisphosphate Receptor and dSTIM Function in Drosophila Insulin-Producing Neurons Regulates Systemic Intracellular Calcium Homeostasis and Flight. J Neurosci. 2010. 30:1301-1313. doi:10.1523/jneurosci.3668-09.2010
      2. Braun AP, Schulman H. A non-selective cation current activated via the multifunctional Ca(2+)-calmodulin-dependent protein kinase in human epithelial cells. J Physiol. 1995. 488:37-55. doi:10.1113/jphysiol.1995.sp020944
      3. Chakraborty S, Deb BK, Chorna T, Konieczny V, Taylor CW, Hasan G. Mutant IP3 receptors attenuate store-operated Ca2+ entry by destabilizing STIM-Orai interactions in Drosophila neurons. J Cell Sci. 2016. 129:3903-3910. doi:10.1242/jcs.191585
      4. Deb BK, Pathak T, Hasan G. Store-independent modulation of Ca2+ entry through Orai by Septin 7. Nat Commun. 2016. 7:11751. doi:10.1038/ncomms11751
      5. Depetris-Chauvin A, Berni J, Aranovich EJ, Muraro NI, Beckwith EJ, Ceriani MF. Adult-specific electrical silencing of pacemaker neurons uncouples molecular clock from circadian outputs. Curr Biol. 2011. 21:1783-1793. doi:10.1016/j.cub.2011.09.027
      6. Keyser P, Borge-Renberg K, Hultmark D. The Drosophila NFAT homolog is involved in salt stress tolerance. Insect Biochem Mol Biol. 2007. 37:356-362. doi:10.1016/j.ibmb.2006.12.009
      7. Kilo L, Stürner T, Tavosanis G, Ziegler AB. Drosophila Dendritic Arborisation Neurons: Fantastic Actin Dynamics and Where to Find Them. Cells. 2021. 10:2777. doi:10.3390/cells10102777
      8. Lomaev D, Mikhailova A, Erokhin M, et al. The GAGA factor regulatory network: Identification of GAGA factor associated proteins. PLoS One. 2017. 12:e0173602. doi:10.1371/journal.pone.0173602
      9. Mitra R, Richhariya S, Jayakumar S, Notani D, Hasan G. IP3/Ca2+ signals regulate larval to pupal transition under nutrient stress through the H3K36 methyltransferase dSET2. Development. 2021. 148:dev199018. doi:10.1101/2020.11.25.399329
      10. Pathak T, Agrawal T, Richhariya S, Sadaf S, Hasan G. Store-Operated Calcium Entry through Orai Is Required for Transcriptional Maturation of the Flight Circuit in Drosophila. J Neurosci. 2015. 35:13784-13799. doi:10.1523/jneurosci.1680-15.2015
      11. Ravi P, Trivedi D, Hasan G. FMRFa receptor stimulated Ca2+ signals alter the activity of flight modulating central dopaminergic neurons in Drosophila melanogaster. Barsh GS, ed. PLOS Genet. 2018. 14:e1007459. doi:10.1371/journal.pgen.1007459
      12. Sachse S, Rueckert E, Keller A, Okada R, Tanaka NK, Ito K, Vosshall LB. Activity-dependent plasticity in an olfactory circuit. Neuron. 2007. 56:838-850. doi:10.1016/j.neuron.2007.10.035
      13. Sharma A, Hasan G. Modulation of flight and feeding behaviours requires presynaptic IP3Rs in dopaminergic neurons. eLife. 2020. 9:e62297. doi:10.7554/elife.62297
      14. Venkiteswaran G, Hasan G. Intracellular Ca2+ signalling and store operated Ca2+ entry are required in Drosophila neurons for flight. Proc Natl Acad Sci. 2009. 106:10326-10331. doi:10.1073/pnas.0902982106
      15. Yasuda R, Hayashi Y, Hell JW. CaMKII: a central molecular organizer of synaptic plasticity, learning and memory. Nat Rev Neurosci. 2022. 23:666-682. doi:10.1038/s41583-022-00624-2
      16. Zhai B, Villén J, Beausoleil SA, Mintseris J, Gygi SP. Phosphoproteome Analysis of Drosophila melanogaster Embryos. J Proteome Res. 2008. 7:1675-1682. doi:10.1021/pr700696a
    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript describes conditions under which "Self-inactivating Rabies" (SiR) can be grown to limit mutations that would allow the virus to replicate in the absence of TEV protease. It is also shown that neurons directly infected with a non-mutated virus remain healthy and that the virus does not mutate in the brain in vivo. Remarkably there is nothing in the manuscript to address the obvious question that is raised by the observation that such mutations were occurring around the time of the initial description of circuit tracing with this virus. Can the transsynaptic tracing experiments in the absence of TEV expression (as described in their original Neuron paper) be replicated with SiR that is not mutated? This obvious omission suggests that the authors might have conducted such experiments and were unable to replicate their published results. It is imperative that the authors be forthcoming about whether they have conducted such experiments and what were the results. If they have not conducted such experiments, they should do them and include the results here. Regardless of the outcome, the results should be published. If they cannot replicate their results, then the reliability of the Neuron paper is in doubt.

      How do the results presented here relate to the results published in the Neuron paper and why are they not definitive with respect to the utility of SiR? The original publication in Neuron presents results that do not appear to be plausible and are best explained by the possibility that some experiments described in that manuscript were conducted using mutated SiR. This became most apparent when shortly after the Neuron publication, the Tripodi lab shared SiR as well as TEV expressing cell lines for propagation with other labs. Several of those groups observed that when they propagated the SiR received from the Tripodi lab, there was a mutation that removed the linkage of the PEST targeting sequence to N. This would be expected to allow the virus to replicate and spread without the need for TEV protease to remove the PEST sequence - precisely the phenotype observed in the trans-synaptic tracing experiments described in the Neuron paper. In the Neuron paper, culture experiments showed that the N-PEST (SiR) rabies could not replicate in the absence of TEV. And additional experiments showed that the virus is not toxic to neurons directly infected. These are the same experiments that are replicated in this submission. But then (in the Neuron paper) comes the unlikely report that this virus can spread trans-synaptically in vivo, in the absence of TEV expression. An alternative explanation would be that the virus used for those experiments was mutated and that is why TEV expression was not needed. There are no experiments in the original Neuron paper that address this possibility. Specifically, the experiments in Neuron describing cell survival during trans-synaptic tracing are not adequate to rule this out. This is because the two timepoints during which neurons were counted correspond to an early time when labeled neurons would be expected to still be accumulating and a later time that might be past the peak and represent a time when many neurons have died. 
To quantify proportions of neurons that survive, it is necessary to follow the same neurons over time, as has been done to demonstrate that only about half of neurons infected with G-deleted rabies die (half survive). Until tests are conducted testing whether TEV expression is required to obtain trans-synaptic labeling with an SiR that is known to not be mutated, it is irrelevant whether mutations can be prevented under particular culture conditions. The utility of this virus depends on whether it can be used for trans-synaptic tracing without toxicity and this manuscript presents no experiments to address that. Further, the omission of such experiments is glaring, as it is difficult to imagine that they have not been attempted.

      We thank the reviewer for giving us the opportunity to improve on this point. We have performed additional experiments to confirm the ability of revertant-free SiR virus to spread transsynaptically in vivo. Our data show that non-mutated SiR spreads transsynaptically in the mouse brain when complemented with G. In addition, we also tested the effect of the addition of TEVp to the starter neuronal population and found that it can significantly improve spreading efficiency. These data confirm the transsynaptic spreading capabilities of unmutated SiR, in line with our original report. Furthermore, the data show the enhancing effect on the spreading efficacy of supplementing TEVp to the starter cells, broadly in line with what was recently reported by Jin et al., 2023. We have discussed the implications of these findings and suggested future directions in the main text and discussion.

      Additionally, for completeness, we also assessed the spread efficiency of the recently generated SiR-N2c (based on the CVS-N2c rabies strain) in the presence and absence of TEVp. We found that SiR-N2c spreads significantly better in the BLA->NAc circuit than the original SiR (based on the SAD-B19 strain), and that the same spreading efficiency is not achieved by complementing SiR-B19 with the G from the CVS-N2c Rabies strain. Interestingly, we found only a very small effect of the addition of TEVp to the starting cells on the number of transsynaptically labelled cells with SiR-N2c. We have discussed the implications of these findings in the main text and discussion.

      Changes in the manuscript: We have updated Figure 1 with the addition of a 6-month time point and updated the main text accordingly. The updated paragraph is provided here:

      "Results, SiR transsynaptic spreading.

      We then tested the ability of revertant-free SiR to trace neural circuits transsynaptically in the mouse brain. ΔG-Rabies vectors can be pseudotyped with the chimeric EnvA glycoprotein to selectively infect neurons expressing the TVA receptor, which is not endogenously expressed by mammalian cells (Wickersham et al., 2007b). We injected the nucleus accumbens (NAc) of CRE-dependent tdTomato reporter mice with an AAV expressing either TVA and the rabies G or TVA only. After 3 weeks, we re-injected the NAc with EnvA-pseudotyped revertant-free SiR-CRE or EnvA-pseudotyped SiR-G453X-CRE and assessed the CRE-dependent tdTomato expression presynaptically, in the basolateral amygdala (BLA). At 1 month post SiR injection, we detected no tdTomato+ cells in the BLA in TVA-only-injected animals, confirming the G-dependency for SiR transsynaptic spreading (Fig 5B-C). In contrast, as expected, transsynaptic spreading was apparent in the TVA+G condition. We observed similar numbers of presynaptically traced neurons in both SiR-CRE and SiR-G453X-CRE injected brains (169 ± 24 and 190 ± 36 tdTomato+ neurons, respectively; two-tailed t-test, P = 0.64; Fig 5B-C). However, tdTomato+ microglial cells were only detected in the SiR-G453X-CRE condition, indicating the re-emergence of toxicity of the revertant mutants (Fig 5B). We also tested the effect of supplying TEV protease to the starting cells, as this has been suggested to be a necessary step to ensure transsynaptic spreading. While the previous experiments unambiguously show that TEVp is not necessary for the transsynaptic spreading of SiR, the injection of an AAV expressing TEVp in the NAc did lead to an increase in the number of transsynaptically labelled BLA neurons (366 ± 69 tdTomato+ neurons; two-tailed t-test, P = 0.04; Fig 5C), indicating that TEVp-dependent SiR reactivation in starter cells can improve its spreading (Jin et al., 2023).

      We recently showed that a novel SiR-N2c vector, derived from the neurotropic CVS-N2c Rabies strain, displays enhanced transsynaptic spreading and improved peripheral neurotropism over the original SAD B19-derived SiR (Lee et al., 2023). Hence, for completeness, we compared the transsynaptic spreading efficacy of EnvA-pseudotyped revertant-free SiR-N2c and the original SiR. SiR-N2c labelled a greater number of BLA neurons at 1 month p.i. than what was detected with SiR (1691 ± 112 tdTomato+ neurons traced by SiR-N2c; two-tailed t-test, P = 2 × 10^-5; Fig 5D-E). Additionally, TEVp expression in the starter cells in SiR-N2c tracing experiments had a negligible effect on the overall transsynaptic spreading (1934 ± 135 tdTomato+ neurons traced by SiR-N2c in presence of TEVp; two-tailed t-test, P = 0.24; Fig 5D-E). Since the use of G from the CVS-N2c Rabies strain (G_N2c) has been shown to improve ΔG-Rabies (SAD-B19) retrograde tracing (Zhu et al., 2020), we tested if complementing EnvA-pseudotyped SiR with G_N2c in the NAc could increase its spreading. While we detected more BLA tdTomato+ neurons than in our previous experiments, complementing SiR with G_N2c still labelled fewer neurons than SiR-N2c, even when TEVp was provided to the starter cells (487 ± 164 and 844 ± 14 tdTomato+ neurons traced by SiR in absence or presence of TEVp, respectively; Fig 5D-E)."

      Discussion

      "ΔG-Rabies vectors are powerful tools for the dissection of neural circuit organization thanks to their ability to spread retrogradely to synaptically connected neurons. Here, we show that EnvA-pseudotyped revertant-free SiR vectors effectively spread transsynaptically in the mouse brain. Importantly, the co-delivery of an AAV expressing TEVp in addition to G increases the number of traced neurons in presynaptic areas, likely due to the TEVp-dependent reactivation of SiR in vivo (Ciabatti et al., 2017), in line with recent results (Jin et al., 2023). This should be considered when planning transsynaptic tracing experiments using SiR. To improve SiR spreading efficiency, further studies should investigate the use of inducible TEVp, as we previously showed (Ciabatti et al., 2017), which could maximise spreading efficiency while minimising possible side effects of prolonged protease expression.

      Interestingly, we found that the recently developed SiR-N2c vector, generated by applying the same proteasome-targeting modification to the genome of the CVS-N2c ΔG-Rabies strain (Lee et al., 2023), shows a higher number of retrogradely labelled neurons compared to the original SiR (SAD-B19) (Fig 5). Additionally, the co-delivery of TEVp had a smaller effect on the number of neurons transsynaptically traced by SiR-N2c. Interestingly, the gap in transsynaptic spreading efficacy between SiR (SAD-B19) and SiR-N2c could not be filled by complementing the SiR with the neurotropic G_N2c. This could be linked to a more efficient packaging of SiR-N2c by G_N2c (Reardon et al., 2016; Sumser et al., 2022) or to the particularly high speed of CVS-N2c strain propagation (~12hrs) (Callaway, 2008; Hoshi et al., 2005). These results point to SiR-N2c as the vector of choice for transsynaptic experiments."

      Other comments:

      "A recently developed engineered version of the ΔG-Rabies, the non-toxic self-inactivating (SiR) virus, represents the first tool for open-ended genetic manipulation of neural circuits." It is not clear what the authors intend to be claiming with respect to "open-ended genetic manipulation of neural circuits" but it is clear that this assertion is overblown. There are numerous tools that are available for genetic manipulation of neural circuits. This is not the first, won't be the last, and it is arguably not the best.

      We have rephrased this sentence.

      Changes in the manuscript: The updated paragraph and figure panel is provided here:

      Abstract

      "A recently developed engineered version of the ΔG-Rabies, the non-toxic self-inactivating (SiR) virus, allows the long term genetic manipulation of neural circuits."

      "Interestingly, a fraction of tdTomato+ neurons survived in ΔG- Rab-CRE-injected brains, differing from what we observed when injecting ΔGRab-GFP, where no cells were detected at 3 weeks p.i. (Fig 3CD) (Ciabatti et al., 2017). " This is a known result (same as Chatterjee et al., 2018) with a known mechanism. GFP expression is not observed because the rabies virus transitions from transcription to replication resulting in the termination of GFP expression. But Cre-recombination of the genome permanently labels cells with TdTomato. This is how Chatterjee et al. demonstrated that half of the neurons infected with G-deleted rabies survive. They imaged cells and saw that the GFP disappeared but the cells marked by Cre-recombination and RFP expression remained healthy indefinitely. The consideration of this in the Introduction is strange. There is no reason to suppose that Cre expression would somehow protect cells from rabies infection and there is no need to propose any such mechanism to explain the observed results.

      This consideration is a response to the suggestion, proposed in Matsuyama et al 2019, that the toxicity reduction observed in ΔG-Rab-CRE could be linked to the expression of Cre recombinase compared to a cytosolic protein.

      "Here we show that revertant-free SiR-CRE efficiently traces neurons in vivo without toxicity in cortical and subcortical regions for several months p.i.."

      This wording is disingenuous and appears to be intentionally misleading. "Trace" implies that circuits were traced by transynaptic labeling, which they were not.

      To avoid any misunderstanding, we have now changed "trace" to "infect".

      Changes in the manuscript: The updated sentence is provided here:

      Abstract

      "Here we show that revertant-free SiR-CRE efficiently infects neurons in vivo without toxicity in cortical and subcortical regions for several months p.i.."

      Reviewer #2 (Public Review):

      The study by Ciabatti et al examined the mutation issue for self-inactivating rabies (SiR), which was found by other labs. The authors identified the mutations in the rabies genome and showed that this mutation occurred more frequently after multiple passage of production cell lines with suboptimal TEVp expressions. The authors further showed that such mutation did not accumulate in vivo and that SiR-labeled cells remained alive across longitudinal imaging in vivo.

      In this study, the rabies genome is rigorously examined by sequencing many viral particles from independent preparations. The rabies with point mutation in the PEST domain is directly engineered for sequencing and infection test. Overall, the mutation issue is well addressed by the authors and the conclusions are well supported, but some more aspects of discussion and data analysis need to be extended for an easier production of SiR in a condition not that optimal.

      1) The authors stated that one should produce SiR from cDNA in order to avoid the potential mutation in SiR. From a practical point of view, it would be much better to amplify the rabies from a stock virus directly in the production cell lines. Any discussion or exploration on this direction would be appreciated in the field.

      We thank the reviewer for giving us the opportunity to improve on this point. We have added in the discussion a paragraph suggesting the number of passages to be used during production for the packaging cells and viral stocks, referring to the equivalent passage in our experiments.

      Changes in the manuscript: The updated paragraph is provided here:

      Discussion

      "Notably, we found that TEVp activity inevitably decreases after several passages of amplification of HEK-TGG, thus fresh low passage packaging cells should always be used to produce SiR preparations. Our results suggest that stocks of packaging cells should be made within a couple of passages after selection is established, and then used freshly defrosted to produce SiR viruses (equivalent to P0 cells in Fig 2B-C). Similarly, SiR supernatant stocks should be made directly from cDNA transfection and amplified for a maximum of 2 passages (equivalent to SiR P0 in Fig 2E) before being used for large scale SiR productions."

      2) 6 passages of production cell lines are not that extensive. In Fig.2C, there was already some level of TEVp activity reduction at the 2nd passage. It is not clear to me how the TEVp activity reduction naturally happens. Is there some room to play around with puromycin concentration to maintain high TEVp activity?

      As mentioned in the previous point, we have added in the discussion a paragraph describing the recommended number of passages to be used during production of the packaging cells and viral stocks, referring to the equivalent passage in our experiments. We clarified that our starting P0 conditions for packaging cells and stock SiR viruses were equivalent to already amplified stocks ready for viral production, which would add only 1-2 passages.

      Reviewer #3 (Public Review):

      This paper is a response to the report by Lin et al., bioRxiv 2022 (DOI: https://doi.org/10.1101/550640) that mutations in the genome of SiR were identified, which could result in a canonical G-deleted Rabies virus.

      Strengths:

      First, the authors found that SiR production from cDNA leads to revertant-free viruses by analyzing a total of 400 individual viral particles obtained from 8 independent viral productions with Sanger sequencing. Next, they identified the molecular mechanisms of mutations in the SiR; they found that extensive amplification of packaging cells HEK-TGG leads to the selection of clones with suboptimal TEVp expression level, which leads to the accumulation of revertant mutants, where, as the authors discuss, the revertant mutants have a specific replication advantage. Based on these observations, the authors recommend producing SiR freshly from cDNA with low passage packaging cells. Lastly, the authors observed that SiR-infected hippocampal and cortical neurons can survive for longer periods of time than the neurons infected with revertant mutants or a canonical G-deleted Rabies virus by combining next-generation sequencing of RNAs isolated from infected tissue and 2-photon in vivo longitudinal imaging of infected cortical neurons. Together, these findings support the idea that the degradation of N by PEST-mediated cellular mechanism results in the self-inactivation of SiR as suggested in the original SiR manuscript (Ciabatti et al., Cell 2017). Thus, SiR remains a powerful viral tool for the chronic investigation of neuronal circuitry and function as long as the virus is prepared in a way the authors recommend.

      Weaknesses:

      While most of the findings are solid, some conclusions are not fully supported by the data presented. The authors need to address the following points:

      1) In Figure 3B-D, the authors concluded that SiR-CRE -infected cells did not show cell death in contrast to Rab-CRE and SiR-G453X, but it cannot be fully supported only by this experiment. The authors should consider the potential variance in infection efficiency in each experimental animal and show evidence of suppressed cell death. In addition, it needs to be confirmed that SiR-Cre is diminished in infected cells at later times. The authors should explain and address these concerns by conducting additional experiments, for example, cleaved caspase-3 staining and quantification of virus RNA levels in each time point as performed in their previous study Ciabatti et al., Cell 2017 (DOI: 10.1016/j.cell.2017.06.014).

      We thank the reviewer for the suggestion and for giving us the opportunity to strengthen our work. We have added an analysis of the rabies transcripts over time in SiR-infected hippocampi (Fig S4). The drastic decrease of SiR RNA, along with the finding that the numbers of tdTomato-positive cells remain comparable at each time point, supports the reduction in mortality in SiR-infected cells. We have added these data and clarified this point in the text.

      Changes in the manuscript: The updated paragraph is provided here:

      Results: Difference in cytotoxicity between ΔG-Rabies, PEST-mutant SiR and SiR

      "We detected no decrease of tdTomato+ neurons in SiR-infected hippocampi (4109 ± 266 tdTomato+ neurons at 1 week p.i.; 4458 ± 739 tdTomato+ neurons at 2 months p.i.; one-way ANOVA, F = 0.08, p = 0.92, Fig 3C-D) while only 44% of tdTomato+ neurons were detected in Rabies-targeted and 60% in SiR-G453X-targeted hippocampi at 2 months p.i. (1422 ± 184 at 1 week versus 624 ± 114 at 2 months p.i. for ΔGRab; one-way ANOVA, F = 11.55, p = 0.003; 3052 ± 508 at 1 week versus 1829 ± 198 at 2 months p.i. for SiR-G453X; one-way ANOVA, F = 4.27, p = 0.05; Fig 3C-D). Additionally, we confirmed inactivation of revertant-free SiR by analysing the decrease of Rabies transcripts in the infected hippocampi over time (Fig S4). These results support the lack of toxicity of SiR on the infected neurons, in line with our previous findings (Ciabatti et al., 2017). Moreover, these data confirm the requirement for an intact PEST sequence to sustain the self-inactivating behaviour of SiR and suggest that PEST-targeting mutations do not occur in vivo."

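      The percentages quoted in the paragraph above can be checked against the reported group means. The following is only an illustrative sketch: the dictionary labels and the helper name `surviving_percent` are our own, and the calculation uses the mean counts alone, ignoring the reported errors and the number of animals.

```python
# Mean tdTomato+ neuron counts quoted in the text: (1 week p.i., 2 months p.i.).
# Labels are illustrative; "dG-Rab" stands for the canonical G-deleted Rabies.
counts = {
    "SiR": (4109, 4458),
    "dG-Rab": (1422, 624),
    "SiR-G453X": (3052, 1829),
}


def surviving_percent(early: float, late: float) -> float:
    """Return the late count as a percentage of the early count."""
    return 100.0 * late / early


survival = {name: surviving_percent(e, l) for name, (e, l) in counts.items()}

for name, pct in survival.items():
    print(f"{name}: {pct:.0f}% of the 1-week count detected at 2 months")
```

      The ratios for the G-deleted Rabies and SiR-G453X conditions round to the 44% and 60% stated in the text, while the SiR ratio sits slightly above 100%, consistent with the statement that no decrease was detected.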
      2) In Figure 3E-F, to ensure the long-term stability of SiR-Cre in the vivo mouse brain, authors conducted SMRT sequencing 1 week after the virus infection. To test the potential slow accumulation of mutations at 1-month and 2-month, the authors should perform the same experiment at these time points. Only when SiR-Cre was undetected at 1-month and 2-month, would it be reasonable to show only 1-week data, however, such data is not presented.

      We thank the reviewer for the suggestion. We have added an analysis of the Rabies transcripts in the infected hippocampi showing a drastic decrease of SiR RNA over time. This result, along with the finding that similar numbers of tdTomato-positive cells are detected in the infected hippocampi over time, supports our choice of an early time point to look for the emergence and accumulation of revertant mutations.

      3) In figure 4, the authors used only 2 mice for this experiment, although this is one of the most important experiments to ensure SiR-infected cells stay alive for the long term in vivo animals. It should be confirmed whether the conclusion remains the same by increasing the number of animals.

      While we understand why the reviewer put forward this suggestion, we believe that our choice of number of animals is appropriate, as the investment in time and resources in adding further animals would not strengthen our conclusion (which we have indirectly assessed previously (Ciabatti et al 2017) and here in Fig 3). For completeness, we have added a Fig4_S1 with the images of all the ROIs at every time point used in Fig 4.

      4) The legend in Table 3 doesn't match the contents.

      We thank the reviewer for pointing this out. In response, we have now updated Table 3.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you for reviewing our manuscript "Energy Coupling and Stoichiometry of Zn2+/H+ Antiport by the Cation Diffusion Facilitator YiiP". After carefully considering the reviewers' comments, we have made substantial changes to the manuscript, which we believe is now much improved. In addition to clarifying various points raised by the reviewers, we have also added a variety of new data from both experimental and computational studies. We hope that these changes will satisfy the reviewers such that we can move forward towards finalizing the publication process.

      New data added to this revision includes

      • SEC profiles comparing D287A and D287/H263A before and after complex formation with Fab to illustrate formation of higher order oligomerization (Suppl. Fig. 6).

      • Control trace from MST using Mg2+ to illustrate reproducibility (Suppl. Fig. 6).

      • Results from MD simulation of D72A mutant to explore the Asp72-Arg210 salt bridge as a stabilizing element (Fig. 4)

      • Analysis of cavities in WT and D70A_asym structures to illustrate occlusion of site A (Suppl. Fig. 13).

      • In addition, we have redone MD simulations for YiiP with site B empty. These simulations were originally done (3 x 1 μs) with a modified version of the zinc dummy model and we have redone them (6x1 μs) using our previously published zinc dummy model to be strictly consistent with other simulations on holo, apo and D72A structures. The new results are qualitatively consistent with the previous simulations and our conclusions remain unchanged.

      In addition, the text has been modified and several figures have been updated to address concerns of the reviewers as described below.

      Although figures will ultimately be renumbered to conform with eLife formatting, they have retained their original numbers for this revision to prevent confusion, except that Suppl. Fig. 13 is a new figure added at the request of reviewer 2.

      Reviewer #1 (Recommendations For The Authors):

      I have only a few comments that might need clarification from the authors:

      • If the unbinding of Zn2+ to site B triggers the occlusion (and maybe the OF state) and the external pH does not affect that binding, how is it prevented from being always bound to Zn2+ and thus occluded also while it should be transporting protons (B to C panels in Figure 5)? Are there some other factors that I am missing?

      Our data shows that the affinity of site B is low (micromolar), especially relative to the concentration of free Zn within the cytosol (picomolar - nanomolar). Therefore, we would expect that site B is normally empty and that the resting state would be represented by panel D in Figure 5. An elevation of Zn concentration, or delivery of Zn to the transporter by some as yet uncharacterized binding protein, would initiate the cycle starting with panel E.
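      The occupancy argument above can be illustrated with a simple single-site binding isotherm. The following is only a sketch under stated assumptions: the Kd of 1 µM and the listed free-Zn concentrations are illustrative round numbers chosen to match the "micromolar" and "picomolar - nanomolar" ranges quoted in the response, not measured values, and the function name is hypothetical.

```python
def fractional_occupancy(free_zn_molar: float, kd_molar: float) -> float:
    """Single-site binding isotherm: theta = [Zn] / (Kd + [Zn])."""
    if free_zn_molar < 0 or kd_molar <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return free_zn_molar / (kd_molar + free_zn_molar)


# Assumed, illustrative dissociation constant for site B (order of magnitude only).
KD_SITE_B = 1e-6  # 1 micromolar

# Assumed free-Zn concentrations spanning the cytosolic range and an elevated level.
for label, conc in [("1 pM", 1e-12), ("1 nM", 1e-9), ("1 uM", 1e-6)]:
    theta = fractional_occupancy(conc, KD_SITE_B)
    print(f"free Zn = {label}: site B occupancy = {theta:.6f}")
```

      Under these assumptions, occupancy stays well below 1% across the picomolar to nanomolar range and reaches one half only when free Zn equals the Kd, which is consistent with the expectation that site B is normally empty.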

      It is notable that the TM2/TM3 loop adopts a novel conformation in the occluded state, in which it extends to interact with the CTD (panel G in Figure 5). In this conformation, the Zn binding site is disrupted, thus preventing binding of additional Zn ions to the TM2/TM3 loop. Although we do not know how this loop behaves as the protein transitions to the outward-facing state (panels A & B), it is tempting to speculate that it retains the extended conformation until the protein returns to the inward-facing, resting conformation in panel D. This idea has been added to the revised manuscript (line 464).

      In addition, we have added a sentence (line 507) to explicitly state our assumption that Zn only binds to site B in the IF state.

      • I am not an expert on experiments, but the results for mutants that abolish site C are difficult to understand. For D287A/H263A, the SEC columns data suggest a population of higher oligomers. Still, for the D70A/D287A/H263A and D51A/D287A/H263A, they showed a native dimer. I understand your suggestion that the Fab induces the domain swap, but how do you explain the double mutant SEC column result? Please elaborate.

      The unexpected behavior of site C mutants certainly introduces complexity into our study. Considering all the ins and outs of our analyses, we are confident that site C is a high-affinity site that is constitutively occupied and serves as a structural site to stabilize the architecture of the native homodimer. In the original submission, we included SEC profiles for D287A and D287A/H263A in Suppl. Fig. 4 as well as profiles for D70A/D287A/H263A and D51A/D287A/H263A in Suppl. Fig. 6. The former in Suppl. Fig. 4 characterize the complex between mutant YiiP and Fab (for cryo-EM), whereas the latter in Suppl. Fig. 6 represent YiiP in the absence of Fab (for MST). In the absence of Fab, the mutations do not alter the elution volume at ~12 ml, consistent with the conclusion that the native YiiP homodimer remains unperturbed. In the presence of Fab, mutations affect the SEC profile in two ways: a shift in the main peak to ~11 ml, and appearance of a subsidiary peak at ~10 ml. The shift of the main peak can be explained by formation of a complex between YiiP and Fab. Presence of the subsidiary peak - seen for D70A, D287A, and D287A/H263A mutants - can be explained by formation of a dimer of dimers (4 YiiP + 4 Fab), which could be isolated as a subpopulation of particles during the processing of cryo-EM images. For D70A and D287A, the individual dimers were unperturbed in this dimer-of-dimers. In fact, we used masking and signal subtraction to isolate the individual dimers and included them in the final reconstruction together with the more prevalent dimeric species (2 YiiP + 2 Fab).

      The D287A/H263A-Fab complex behaved differently. The main peak of the SEC profile was shifted to 10 ml, indicating that a dimer of dimers was the prevalent complex; absence of a peak at 11 ml indicated that isolated dimeric complexes were no longer present in the solution. Furthermore, the subsidiary peak was at ~9 ml, indicating an even larger complex not seen in the other preparations. The appearance of particles in cryo-EM images was distinct from that of the other mutants (e.g., compare 2D classes shown in panels C and D in Suppl. Fig. 4). 3-D structures revealed dimer-of-dimers with the domain swap as well as larger linear oligomers. Although not well resolved due to preferred orientation, it appears that these linear oligomers consist of a propagated domain swap.

      We have included some new data to bolster our conclusion that, although the D287A/H263A mutant destabilized site C, Fab binding was responsible for inducing the domain swap. The new data, presented in Suppl. Fig. 6, shows an SEC profile for a preparation of D287A/H263A both before and after formation of the complex with Fab. In addition to including this new data, we have amplified our description of these SEC profiles under the heading "Zn2+ binding affinity" in the paragraph starting on line 289 to try to clarify this complex issue for the reader.

      • Since in the D287A mutant, you are disrupting the preferred tetrahedral coordination of Zn2+, but it still binds, do you observe any waters that compensate for the missing aspartate? Maybe in the MD simulations?

      Unfortunately, the resolution of the cryo-EM maps is not high enough to resolve water molecules that we assume are present at sites B and C. For the MD simulations, we did not use mutants, but simply removed Zn from each of the sites. So we are unfortunately not able to answer this question with the available data.

      Reviewer #2 (Recommendations for The Authors):

      1) There is no doubt that cryo-EM structures of four types of zinc-binding site mutants of a bacterial Zn2+/H+ antiporter YiiP provide important insight into distinct structural/functional roles of each of the binding sites. However, overall resolution of the cryo-EM maps presented in this paper is not high enough to address the Zn2+ coordination structures, the kinked TM5 segment seen in a D51A mutant, and the extended conformation of TM2/TM3 loop seen in the D70A asymmetric dimer. It would be better to highlight the density of the above regions and discuss the validity of their structure models. Similarly, the presence of additional water molecules at sites B and C (line 117) does not seem convincing.

      We are completely sympathetic with the recommendation of illustrating the map quality as thoroughly as possible. We hope that interested readers will download map and model from the respective PDB and EMDB repositories and see for themselves. Nevertheless, we have provided several new figure panels to illustrate explicitly the densities associated with the kinked TM5 segment in the D51A mutant (Suppl. Fig. 2) and the extended TM2/TM3 loop in the D70A mutant (Suppl. Fig. 5) and have referred to them at appropriate places in the text (line 128 and line 151). In Suppl. Fig. 5, we also included figure panels to show densities for this loop in WT and D287A/H263A mutants.

      It is true that the maps are generally of insufficient resolution to clearly define the coordination of Zn. The relevant densities are shown for all sites in all mutants in Suppl. Fig. 2. Despite this shortcoming, the coordination geometry is well established by the previous, higher resolution X-ray crystal structure as well as by MD simulations. Each site is shown in the insets of Fig. 1b, c and d. The new cryo-EM densities and resulting models are consistent with this coordination, which we have now pointed out in the legend to Fig. 1. The important point is that the new cryo-EM maps document the occupancy of ions at the individual sites as well as the large scale conformational changes associated with this occupancy, which was the main goal of the study.

      Finally, we agree that the presence of additional water molecules at the sites is not well supported; because this issue has little bearing on our analysis, these comments have been removed.

      2) Identification of the occluded state in D70A asymmetric dimer is exciting, hence this reviewer recommends the authors to highlight the structure of this state more effectively in comparison with the IF/OF states. It would be better to show the side views of the superimposition between the occluded and IF/OF states, and the pore profile and radius in the TM domain of these three states. The authors should also show the density map of site A (including M2 and M5) in the occluded protomer of the asymmetric dimer in Suppl. Fig. 2. Additionally, the authors should include information regarding the cytosolic or periplasmic view in the legend of Figure 3A, B, D, F, G, and H.

      As suggested, we have prepared a new supplemental figure juxtaposing the IF and occluded states and depicting differences in pore radius and accessibility of site A (Suppl. Fig. 13, initially referred to on line 152 and various other locations in the manuscript with methods described on line 680). However, we unfortunately do not have a structure in the OF state to complete this comparison.

      The density map for site A including M2 and M5 of the occluded protomer is shown in Suppl. Fig. 2 in which density thresholds have been adjusted to show the helices.

      We have updated the figure legend for Figure 2 (referred to as Figure 3 by the reviewer) with the orientation of view, which are all from the cytoplasm looking toward the membrane.

      3) MST analyses using the YiiP mutants with a single Zn2+-binding site at different pH are useful, and the data interpretation in combination with computational approaches of CpHMD and MST inference are nice challenges, indeed. However, it may, in a sense, appear that the MD simulations have been carried out intentionally and/or forcibly so that the outcomes are compatible with the experimental MST data. Although this is not unusual or unacceptable, this reviewer is concerned that the determined pKa values of some residues, especially Asp residues at Site A, are unusually high. The validity of this outcome should be discussed from physicochemical viewpoint; what factors raise the pKa of Asp51 and Asp159 so high. In this context, the MST inference titration curve seems unusually steep for D159 (and H155), of which validity needs to be discussed. This reviewer is also concerned about the large variations per measurement in the MST experiments (Suppl. Fig 6 E, F, and G). Are such large variations common to this experiment? Optimization of the measurement conditions such as protein concentration, and/or increase of AlexaFluor-488 labeling efficiency might greatly improve the reproducibility per measurement. The authors should include information on which residue(s) is labeled with AlexaFluor 488 in YiiP (line 641).

      One of the outcomes of our so-called MST-inference algorithm was the conclusion that protonation states for H155 and D159 were coupled. The basis for this conclusion is described in some detail in the Methods section (paragraph starting on line 1025) and results in cooperativity in the protonation state of these two residues. This cooperativity explains the unusually steep binding curve in Suppl. Fig. 10e. We added a couple of sentences to explain this result in the Results under "Zn2+ binding affinity", line 352.
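      To illustrate why coupled protonation steepens a titration curve, the sketch below compares a single independent site with two sites that protonate together. This is a generic Hill-type model, not the MST-inference algorithm described in the manuscript; the pKa of 7.0 and the function names are assumptions made only for illustration.

```python
def protonated_fraction(ph: float, pka: float, n_coupled: int = 1) -> float:
    """Fraction protonated when n_coupled protons bind as a single unit.

    n_coupled = 1 is the ordinary Henderson-Hasselbalch curve;
    n_coupled = 2 models two residues that titrate together.
    """
    return 1.0 / (1.0 + 10.0 ** (n_coupled * (ph - pka)))


def midpoint_slope(pka: float, n_coupled: int = 1, dph: float = 1e-4) -> float:
    """Central-difference estimate of d(fraction)/d(pH) at pH = pKa."""
    upper = protonated_fraction(pka + dph, pka, n_coupled)
    lower = protonated_fraction(pka - dph, pka, n_coupled)
    return (upper - lower) / (2.0 * dph)


ASSUMED_PKA = 7.0  # hypothetical value, for illustration only

independent = midpoint_slope(ASSUMED_PKA, n_coupled=1)
coupled = midpoint_slope(ASSUMED_PKA, n_coupled=2)
print(f"midpoint slope, independent site: {independent:.4f} per pH unit")
print(f"midpoint slope, two coupled sites: {coupled:.4f} per pH unit")
```

      In this simplified picture the midpoint slope scales linearly with the number of coupled protons, so a curve for two fully coupled residues is twice as steep as that of a single site.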

      There is indeed precedent for increased pKas of acidic residues based on experimental measurements for Glu and inferred for Asp, both in membrane proteins. Computational approaches similar to the ones we use (including some of our own earlier work) have also pointed to elevated pKas by 1-3 units for Asp residues. We included a paragraph in the Discussion of Stoichiometry and energy coupling (line 537) citing these references and explaining that such pKa shifts reflect strong Coulomb interactions of titratable residues in close proximity in the low dielectric environment of the membrane.
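      As a rough guide to the energetic scale of such shifts, a pKa change can be converted into a free energy with ΔG = ln(10) · R · T · ΔpKa. The sketch below assumes a temperature of 298.15 K and reports values in kcal/mol; it is a back-of-the-envelope conversion, not a calculation taken from the manuscript, and the function name is hypothetical.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal / (mol K)


def pka_shift_free_energy(delta_pka: float, temperature_k: float = 298.15) -> float:
    """Free energy (kcal/mol) corresponding to a pKa shift of delta_pka units."""
    return math.log(10) * GAS_CONSTANT_KCAL * temperature_k * delta_pka


# The response cites elevated pKas of 1-3 units for Asp residues.
for shift in (1.0, 2.0, 3.0):
    energy = pka_shift_free_energy(shift)
    print(f"pKa shift of {shift:.0f} unit(s): {energy:.2f} kcal/mol")
```

      At the assumed temperature each pKa unit corresponds to roughly 1.4 kcal/mol, so shifts of 1-3 units fall within the range that electrostatic interactions between nearby titratable groups in a low-dielectric environment could plausibly supply.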

      We believe there is a misunderstanding about our presentation of raw data for the MST experiments in Suppl. Fig. 6. Panels E, F and G show an overlay of data from the entire Zn titration, which is therefore expected to change according to the Zn concentration in each capillary. We have revised the corresponding legend to clarify the plots. We have also included traces from a Mg2+ titration as a negative control that better illustrates the reproducibility of these measurements.

      The AlexaFluor dye contained the reactive NHS group which preferentially targets the N-terminus of the polypeptide chain. Although labeling of lysine side chains is possible, we do not expect much given the low labeling stoichiometry of ~1:1 used for our experiments. We updated the Methods section under MST experiments (line 689) with this information.

      Reviewer #3 (Recommendations For The Authors):

      By measuring the binding affinity of site A using the D70A mutant that retains site C at pH 5.6, it should be possible to verify if the affinity reported in Table 2 is affected by the quaternary structure of the system. The 40-fold difference in affinity between site A and site C at pH 5.6 should be sufficiently large to permit a meaningful measurement.

      To address this suggestion, we have included additional data in Table 2 from the D70A/D287A mutant. Based on the cryo-EM structure of D287A, we expect that site C is still intact, which is why it was omitted from the original manuscript. However, the affinities measured at pH 6 and 7 are very consistent with those from the triple mutant (D70A/D287A/H263A), supporting the idea that complete abolishment of site C does not affect measurement of affinity at sites A or B. This additional data is presented in the section on "Zn2+ binding affinity" on line 304. We also note that the SEC profiles in the absence of Fab are consistent with formation of the native homodimer for all the mutants, as described in our response to reviewer 1 and now shown in Suppl. Fig. 6.

      More details should be provided on the force field used for zinc(II) ions in MD simulations. Currently, there is only a reference to another article, where this info is in the caption of a supplementary figure.

      We added a summary of our previous work to develop a non-bonded dummy model for Zn(II) on line 727 in the Methods section entitled "Overview of the MD simulations". However, we would like to point out that all details on the parameter development and the parameters themselves are stated in the Methods section “Classical force field model for Zn(II) ions” in our previous paper [Lopez-Redondo et al, J Gen Physiol 143 (2021)] and parameter files are available as package 2934 in the Ligandbook repository https://ligandbook.org/package/2934 .

      We also realized that in the originally submitted version of this manuscript we reported “empty site B” simulations with an updated and experimental non-bonded Zn(II) dummy model that has close to experimental first-solvation shell water residence times but slightly worse solvation free energy. Although that does not really matter for these simulations because there was no Zn2+ ion in site B, we nevertheless performed a new set of 6 x 1 µs simulations with our published (J Gen Physiol 2021) Zn(II) model to make all simulations fully consistent with each other. The results remained qualitatively the same, with a lack of zinc ions in site B leading to increased flexibility in the TM2/3 loop and ultimately destabilization of the TMD-CTD interaction.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We sincerely appreciate the opportunity to revise the manuscript and the reviewers' critical comments and valuable suggestions. After carefully revising the manuscript, we strongly believe that the reviewers' comments are invaluable and will significantly enhance the quality of the manuscript and contribute to our future research. Following the reviewers' comments, we conducted a comprehensive and meticulous review, addressing each point individually and making extensive modifications and corrections. The responses to each question are provided in a point-by-point manner as follows:

      Reviewer 1:

      This study delves into the impact of imidacloprid, an insecticide documented for its toxicity towards honeybees, on the development of bee larvae. The investigation involved exposing bee larvae to various concentrations of imidacloprid, and observing the resultant effects.

      The findings of this study revealed that imidacloprid exerted a dose-dependent delay in the development of bee larvae, marked by reductions in body mass, width, and an overall decline in the growth index. Moreover, at elevated concentrations, imidacloprid was observed to impair neural transmission, induce oxidative stress, inflict damage to the gut, and inhibit hormones and genes essential for development. The larvae were found to engage antioxidant defense systems and deploy detoxification mechanisms to mitigate these effects.

      However, the manuscript could be significantly enhanced through several improvements. Firstly, the structure of the manuscript warrants refinement to foster coherence and clarity. Additionally, there is a need for careful reevaluation of the concentrations of imidacloprid employed in the study, to ensure their relevance and applicability. In terms of references, greater attention to accuracy in citation is imperative.

      Furthermore, while the authors have provided an overview of the general effects of imidacloprid on both vertebrates and invertebrates, the inclusion of a more exhaustive literature review with a specific focus on honey bees and other insects would bolster the context and significance of this research. This would be particularly beneficial in the introduction section, which should be subjected to a major revision.

      In summary, this study offers preliminary evidence of the detrimental effects of imidacloprid on the development of bee larvae by interfering with molting and metabolism. This research holds potential as a valuable resource for assessing the risks posed by pesticides to juvenile stages of various animal species.

      On behalf of all the authors, I express our most sincere gratitude for your critical comments and suggestions. Following your suggestions, we have thoroughly reviewed and revised the entire manuscript, including the issues of imidacloprid concentration and citation accuracy you raised. More importantly, we have significantly revised the structure and content of the introductory section of the manuscript to include many more detailed reviews of critical literature, with particular attention to the overview of relevant research on honey bees, Drosophila, and other insects, to promote coherence and clarity of the introduction and to enhance the context and importance of this research. We hope that these changes meet with your approval. Overall, your valuable comments have greatly improved the quality of the manuscript and will facilitate our future research.

      Q1: Line 48, "Adults exposed to high doses of imidacloprid experience", please provide a more precise value for the high doses.

      Thank you very much for your comments. Following your suggestion, we have provided precise values for high doses of imidacloprid for adult exposure based on the study by Dr. Wu et al. 2001.

      Q2: Line 82, There are several larvae effect reports using next generation sequencing approach. The authors should include those related references in this section.

      Thank you for your comments. We have included relevant references in our revised manuscript.

      Q3: Line 394, for the concentration design, the maximum concentration of imidacloprid used in this study is 377 ppb, which is from the imidacloprid residue level in beeswax. Bees don't consume beeswax, and the reference is wrong.

      Thank you for raising this critical issue. As you point out, bees do not eat beeswax, but it is important to stress that this may well mean that the bee larvae themselves are exposed to higher doses. Therefore, in this study, we ultimately designed for the worst-case scenario of 377 ppb of imidacloprid residues in beeswax. We hope that you will agree with this reasoning. In addition, we have corrected the citation errors in the references here and included them in the revised manuscript.

      Reviewer 2:

      This study provides evidence on the ability of sublethal imidacloprid doses to affect growth and development of honeybee larva. While checking the effect of doses that do not impact survival or food intake, the authors found changes in the expression of genes related to energy metabolism, antioxidant response, and P450 metabolism. The authors also identified cell death in the alimentary canal, and disturbances in levels of ROS markers, molting hormones, weight and growth ratio. The study strengths come from applying these different approaches to investigate the impacts of imidacloprid exposure. The study weaknesses are not providing an in-depth investigation of the mechanisms behind the impacts observed and not bringing the results in light of the current literature. For instance, the authors' hypothesis is based on two main points, the generation of ROS that leads to gut cell death and energy dysfunction, and the increased P450 expression. They propose this increases P450 expression which in turn increases energy consumption and could contribute to developmental retardation. There is however no investigation on the mechanisms of ROS generation (it could be through mitochondrial damage, Nox/ Duox activity, NOS activity, P450s activity, etc). A link between higher P450 expression and increased energy consumption leading to energy deprivation is also missing. It would also be important for the authors to provide a more complete literature review as previous works have investigated imidacloprid sublethal dose impacts in larval stages for bees and other insect models.

      I greatly appreciate your insightful comments and valuable suggestions on behalf of all the authors. Thank you for identifying the limitations of this study and providing valuable comments and suggestions. These comments and suggestions have significantly improved the quality of the paper and will facilitate our future research. Following your comments, we have revised and corrected the manuscript point by point. We hope that these corrections meet with your approval.

      Q1: Abstract: It would be important to rephrase the abstract to make it clear when authors are talking about gene expression results or functional assays.

      Thank you for your comment. Following your suggestion, we have revised the abstract to make it clearer, especially the description of the gene expression results. Please see lines 15-34 in our revised manuscript.

      “Abstract Imidacloprid is a global health threat that severely poisons the economically and ecologically important honeybee pollinator, Apis mellifera. However, its effects on developing bee larvae remain largely unexplored. Our pilot study showed that imidacloprid causes developmental delay in bee larvae, but the underlying toxicological mechanisms remain incompletely understood. In this study, we exposed bee larvae to imidacloprid at environmentally relevant concentrations of 0.7, 1.2, 3.1, and 377 ppb. There was a marked dose-dependent delay in larval development, characterized by reductions in body mass, width, and growth index. However, imidacloprid did not affect larval survival and food consumption. The primary toxicological effects induced by elevated concentrations of imidacloprid (377 ppb) included inhibition of neural transmission gene expression, induction of oxidative stress, gut structural damage, and apoptosis, inhibition of developmental regulatory hormones and genes, suppression of gene expression levels involved in proteolysis, amino acid transport, protein synthesis, carbohydrate catabolism, oxidative phosphorylation, and glycolysis energy production. In addition, we found that the larvae may use antioxidant defenses and P450 detoxification mechanisms to mitigate the effects of imidacloprid. Ultimately, this study provides the first evidence that environmentally exposed imidacloprid can affect the growth and development of bee larvae by disrupting molting regulation and limiting the metabolism and utilization of dietary nutrients and energy. These findings have broader implications for studies assessing pesticide hazards in other juvenile animals”

      Q2: Line 55-58: rephrase the sentences to make it clear that imidacloprid was not created in 1925, but only in the 90's.

      Thank you for pointing out this error. We have corrected the citation. Please see line 58 in our revised version.

      Q3: Line 88: typo: " remain to be systematically investigated"

      Thank you for pointing out this error. We have rewritten the sentence. Please see lines 121-122 in our revised manuscript.

      Q4: Introduction is lacking important citations, a few of the important ones are: Farooqui 2013 (doi: 10.1016/j.neuint.2012.09.020.) - hypothesis linking neonic exposure, nAChRs receptors, and ROS in honeybees; Ihara et al 2020 (https://doi.org/10.1073/pnas.2003667117) - the targets of imidacloprid in honeybees; Martelli et al 2020 (https://doi.org/10.1073/pnas.2011828117) - mechanistic investigation of imidacloprid sublethal damage in Drosophila; Whitehorn et al 2018 (doi: 10.7717/peerj.4772) - investigation of imidacloprid sublethal dose impact on growth and development of butterflies; Chen et al 2021 (doi: 10.3390/ijms222111835) - sublethal effects of imidacloprid exposure on gene expression in honeybees at different life stages. It is important that the authors perform a more complete literature search to compare their work to previous ones, drawing conclusions and highlighting their novelties.

      We greatly appreciate your insightful comments and valuable suggestions. Following your suggestions, we have made significant revisions to the structure and content of the Introduction section. We have incorporated the critical literature you provided and other relevant literature reviews, with a particular emphasis on studies of bees, fruit flies, and other insects. These revisions aim to improve the coherence, clarity, background, and significance of the Introduction. We hope that these modifications meet with your approval. Please see the red text in the Introduction section in our revised version.

      Q5: Line 104: Explanation on the doses used should be included here, not later in the methods. Also, important to highlight that whereas the doses tested were found in bee products, they likely mean that the bees themselves were being exposed to even higher doses.

      Thank you for your comment. Following your suggestions, we have moved the explanation of the imidacloprid doses used in this study to the Results section, as you mentioned. Please see lines 138-142 in our revised manuscript.

      Q6: Line 112: It is important to identify the neuronal targets of imidacloprid in honeybees. Many are known. Some of the nAChRs targets were not investigated in this study (such as subunit alpha8 and beta1). Plus, is alpha2 an imidacloprid target? How does the expression of other nAChRs subunits compares? Importantly, these genes are expressed mostly in the nervous system, so a more correct approach would be a tissue specific analysis. The lack of tissue specific analysis is a consistent flaw throughout the methodological design.

      Thank you very much for your important comment. Bees have more than ten nAChR subunit members. Imidacloprid inhibits acetylcholinesterase activity by competitively binding to acetylcholinesterase receptors. As you noted, this study did not investigate the expression of all nAChR subunits, including the alpha8 and beta1 subunits, in different tissues, which is a shortcoming of our study. We have so far been unable to overcome the technical difficulty of dissecting individual tissues from developing larvae. We therefore had to abandon this design and use the whole larva as the sample for measurement. We are aware that this is a shortcoming of this research. In the future, we will work to overcome this technical barrier and conduct a comparative analysis of all nAChR subunit genes in different tissues and developmental stages to obtain more comprehensive and accurate data. Thank you again for raising this important issue and for your valuable suggestions.

      Q7 ~ Q9: Line 125: P450s expression may have opposite behavior when exposed to insecticides depending on tissue (such as brain and fat body). When checking whole larva gene expression, the tissue specific profiles become diluted and thus less reliable (for reference, check: https://doi.org/10.1073/pnas.2011828117); Line 131: Again, for the analysis of oxidative stress it would be important to investigate a tissue specific expression pattern and measurement of ROS markers. Investigating different time points during the exposure also adds to the mechanistic understanding. Do all tissues respond in the same way? In which tissue does an increase in ROS generation start? How? Does it spread to other tissues? By which mechanisms is it generated; Results in general: Tissue specific analyses and more time points can provide a better understanding of how sublethal imidacloprid doses impact growth and survival. Thinking about the doses of choice in light of what bees might be exposed is also important. The mechanistic understanding is missing in the paper, and without it the study does not add much in comparison to previous ones.

      Thank you very much for your valuable comments. As you pointed out, the intensity of P450 detoxification and oxidative stress varies considerably between tissues. When checking whole larva gene expression, the tissue-specific profiles become diluted, which is detrimental to elucidating mechanisms. In this study, we encountered technical barriers in obtaining independent samples of specific tissues by dissection. As a result, we had to forego analysis of some specific tissues, including the tissue-differentiated analyses of P450 gene expression patterns and ROS markers that you mentioned. We only examined overall larval detoxification and antioxidant responses to imidacloprid toxicity. While we do not believe that data from specific tissues are fully representative of the complex overall picture of larvae, there is no doubt that the decision to study larvae as a whole limits our understanding of the mechanisms by which imidacloprid causes larval developmental retardation and of larval responses to imidacloprid toxicity. In addition, the fact that this study analyzed only a single time point during imidacloprid exposure, and did not design and comparatively analyze different time points, limits our complete understanding of the above mechanisms. In summary, as you have pointed out, tissue-specific analyses and more time points could provide a better understanding of how sublethal doses of imidacloprid affect growth and survival. In future studies, we will work to overcome the technical challenges and follow your suggestions for further systematic and in-depth mechanistic studies specifically targeting imidacloprid toxicity in different tissues at different exposure times. We will incorporate your questions into the next experimental design, such as whether the response is consistent across all tissues, where the increase in ROS production originates, how it arises, whether it spreads to other tissues, and the underlying mechanisms.

      Again, thank you for your constructive and valuable comments, which have provided valuable insight for our study on mechanisms. Undoubtedly, these comments will enhance the innovativeness of our study and greatly facilitate our future research.

      Q10: Line 236: The conclusion that mitochondrial dysfunction is taking place is not well corroborated. Are there changes in mitochondrial aconitase activity to suggest the mitochondrial origin of ROS? How do mitochondria look like under electron microscopy? Evidence for mitochondrial damage from functional assays? Could the ATP reduced levels be caused by increased consumption by other systems, instead of reduced production? Without functional assays to demonstrate mitochondrial dysfunction the indirect measurements of gene expression at most suggest expression perturbations in mitochondria for the point in time when gene profiles were examined.

      Thank you for the comments. The data of the present study, i.e., suppression of mitochondrial oxidative phosphorylation (COX17, NDUFB7) and expression of genes of its alternative glycolytic pathways (Gapdh, Oscillin), as well as a decrease in ATP content, suggest that imidacloprid exposure leads to impaired energy metabolism in larvae; they do not demonstrate mitochondrial dysfunction. We have corrected this imprecise wording. Please see the red text at lines 267 and 275 of our revised version. We hope that this correction will meet with your approval.

      Q11: Though not the aim of the study, an important step forward would be to investigate whether these doses that do not impact survival but cause growth retardation could affect the many stereotypical behaviors displayed by the worker bees when they reach adult life. Without this sort of analysis, it is difficult to establish whether the doses tested will impact colony health.

      Thank you very much for your valuable suggestions, which give us broader ideas for our subsequent, more in-depth work on the mechanism of toxicity. Inspired by your suggestion, we plan to conduct further studies to investigate the effects of different levels of imidacloprid exposure on the developmental process of bee larvae and the underlying mechanism of toxicity. We will also investigate the intrinsic link between this juvenile toxicity and behavioral and physiological defects in adult individuals.

      Q12: Line 376: the authors do not provide a link to their hypothesis that increased P450, and antioxidant response is reducing larvae nutrient supply.

      Thank you for your comment. I apologize for not fully understanding your point. If you mean that the hypothesis proposed in this study that increased P450 and antioxidant responses reduce larval nutrient energy supply is not well-founded, we have already addressed this in the previous paragraph. See Figure 7 and lines 395-399 for more details in our revised manuscript.

      Q13: Line 393: Were the colonies single-cohort? Were the frames from different hives mixed together to create the experimental groups? Or each experimental group comes from a different frame/colony? This information is important to establish how much genetic variation might exist between the different experimental groups.

      Thank you for your comment. In this study, the selected colonies were healthy and not exposed to pathogens or pesticides. Two-day-old larvae from the same frames of the same hive were individually transferred to sterile 24-well cell culture plates. The plates contained a standard diet containing royal jelly, glucose, fructose, water, and yeast extract. We have included the above text in our revised manuscript. Please see the red text at lines 430-432 of the revised manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Please find enclosed our revised manuscript entitled “An unconventional gatekeeper mutation sensitizes inositol hexakisphosphate kinases to an allosteric inhibitor”. We would like to thank the editorial team and the reviewers for carefully reading the manuscript and for raising a number of valuable points. We have included additional data and discussion to address the questions raised. Please find the point-by-point responses below.

      Reviewer #1:

      1) While I understand that FMP-201300 is a tool (proof-of-concept) compound it would be useful to know if it has activity against IP6K1 (or IP6K2) in cells.

      We were of course curious about this as well. Unfortunately, our attempts to generate cell lines in which IP6K1 or IP6K2 carry the gatekeeper mutation using CRISPR/Cas editing have not been successful so far. Nevertheless, to obtain information on the permeability and cellular activity of FMP-201300, we decided to treat wt cells, since the compound also inhibited IP6K1-wt and IP6K2-wt at higher concentrations.

      In a previous study, we showed that reduced intracellular 5PP-InsP5 levels lead to a decrease in rRNA synthesis (https://doi.org/10.1101/2022.11.11.516170). We have now repeated this experiment with FMP-201300, alongside the known IP6K inhibitors TNP and SC-919, and could show that FMP-201300 is able to reproduce this phenotype, strongly suggesting that it is capable of diffusing through the cell membrane and acting on IP6Ks. We have included these data as a new figure (Figure S10) and in the discussion part of the manuscript.

      2) Did the authors try docking studies to gain insight into the binding site of FMP-201300?

      The reviewer raises an important point, and we indeed strongly considered docking studies during the progress of the project. However, given that the HDX-MS data show that the region around the αC-helix becomes much more flexible upon introducing the gatekeeper mutation, we were concerned that docking studies (which would be based on the static wt structure) may not accurately reflect the more dynamic state of the mutated IP6K.

      After consulting colleagues with expertise in docking and molecular dynamics simulations, we believe that MD simulations would need to be performed to obtain a more realistic picture of this protein-ligand interaction, which we would like to pursue in the future.

      3) Regarding the SAR, it would be useful to know if both carboxylic acids are required for allosteric inhibition.

      Given the available data, it appears very likely that both carboxylic acids are required for the inhibitor to unfold its potency. Compound A2, which only contained one carboxylate group, showed drastically reduced potency. We have altered the text in the main manuscript to get this point across more clearly.

      4) It would be helpful if the authors presented a model for how they think the Leu210 to Valine mutation sensitizes IP6K1 to FMP-201300.

      We agree that it is important to better visualize the structural factors that play a role in the sensitization towards the compound. We have generated a new Figure 5 (the old Figure 5 is now Supplementary Figure 9) and added a section to demonstrate how we propose the mutation leads to the sensitization of IP6K1 to FMP-201300. For a better understanding, we have also included a depiction of how the mutation already affects the apo structures. Furthermore, we have added some text in the HDX section to better describe the proposed mechanism.

      Minor:

      1) Figure 4: The authors should use the same units in panels a and b.

      Thank you for pointing this out, the figure was edited accordingly.

      2) In the supplementary Excel file, it would be helpful to include a tab that contains a legend.

      A contents page was added to help describe the layout of the supplementary Excel file.

      Reviewer #2:

      Overall, this is an excellent study of high quality. The identified FMP-201300 has the potential for further compound and probe development. My only minor comment is that the authors could spend more time discussing the proposed allosteric binding mode of FMP-201300 and provide more detailed figures to highlight the proposed interactions with the protein and the conformational changes that must ultimately take place to accommodate the allosteric modulator. I appreciate that the co-crystallization experiments did not yield bound inhibitor structures, but perhaps the authors could consider MD simulations to complete their study. However, that could be a story in itself and should not be a must for the publication of this great work.

      We agree with the reviewer (and also reviewer 1) that it is important to better visualize the structural factors that play a role in the sensitization towards the compound. We have generated a new Figure 5 (the old Figure 5 is now Supplementary Figure 9) and added a section to demonstrate how we propose the mutation leads to the sensitization of IP6K1 to FMP-201300. For a better understanding, we have also included a depiction of how the mutation already affects the apo structures. Furthermore, we have added some text in the HDX section to better describe the proposed mechanism. In brief, we propose that the mutation leads to increased flexibility of the region around the mutation, allowing accommodation of FMP-201300 and ATP. These same regions also show large decreases in deuterium exchange upon addition of the inhibitor.

      We also appreciate the comment about using computational methods to predict the binding site (also a remark from reviewer 1). We strongly considered docking studies during the progress of the project. However, given that the HDX-MS data show that the region around the αC-helix becomes much more flexible upon introducing the gatekeeper mutation, we were concerned that docking studies (which would be based on the static wt structure) may not accurately reflect the more dynamic state of the mutated IP6K. As the reviewer points out, MD simulations would likely be needed to obtain a more realistic picture of this protein-ligand interaction, which we would like to pursue in the future.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review)

      Overall, I quite enjoyed reading the manuscript and found it very well-structured and organized. I congratulate the authors for building this nice research. I do have a few major points to raise, but probably they would not affect the general message of the manuscript.

      Thank you for taking the time to review our manuscript and the positive feedback. Following your suggestions, we have corrected some mistakes and added clarifications and a few of the suggested quality checks on the models. However, we decided not to run new analyses as: i) we believe there would be minor changes to the general message of the manuscript; and ii) while some suggested analyses are compelling, they are difficult to implement for different reasons or are outside the scope of the paper (clarified below).

      I was confused about how IUCN data were used. The IUCN predictors are not mentioned in the model equations presented in the manuscript, but their effect size is reported in Figure 2.

      Thank you for highlighting this issue. This was a typo: we forgot to mention the variable in both equations 1 and 2. Changed accordingly.

      In the manuscript Methods, it is said that IUCN data was classified into 3 categories. I believe there was a mix of mechanisms in measuring it this way since at least two processes might be underlying IUCN data. First, one can inspect whether there is an effect on "scientific/societal interest" for assessed vs non-assessed species. This would not have any relationship with the assessed status itself. Assessed species are any with LC, NT, VU, EN, CR, EW, EX statuses, whereas non-assessed species might include DD and NE. Second, one may observe an effect of threat status itself, with threatened species being more researched than non-threatened species, this would only be possible for assessed species, although there are methods out there to impute missing statuses. By inspecting Figure 2, I got the feeling that only the second option was explored, but this would need to be confirmed.

      We couldn’t test the effect of single categories (LC, NT, VU, EN, CR, EW, EX) because observations within factor levels were unbalanced. So, we re-grouped the different categories into three levels: “Threatened” (EX, EW, CR, EN and VU), “Non-Threatened” (NT and LC), and “Unknown” (DD and NE) and only tested this variable (your second option). Note that the effect size of the level Unknown is not shown in Figure 2 as this is the reference category. This is clarified in the caption of Figure 2.
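
      The regrouping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function name `regroup_iucn` and the error handling are hypothetical, and only the mapping of categories to the three levels is taken from the response.

```python
# Hypothetical helper: collapses IUCN Red List codes into the three levels
# named in the response. Only the category-to-level mapping comes from the text.
THREATENED = {"EX", "EW", "CR", "EN", "VU"}
NON_THREATENED = {"NT", "LC"}
UNKNOWN = {"DD", "NE"}  # reference level in the models, so not drawn in Figure 2


def regroup_iucn(code: str) -> str:
    """Map a single IUCN code to 'Threatened', 'Non-Threatened' or 'Unknown'."""
    normalized = code.strip().upper()
    if normalized in THREATENED:
        return "Threatened"
    if normalized in NON_THREATENED:
        return "Non-Threatened"
    if normalized in UNKNOWN:
        return "Unknown"
    raise ValueError(f"Unrecognized IUCN code: {code!r}")


if __name__ == "__main__":
    for code in ["LC", "vu", "DD", "EX"]:
        print(code, "->", regroup_iucn(code))
```

      Collapsing seven sparse categories into three levels trades resolution for balance: each level then contains enough species for its effect to be estimated.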

      In Figure 2, I was confused about the presence of three categories of domain. In the text, it states that four categories have been used. I believe these domains are non-mutually exclusive, that's why there is a fourth category. Would it not be better to assess the influence of domain through three dummy variables (terrestrial, marine, freshwater), where multiple presences (1's) would indicate the "multiple" category?

      We opted for a categorical variable (rather than a dummy) to have the same number of variables in the two groups (‘species’ vs ‘culture’). This is needed for the variance partitioning analysis (VPA), because an unbalanced number of variables in one group of a VPA can artificially inflate R2 (see, e.g., this source: https://www.davidzeleny.net/anadat-r/doku.php/en:varpart). As for Figure 2, the level “Multiple”, being the reference category, is not shown. This is clarified in the caption: “Baseline levels for multilevel factor variables are: Domain [Multiple]”.

      At present, I felt that the spatial components of your data were unexplored. Since you have centroids representing species distribution, it could be interesting to explore the presence of the species within protected areas or biodiversity hotspots. That might be something triggering at least scientific interest. Also, one can derive information about the major habitat of species occurrence (either using IUCN Major Habitat classification) or extracting overlap of species centroids with WWF biomes (e.g., simplified to just forested vs non-forested habitats; https://ecoregions.appspot.com/). Another point very common to research exploring biodiversity shortfalls is the proximity to research institutions (https://doi.org/10.1111/2041-210X.13152). And since societal interest is also being explored, what about the proximity to major cities (doi:10.1038/nature25181). Finally, other metrics derived from species centroids could inform "tropicality", if the species is tropical or not. Most often, the tropics species are neglected in comparison with those occurring in temperate regions.

      We thank the reviewer for this suggestion, and we are aware that there are important spatial drivers of interest as highlighted in earlier research. Indeed, the spatial aspects of the data were somewhat underexplored as a deliberate choice because we hope to carry out additional work to explore these aspects in more detail. Nevertheless, we included the centroid of each species range as a broad proxy of its distribution, to help explore, for example, the role of species latitudinal distribution in driving interest metrics. We have also considered the suggestions provided as additional analyses, but we find these may be challenging to implement with the current data for a few reasons. First, each species centroid was calculated based on GBIF occurrences and therefore represents the midpoint of all locations, but not necessarily an area that is known to be occupied by the species. Using the centroid to assess whether a species is located in a given biome or within protected areas using this approach would therefore be potentially misleading (for example, for some terrestrial species it may fall in the sea, and vice versa). Also, for the same reasons, taking the centroid to estimate the species accessibility or proximity to research institutions may be misleading. We find that while important, these spatial aspects require a more nuanced approach to be explored in detail.
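
      The concern about centroids can be illustrated with a toy example. The sketch below uses invented planar coordinates (not GBIF data) for a species recorded in two separate clusters; the function names are hypothetical. The midpoint of all records lies far from every record, so overlaying it on a biome or protected-area map would describe a place the species is not known to occupy.

```python
import math

# Invented occurrence records (x, y) forming two well-separated clusters.
# These coordinates are illustrative only and do not come from the study.
occurrences = [(-10.0, 40.0), (-9.0, 41.0), (10.0, 40.0), (11.0, 41.0)]


def centroid(points):
    """Arithmetic midpoint of all occurrence records."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def nearest_record_distance(point, points):
    """Planar distance from a point to the closest occurrence record."""
    return min(math.dist(point, p) for p in points)


if __name__ == "__main__":
    c = centroid(occurrences)
    print("centroid:", c)
    print("distance to nearest record:", nearest_record_distance(c, occurrences))
```

      Real ranges would need geodesic distances and range polygons rather than planar point averages; the example only shows why a centroid is a coarse proxy for latitude or region and not a location of confirmed presence.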

      I was also thinking about the influence of time on the models. Species described long ago are often more known to people and scientists and had more "time" to be researched. Although metrics of societal interest were restricted to the last decade here, that does not necessarily mean that peoples' interest is not affected by their accumulated experiences. Similar reasoning applies to scientific interests, which have a lengthier time frame (~80 years). That said, the year of description or time since description could be added to capture some metric of time.

      This is a good point, which we discussed prior to running the analysis. Indeed, there is evidence that such accumulated experiences can drive species interest as our own research has also previously highlighted (e.g. see Ladle et al. 2017 doi: 10.1002/pan3.10053). However, we felt that comparing the date of description as a proxy of accumulated human experiences with species was only fair within major biological groups and not between them. This is because taxonomic practices, definitions, and methods vary widely between biological groups. We therefore decided not to include time since description as a variable driving the measures of scientific and societal interest in this work. Nevertheless, we recognize the importance of the history of such experiences in driving human interest in species, and the consequences emerging from the loss of such links, and have thus included a brief discussion of this topic in the manuscript (see lines 177-182).

      Model residuals could be checked for phylogenetic or spatial autocorrelation. I am aware there is no phylogenetic tree used, but the hierarchical taxonomy could be used (Phylum / Class / Order / Family / Genus) as a proxy for phylogenetic relationship.

      We agree. Indeed, the hierarchical taxonomy was already included as a random factor (Phylum / Class / Order) in eq. 1. Note that we excluded Family and Genus from the random structure because in most Phyla a single genus and family has been sampled (as well as due to model convergence problems).

      Concerning the spatial autocorrelation, one could check model residuals against the coordinate centroids of each species range. It is stated that GLMM has been used to avoid these non-independence issues, but it would be interesting to check whether residuals remained free of them.

      Good suggestion, although the use of centroids may not be the most appropriate since it is only a rough approximation of each species distribution (see previous answer). Still, out of curiosity, we checked whether the random factor on biogeography was enough to capture residual spatial autocorrelation in the models. For this, we used the R package DHARMa, which performs a Moran's I test for distance-based autocorrelation. Given that some coordinates were duplicated, we grouped residuals by biogeographic regions (DHARMa requires all coordinates to be unique). Neither the Web of Science nor the Wikipedia models had spatial autocorrelation in the residuals:

      Web of Science model: observed = –0.20482, expected = –0.14286, sd = 0.10682, p-value = 0.561

      Wikipedia model: observed = –0.180820, expected = –0.142857, sd = 0.055513, p-value = 0.4941
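
      For readers who want to sanity-check these numbers, the sketch below recomputes the quantities from the reported values. It rests on two assumptions that are not stated in the response: that the p-value is two-sided and based on a normal approximation, and that the expected value follows the standard formula E[I] = -1/(n - 1), which for the reported -0.142857 would imply n = 8 groups. The function names are illustrative, and `morans_i` is a generic textbook implementation, not DHARMa's code.

```python
import math


def morans_i(values, weights):
    """Moran's I for a list of values and a square spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # total weight
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


def expected_i(n):
    """Expected Moran's I under the null hypothesis of no autocorrelation."""
    return -1.0 / (n - 1)


def two_sided_p(observed, expected, sd):
    """Two-sided p-value from a normal approximation (an assumption here)."""
    z = (observed - expected) / sd
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    print("E[I] for n = 8:", expected_i(8))
    print("Web of Science p:", two_sided_p(-0.20482, -0.14286, 0.10682))
    print("Wikipedia p:", two_sided_p(-0.180820, -0.142857, 0.055513))
```

      Under these assumptions the recomputed p-values agree with the reported ones to about two decimal places. In both models the observed statistic lies within one standard deviation of its null expectation, which is why no residual spatial autocorrelation is detected.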

      A last point, it would be interesting to provide some sort of inset plots, such as barplots or donut plots (within the current plots), showing the proportion of species with respect to major clades and biogeographical regions.

      This is a good suggestion, but we couldn’t find a good way to show this as an inset. We added a bar chart showing the number of species in each Phylum/Division in the supplementary materials (Figure S2C). As for the proportion of species in each region, we thought it would be redundant with Figure S1 (summarizing spatial information in sampled species).

      Reviewer #2 (Public Review):

      Using standard and widely used tools, the authors revealed the factors (cultural, phenotypic, phylogenetic, etc.) shaping societal and scientific interest in natural species around the globe. The strength of this manuscript (and the authors') lies in its command of the available literature, database and variable management and analysis, and its solid discussion. The authors thus achieved a manuscript that was pleasant to read.

      Thank you for taking the time to review our manuscript and the positive feedback.

      While I agree that doing a global study requires losing details of local patterns, maybe this is exactly the biggest shortcoming of the manuscript, oblivious to how different cultures (compare USA to PNG, for example) are reflected in these global patterns.

      Related to this previous point, my only other comment is about using English as a reference of societal interest (i.e., the presence of a common name in English). While English may be widespread in Academia, it is still not that common in other societal circles, especially those not using Wikipedia for lack of internet access.

      We acknowledge the limitation of this choice, as well as our limited capacity to represent specific cultural contexts with our approach. Our decision to consider only the existence of English common names as a variable was partly driven by practical reasons, and partly by the very factors the reviewer highlights. Indeed, many cultures, communities and social circles do not use English frequently and also do not use the internet frequently. One consequence of this is that the information compiled for species in other languages is more restricted than that available in English, including the existence of vernacular names. In languages other than English, it may even be the case that several common names exist for the same species, and this number may be an even better reflection of their cultural importance, but sadly this information is not comprehensively indexed across languages and biological groups, which prevented us from considering it. On the other hand, most species have been attributed English common names as part of legislative, scientific and other societal processes, and it is therefore likely that if they are important in any specific cultural setting, they will also have a vernacular English name. Ultimately, while we recognize the potential limitations of this decision, we felt that considering English common names was the simplest and least biased approach to represent the degree to which a species is individually recognized nowadays. We now explain the reasons for considering only English common names, and the associated limitations, more clearly in the manuscript (lines 178-193).

    1. Author Response

      eLife assessment

      This study reports the fundamental discovery of a novel structure in the developing gut that acts as a midline barrier between left and right asymmetries. The evidence supporting the dynamics, composition, and function of this novel basement membrane in the chick is in parts solid and in others convincing, but the investigation of its origin and impact on asymmetric organogenesis is not yet conclusive. This careful work is of broad relevance to anyone interested in patterning mechanisms, the importance of the extracellular matrix, and laterality disorders.

      We extend our sincere gratitude to the editors at eLife for their meticulous evaluation of our manuscript, as well as the valuable insights shared in this Public Review. We also wish to convey our appreciation to the reviewers for their thought-provoking suggestions, which we are enthusiastic about integrating into our revised work. In this provisional response, our primary focus is to address the two main concerns raised: the necessity for functional data to elucidate the importance of the barrier, and the imperative to resolve uncertainties regarding its origin. We are dedicated to addressing these important points, and believe they will greatly enhance the quality and significance of our manuscript.

      Joint Public Review:

      When the left-right asymmetry of an animal body is established, a barrier that prevents the mixing of signals or cells across the midline is essential. Such a midline barrier preventing the spreading of asymmetric Nodal signaling during early left-right patterning has been identified. However, midline barriers during later asymmetric organogenesis have remained largely unknown, except in the brain. In this study, the authors discovered an unexpected structure in the midline of the developing midgut in the chick. Using immunofluorescence, they convincingly show the chemical composition of this midline structure as a double basement membrane and its transient existence during the left-right patterning of the dorsal mesentery, which authors showed previously to be essential for forming the gut loop and guiding local vasculogenesis. Labelling experiments suggest a physical barrier function, to cell mixing and signal diffusion in the dorsal mesentery. Cell labelling and graft experiments rule out a cellular composition of the midline from dorsal mesenchyme or endoderm origin and rule out an inducing role by the notochord. Based on laminin expression pattern and Ntn4 resistance, the authors propose a model, whereby the midline basement membrane is progressively deposited by the descending endoderm.

      Laterality defects encompass severe malformations of visceral organs, with a heterogenous spectrum that remains poorly understood, by a lack of knowledge of the different players of left-right asymmetry. This fundamental work significantly advances our understanding of left-right asymmetric organogenesis, by identifying an organ-specific and stage-specific midline barrier. The complexities of basement membrane assembly, maintenance, and function are of importance in several other contexts, as for example in the kidney and brain. Thus, this original work is of broad interest.

      Overall, reviewers refer to a strong and elegant paper discovering a novel midline structure, combining classic but challenging techniques, to show the dynamics, chemical, and physical properties of the midline. However, reviewers also indicate that further work will be necessary to conclude on the origin and impact of the midline for asymmetric organogenesis. Three issues have been raised to strengthen the claims:

      1) The function of the midline as a physical barrier requires clarification. Dextran injection here seems to label cells and not the extracellular space. By counting the proportion of dextran-labeled cells rather than dextran intensity itself, the authors do not measure diffusion per se, but rather cell mixing.

      We agree that an additional means of showing the barrier function is important. We are currently addressing this using a fluorescently tagged derivative of the drug AMD3100 that we recently synthesized, per Poty et al. 2015. We previously showed that AMD3100 perturbs left sided CXCR4-dependent vasculogenesis when introduced on the left side of the dorsal mesentery (DM), but not when introduced on the right (Mahadevan et al. 2014). These data suggest that a midline barrier prevents diffusion of AMD3100 across the DM. We are currently characterizing the extracellular diffusion of this fluorescent derivative through the DM to complement our previous dextran data.

      Additionally, we should emphasize that the dextran-injected embryos shown in Fig. 6 D-F were isolated two hours post-injection, a timeframe insufficient for cell migration to occur across the DM (Mahadevan et al., 2014). We also collected additional post-midline stage embryos ten minutes after dextran injections - too short a timeframe for significant cellular migration (Mahadevan et al., 2014). Importantly, the fluorescent signal in those embryos was comparable to that observed in the embryos in Fig. 6. Thus, we believe the movement of fluorescent signal across the DM when the barrier starts to fragment (HH20-HH23) is unlikely to represent cell migration. More than a decade of DNA electroporation experiments of the left vs. right DM by our laboratory and others have never indicated substantial cell migration across the midline (Davis et al., 2008; Kurpios et al., 2008; Welsh et al., 2013; Mahadevan et al., 2014; Arraf et al. 2016; Sivakumar et al., 2018; Arraf et al. 2020; and Sanketi et al., 2022). This is also shown in our current GFP/RFP double electroporation data in Fig. 2 G-H, and DiI/DiO labeling data in Fig. 2 E-G. Collectively, our experiments suggest that the dextran signal we observed at HH20 and HH23 is likely not driven by cell mixing.

      2) The descending endoderm zippering model for the formation of the midline lacks direct evidence. The claim of an endoderm origin is based on laminin expression, but the laminin observed in the midline with an antibody may not necessarily correspond to the same subtype assessed by in situ hybridization.

      We have attempted to address this important issue by introducing several tagged laminin constructs, LAMB1-GFP, LAMB1-His, and LAMC1-His, to the endoderm via DNA electroporation to try to label the source of the basement membrane. However, despite endogenous laminin production and export within the endoderm, there appeared to be no export of any of the tagged proteins to the endodermal basement membrane. This experiment was further complicated by the necessarily large size of these constructs at 10-11kb due to the size of laminin subunit genes, resulting in low electroporation efficiency. Although we have not yet determined an alternative way to directly test the endodermal origin hypothesis, we are committed to exploring specific methods to help us test this in future experiments.

      The midline may be Ntn4 resistant until it is injected in the relevant source cells.

      Ntn4 has been shown to disrupt both nascently assembling and preformed mature basement membranes (Reuten et al., 2016). As such, we feel that this particular membrane’s resistance to degradation is unlikely to depend on its stage of assembly.

      Alternative origins could be considered, from the bilateral dorsal aortae or the paraxial mesoderm, which would explain the double layer as a meeting point of two lateral tissues.

      We agree that alternate origins of the midline basement membrane cannot be ruled out from our existing data. We have indeed considered the bilateral dorsal aortae and the paraxial mesoderm as possibilities. However, at the earliest stages of midline basement membrane emergence, the dorsal aortae are already significantly distant from the nascent basement membrane, as are the somites, which have not yet undergone epithelial-to-mesenchymal transition. Fig. S2 G provides an example of a very early midline basement membrane without dorsal aortae or somite contact. Because this particular image is from a section that is fairly posterior in the HH12-13 embryo, it is thus less developed in pseudo-time and gives a window on midline formation in even earlier stage embryos. This is in contrast to the spatially close relationship of the midline basement membrane with the notochord and endoderm. In the context of potential dorsal aortae contributions, it is worth noting that the basement membrane of vascular endothelial cells has a distinct composition from a non-vascular basement membrane. For example, vascular endothelial cells produce only alpha 4 and alpha 5 laminin subunits but contain no alpha 1 subunit in any known species (reviewed in DiRusso et al., 2017). Thus, endothelial cell-derived basement membranes would not contain the alpha 1 laminin subunit that we used in our studies as a robust marker of the midline basement membrane. Note in Fig. 3 E-H and J-J’’’ the absence of dorsal aortae labeling using our laminin alpha 1 antibody. The dorsal aortae are also richer in fibronectin, as seen in Fig. S2, while the midline ECM exhibits far less fibronectin staining. While it may be possible that the converging aortae compress the midline ECM into a more compact structure, we feel direct contribution of basement membrane components is unlikely.

      3) The title implies a role of the midline in left-right asymmetric gut development. However, the importance of the midline is currently inferred from previously published data and stage correlations and will require more direct evidence.

      We agree that we have not fully and directly demonstrated the extent of the role of the midline in enabling the asymmetry of DM compartments during gut development. We propose the following revised title: “An atypical basement membrane forms a midline barrier during left-right asymmetric gut development”. It is important to note that we have made diligent efforts to investigate the functionality of the midline basement membrane through various methods in which we are highly experienced. However, while targeting either the left or right side of the DM is relatively straightforward, accessing the midline presents substantial challenges. We attempted physical perturbation using in vivo laser ablation, but we observed no significant effect or stable disruption of the midline. Additionally, our attempts at ablation using diphtheria toxin proved to be too harsh on the endoderm, preventing reliable and consistent data interpretation. We have tried electroporating MMP9 and MMP2 into the DM, but these did not produce any appreciable effect on the midline. We are also concerned that directly injecting MMPs or other enzymes may lead to injection-related tissue damage to the embryo that may be difficult to separate from direct MMP digestion of the matrix. However, we firmly believe that our inference regarding the involvement of the midline ECM in the asymmetry of DM compartments is robust, based on the functionally distinct yet closely positioned cell populations of the DM, and the timing of the midline in relation to the establishment of these asymmetric compartments. Notably, recent research conducted in our laboratory has highlighted the vital necessity of maintaining the separation of diffusible signaling molecules, such as Bmp4, from these neighboring cell populations, which would otherwise be in direct contact if not for the presence of the midline basement membrane (Sanketi et al., 2022). 
We will continue developing specific methods to perturb the midline in preparation for a revised manuscript.

    1. Author Response

      The following is the authors’ response to the previous reviews

      We thank the Reviewers and Editors for the evaluation of our revised manuscript.

      We especially value the careful assessment of Reviewer 1; at the same time, we clearly disagree with the reviewer’s statement that the revised manuscript “is essentially unchanged”. As appreciated by the other Reviewers, we performed a key experiment (in our opinion the only conclusive experiment) to further solidify that FK506 treatment kills parasites in an FKBP35-independent manner. Of note, however, Reviewer 1 made us aware of an error in the legend of Figure 4F, which likely contributed to the confusion regarding the antiplasmodial effect of FK506: Unfortunately, we missed updating this legend to appropriately embed the new experiment. We therefore incorrectly stated that parasites were exposed to FK506 for 48 hours after FK506 treatment at 4-10 hpi and 36-42 hpi in G1. In contrast to the experiments described in the initial submission, parasite survival was not measured 48 h later, but in G2 ring stage parasites, i.e. at a time point during which parasitemia is not affected by the knockout of PfFKBP35. We have now corrected this. As pointed out correctly by Reviewer 1, it would otherwise not be possible to disentangle the effects of the gene knockout and the drug. The setup we now present in Figure 4F, however, is clearly able to do so.

      We apologize for the inaccuracy and hope this resolves the ambiguities regarding the FKBP35-independent antimalarial effect of FK506. In line with the comments of Reviewers 2 and 3, we believe that our findings on FK506 activity are of particular importance for the malaria research community. We therefore hope that the final eLife assessment will reflect this.

    1. Author Response

      The following is the authors’ response to the original reviews.

      On behalf of my co-authors, we thank you very much for giving us the opportunity to revise our manuscript entitled “A positive feedback loop between ZEB2 and ACSL4 regulates lipid metabolism to promote breast cancer metastasis” (manuscript number: eLife-RP-RA-2023-87510).

      We would like to convey our appreciation to you and the expert reviewers for your valuable time and effort in reviewing and improving our work. We are grateful for the constructive comments raised by the six expert reviewers. We have studied the reviewers’ comments carefully and have conducted additional experiments as recommended. We have made the following revisions point by point. We found that our work was substantially strengthened by addressing these points.

      Reviewer #1 (Public Review):

      In this study, Jiamin Lin et al. investigated the potential positive feedback loop between ZEB2 and ACSL4, which regulates lipid metabolism and breast cancer metastasis. They reported a correlation between high expression of ZEB2 and ACSL4 and poor survival of breast cancer patients, and showed that depletion of ZEB2 or ACSL4 significantly reduced lipid droplets abundance and cell migration in vitro. The authors also claimed that ZEB2 activated ACSL4 expression by directly binding to its promoter, while ACSL4 in turn stabilized ZEB2 by blocking its ubiquitination. While the topic is interesting, there are several major concerns with the study and its conclusions are not convincing.

      1) Figure 1A, the clinical relevance or biological significance of drug-resistant luminal breast cancer cell lines with metastatic cancer is questionable. Additionally, the RNA-seq analysis lacked multiple test correction for differential gene expression analysis, and no fold-change cut-off was used, leading to incorrect thresholds and wrongly identified significant signals.

      We appreciate the reviewer’s valuable questions, which helped improve our manuscript. We found that many EMT-related transcription factors, such as ZEB2, SNAIL and TWIST, were up-regulated in drug-resistant cells, so we hypothesized that drug-resistant cells may have undergone EMT and acquired metastatic capability. The drug-resistant cells used in this study were established and characterized in previous studies from our research team:

      (1) Zheng FM, Long ZJ, Hou ZJ et al., A novel small molecule aurora kinase inhibitor attenuates breast tumor-initiating cells and overcomes drug resistance. Mol Cancer Ther. 2014 Aug;13(8):1991-2003.

      (2) Yang N, Wang C, Wang Z, et al., FOXM1 recruits nuclear Aurora kinase A to participate in a positive feedback loop essential for the self-renewal of breast cancer stem cells. Oncogene. 2017 Jun 15;36(24):3428-3440.

      For the second question, we did use a fold-change cut-off in the RNA-seq analysis: a fold change of more than 1.5 and an adjusted P value of less than 0.05. To make this clearer, we have reset the cut-off to |log2FC| ≥ 2 and p<0.05 and generated volcano plots of the differentially expressed genes using R 4.3.0, as shown in Author response image 1. The results showed 3217 and 3035 up-regulated genes in TAXOL-resistant and EPI-resistant cells respectively, along with 2427 (TAXOL) and 2901 (EPI) down-regulated genes. Both ACSL4 and ZEB2 were up-regulated in the two cell lines. We have put the figure in the new Supplementary Fig. S2.

      Author response image 1.
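
      To illustrate how a combined fold-change and P-value cut-off partitions genes, here is a minimal Python sketch. It is not the pipeline used for the analysis (which was run in R); the function name, the toy gene labels and the reading of the cut-off as an absolute log2 fold change of at least 2 with an adjusted P value below 0.05 are all assumptions made for illustration.

```python
import math

def classify_genes(rows, lfc_cutoff=2.0, p_cutoff=0.05):
    """Split genes into up- and down-regulated lists.

    rows: iterable of (gene, log2_fold_change, adjusted_p) tuples.
    A gene is kept only if its adjusted p value is below p_cutoff
    and |log2FC| is at least lfc_cutoff (assumed reading of the cut-off).
    """
    up, down = [], []
    for gene, log2fc, padj in rows:
        # Skip genes with a missing or non-significant adjusted p value.
        if padj is None or math.isnan(padj) or padj >= p_cutoff:
            continue
        if log2fc >= lfc_cutoff:
            up.append(gene)
        elif log2fc <= -lfc_cutoff:
            down.append(gene)
    return up, down

# Hypothetical toy table, not data from the study.
toy_rows = [
    ("GENE_A", 3.1, 0.001),   # passes both thresholds, up
    ("GENE_B", -2.4, 0.01),   # passes both thresholds, down
    ("GENE_C", 2.5, 0.20),    # large change but not significant
    ("GENE_D", 0.8, 0.0001),  # significant but small change
]
up_genes, down_genes = classify_genes(toy_rows)
print(up_genes, down_genes)
```

      Requiring both conditions is what removes genes such as GENE_C and GENE_D in the toy table, which is the concern the reviewer raised about thresholds.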

      2) Figure 1D-E, the clinical associations between ACSL4 and ZEB2 overexpression and poor patient survival are not justified. The authors used an old web tool, the Kaplan-Meier plotter database, based on microarray data, to perform the analysis. The reviewer repeated the analysis and found that multiple microarray probes for ZEB2 were available, leading to opposite results when different probes were selected. The reviewer also repeated the analysis using more reliable TCGA RNA-seq data and found no correlation between ACSL4 or ZEB2 expression and post-progression survival.

      We appreciate the reviewer’s thoughtful questions. The Kaplan-Meier plotter database (http://kmplot.com/analysis/) that we used is handled by a PostgreSQL server and integrates gene expression and clinical data from GEO, EGA and TCGA. We used the auto-select best cutoff option for the Kaplan-Meier analysis. Because the web tool is old, we repeated the Kaplan-Meier survival analysis in R 4.3.0, splitting the patients in the TCGA database at the third quartile of expression (new Fig. 1D-F). The results again show that patients with high expression of ACSL4 and/or ZEB2 have a relatively worse prognosis, as shown in Author response image 2 (p<0.01):

      Author response image 2.
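
      The quartile split and survival estimate described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the R code used for the figure: the function names and toy values are hypothetical, and the quartile is computed with Python's default `statistics.quantiles` method, which can differ slightly from R's default.

```python
import statistics

def split_by_third_quartile(expression):
    """Return True for each sample whose expression exceeds the third quartile."""
    q3 = statistics.quantiles(expression, n=4)[2]
    return [value > q3 for value in expression]

def kaplan_meier(times, events):
    """Kaplan-Meier estimate.

    times:  follow-up time per patient.
    events: 1 if the event (e.g. death) was observed, 0 if censored.
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical toy cohort, not patient data.
high_group = split_by_third_quartile([1, 2, 3, 4, 5, 6, 7, 8])
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(high_group)
print(curve)
```

      A comparison between groups would additionally need a log-rank test, which is omitted here.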

      3) Figure 1I relied on IHC to support the negative correlation between ACSL4 and ERα expression, but the small sample size limits the power to establish the relationship and the results are not definitive without further replication or biological investigation. The authors should provide more detailed and comprehensive analysis, including appropriate statistical tests, to ensure the findings are robust and reliable.

      We appreciate the reviewer’s suggestion. To better assess the positive correlation between ACSL4 and ZEB2 expression, we increased the number of breast cancer cases for IHC analysis to 45; the correlation is shown in Author response image 3 (new Fig. 1 H):

      Author response image 3.

      4) Figure 3B-C lacks justification of the differences by showing only one field without any internal control for exposure. The reviewer suggests to show additional fields where cells with both efficiently and inefficiently knocked-down are present, to justify the robustness of the results. This can also be achieved by mixing control and knockdown cells.

      We fully understand the reviewer's concern and thank the reviewer for pointing out this problem. A lower-magnification field of view that includes both efficiently and inefficiently knocked-down cells is shown below. We have changed Fig. 3B and C as shown in Author response image 4:

      Author response image 4.

      5) Figure 4A-D, oleate-induced cell migration is a well-documented feature across different cancer types. To make it more relevant to the current study, the authors should examine multiple cell lines with high and low ZEB2/ACSL4 expression to determine the underlying relevance.

      We appreciate the reviewer’s comments and performed the suggested experiments. To better determine the roles of oleic acid and ACSL4 in cell migration, we used the MCF-7 cell line, which has low ZEB2/ACSL4 expression, to test the influence of oleate on cell migration. Transwell and wound-healing assays revealed that oleic acid-treated MCF-7 cells also exhibited enhanced invasive and metastatic capacities compared with control cells. This result indicates that oleate may induce cell migration in MCF-7 cells via mechanisms other than ACSL4. We have added the results to the new Supplementary Fig. 8, as shown in Author response image 5.

      Author response image 5.

      6) Figure 4E, it is difficult to conclude that cancer cells utilize stored lipids during migration to fuel metastasis based on current data. Do you see any evidence of lipid signal decreasing in the leading edge of the scratch wound-healing migration assay? The authors should also compare signals between unmigrated and migrated cells in the transwell assay.

      We appreciate the reviewer’s constructive suggestion. We performed the wound-healing migration assay and observed that the lipid signal was clearly decreased at the leading edge of the scratch, as shown in Author response image 6 (New Fig. 4E). In the transwell experiment, the cells that had migrated to the lower side of the chamber after 24 hours showed decreased lipid signals (Fig. 4F). Together, these results indicate that lipids are utilized during migration.

      Author response image 6.

      7) Figure 6 warrants a genome-wide ChIP-seq to justify direct regulation of ACSL4 promoter by ZEB2. The reviewer’s analysis of publicly available ZEB2 ChIP-seq in multiple cell types detected no ZEB2 binding signal within ±5 kb of ACSL4 promoter.

      We thank the reviewer for raising this concern. Breast cancer cells are not included in some databases, such as the Cistrome Data Browser, which is a resource of human and mouse cis-regulatory information derived from ChIP-seq, DNase-seq and ATAC-seq chromatin profiling assays. Because different cell types may use entirely different mechanisms, a ZEB2 binding signal may not be found at the ACSL4 promoter in some cells.

      We searched the JASPAR database (https://jaspar.genereg.net/), an open-access database of non-redundant transcription factor (TF) binding profiles, and found that the consensus binding sequence (CACCT) of the ZEB (zinc finger E-box binding homeobox) transcription factor family occurs within 2 kb of the ACSL4 promoter, as shown in Author response image 7.

      Author response image 7.
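
      A consensus-sequence search of this kind can be illustrated with a minimal Python sketch that looks for CACCT on both strands of a sequence. This is plain exact string matching, not the position-weight-matrix scoring that JASPAR profiles support, and the function names and the toy sequences (which are not the ACSL4 promoter) are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_motif(sequence, motif="CACCT"):
    """Find exact motif matches on both strands.

    Returns (position, strand) tuples, where position is the 0-based
    start on the forward strand and strand is '+' or '-'.
    """
    sequence = sequence.upper()
    motif = motif.upper()
    motif_rc = reverse_complement(motif)  # AGGTG for CACCT
    hits = []
    for i in range(len(sequence) - len(motif) + 1):
        window = sequence[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        if window == motif_rc:
            hits.append((i, "-"))
    return hits

# Hypothetical toy sequences for illustration only.
forward_hits = find_motif("TTCACCTGAA")
reverse_hits = find_motif("AAAGGTGTT")
print(forward_hits, reverse_hits)
```

      Because a 5-bp motif is expected to occur by chance many times in a 2 kb window, a match found this way indicates a candidate site only and does not by itself demonstrate binding.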

      8) Figure 7 presents a series of self-contradictory results. Figure 7C, why no significant change in ZEB2-MYC expression was observed in the presence of ACSL4 and/or HA-Ubi? In Figure 7 E&G, why robust ACSL4 expression is present in the control group in E but not in (G)? Additionally, why there is no degradation in ZEB2 baseline level over time in the shACSL4 group in E? These raise severe concerns about the data quality.

      We thank the reviewer for pointing out these problems.

      Response to question 1: In Fig. 7C, we used 293T cells, which are not breast cancer cells, for the ubiquitination assay. The over-expression efficiency differs between ZEB2 and ACSL4 in 293T cells.

      Response to question 2: The expression of ACSL4 is low in MCF-7 cells and high in MDA-MB-231 cells. In Figure 7E (New Fig. 7G), we used MDA-MB-231 cells for the control and ACSL4 knockdown cells. In Figure 7G (New Fig. 7I), we used MCF-7 cells for the control and ACSL4 over-expressed cells. We have also revised the figure legend of Fig. 7 as follows:

      I, The stability of ZEB2 protein was detected by CHX treatment assay in control or ACSL4 over-expressed MCF-7 cells. GAPDH was used as the internal loading control.

      Response to question 3: Knockdown of ACSL4 also significantly decreased the mRNA level of ZEB2 (New Fig. 7A), so the baseline levels of ZEB2 in the shACSL4 group (New Fig. 7G) were very low and degradation is not obvious.

      9) Figure 7D, the IP result of ACSL4 is not justified as there is no enrichment of ACSL4 in the IP compared to input. With the current data, it is hard to justify that there is any direct interaction. Moreover, based on IF data in Figure 3B-C, ACSL4 is exclusively localized in the cytoplasm, while ZEB2 is exclusively localized in the nucleus. It is hard to believe there is any direct interaction and mutual regulation.

      We appreciate the reviewer’s thoughtful questions. We have repeated the IP assay and observed enrichment of ACSL4 in the IP; this result has been added as new Fig. 7E, as shown in Author response image 8. We also repeated the immunofluorescence assay in MDA-MB-231 cells and observed that ZEB2 can also be found in the cytoplasm, where it co-localizes with ACSL4 in certain regions, as shown in Author response image 9 (Supplementary Fig. S11):

      Author response image 8.

      Author response image 9.

      Reviewer #2 (Public Review):

      In this study, the authors validated a positive feedback loop between ZEB2 and ACSL4 in breast cancer, which regulates lipid metabolism to promote metastasis.

      Overall, the study is original, well structured, and easy to read. Despite the reliability of the data discussed in this article, there are still some deficiencies that need to be addressed through further explanation.

      Major issues:

      1) The authors demonstrated that ACSL4 regulates ZEB2 not only via a post-transcriptional mechanism but also via a transcriptional mechanism. The authors have not provided a comprehensive explanation of the specific mechanism in this paper. Therefore, it is recommended that the author delve into the potential mechanisms in the discussion section. For example, related mechanisms affecting ZEB2 ubiquitination degradation, as well as factors affecting ZEB2 upstream transcriptional regulation, etc.

      We appreciate the positive comments and constructive suggestion from the reviewer. We have added the following text to the second paragraph of the discussion section:

      Interestingly, our RNA-seq data revealed that some ubiquitin E3 ligases, such as FBXO4, UBE3C, NEDD4 and RBX1, were significantly reduced in ACSL4 knockdown cells (Fig. S12). This result indicated that ACSL4 may reduce the ubiquitination of ZEB2 via down-regulating ubiquitin E3 ligase. Additionally, we found that ACSL4 promoted ZEB2 transcription as the mRNA level of ZEB2 was significantly reduced after ACSL4 knockdown. A recent study reported that LD-derived lipolysis provides acetyl-CoA for the epigenetic regulation of gene transcription. We observed that ACSL4 can also promote FAO, which generates acetyl-CoA for the epigenetic regulation. It is likely that ACSL4 regulates the ZEB2 mRNA level via a lipid-epigenetic reprogramming mechanism, which is worth studying in the future.

      2) To further clarify the interaction of ZEB2 and ACSL4, it is best to perform in vitro glutathione-S-transferase (GST) pulldown assay and immunofluorescence assay.

      We appreciate the reviewer’s suggestion. We performed a GST pull-down assay to examine whether ZEB2 and ACSL4 form a complex. The assay confirmed the interaction of ZEB2 and ACSL4, as shown in Author response image 10 (Supplementary Fig. S10). We also performed an immunofluorescence assay and found that ZEB2 co-localized with ACSL4 in certain regions of the cytoplasm, as shown in Author response image 11 (Supplementary Fig. S11):

      Author response image 10.

      Author response image 11.

      3) In Figure 7B, the protein level of ZEB2 seems not to be altered in BT549 BCSC cell line after the depletion of ACSL4.

      We thank the reviewer for pointing out this problem. ZEB2 protein is less abundant in BT549 BCSC cells than in MDA-MB-231 cells. We repeated the experiment and found that ZEB2 was reduced after the depletion of ACSL4 in BT549 cells. We have replaced Fig. 7B as shown in Author response image 12:

      Author response image 12.

      4) EMT is characterized by changes in cell morphology, so the staining of cytoskeletons with Phalloidin is needed.

      We appreciate the reviewer’s suggestion and performed the staining. The results show that the ACSL4 knockdown cells had a significantly smaller length-to-width ratio than the control cells (p<0.05), which indicates reversion of the EMT process. We have put the results in Supplementary Fig. S4, as shown in Author response image 13:

      Author response image 13.

      5) Additional breast cancer cases or cohorts (such as TMA) should be used to validate the positive correlation between ACSL4 and ZEB2 expression through IHC analysis.

      We thank the reviewer for the suggestion. To better assess the positive correlation between ACSL4 and ZEB2 expression, we increased the number of breast cancer cases for IHC analysis to 45 and validated the positive correlation between ACSL4 and ZEB2. We have put the results into Fig. 1 H and I, as shown in Author response image 14:

      Author response image 14.

      Reviewer #3 (Public Review):

      The manuscript by Lin et al. reveals a novel positive regulatory loop between ZEB2 and ACSL4, which promotes lipid droplets storage to meet the energy needs of breast cancer metastasis. It is of interest, however, some concerns should be addressed to strengthen the finding.

      Major concerns:

      1) The effect of ZEB2 overexpression is not fully demonstrated in the whole study. This point should be addressed.

      We appreciate the positive comments and constructive suggestion from the reviewer. We have generated a ZEB2-overexpressing MCF7 cell line. Over-expression of ZEB2 significantly enhanced the metastatic and invasive capacities of MCF7 cells (Supplementary Fig. S5A and S5B).

      Author response image 15.

      2) Does the addition of oleate restore the ability of migration or invasion in ACSL4 knockdown cells?

      We thank the reviewer for the question. To address this point, oleate was added to the culture medium of ACSL4 knockdown cells. As expected, the addition of oleate restored the invasive and metastatic capacities of ACSL4 knockdown cells by 33.12% and 18.61%, respectively. We have added the results to the new Fig. 4D, as shown in Author response image 16:

      Author response image 16.

      3) Which cellular compartment does ACSL4 localize in and interact with ZEB2 to stabilize ZEB2?

      We thank the reviewer for the question. We have repeated the immunofluorescence assay in MDA-MB-231 cells. We observed that ZEB2 can also be found in the cytoplasm, where it co-localizes with ACSL4 in certain regions (Supplementary Fig. S11).

      4) The ubiquitination assay and Co-IP assay are just performed in HEK293T cells. This result should be confirmed in MDA-MB-231 cells or Taxol-resistant MCF-7 cells.

      We appreciate the reviewer’s suggestion. We performed the ubiquitination assay and IP assay in MDA-MB-231 cells. The results confirm that knockdown of ACSL4 clearly enhanced the ubiquitination of ZEB2. We have put the results into Fig. 7D and 7F, as shown in Author response image 17:

      Author response image 17.

      5) How does ACSL4 regulate ZEB2 at the mRNA level? Please verify.

      We thank the reviewer for the thoughtful question. A recent study reported that LD-derived lipolysis provides acetyl-CoA for the epigenetic regulation of gene transcription. We observed that ACSL4 can promote FAO, which generates acetyl-CoA for epigenetic regulation. It is likely that ACSL4 regulates the ZEB2 mRNA level via a lipid-epigenetic reprogramming mechanism, which is worth studying in the future. We have added the following sentences to the second paragraph of the discussion section:

      Additionally, we found that ACSL4 promoted ZEB2 transcription as the mRNA level of ZEB2 was significantly reduced after ACSL4 knockdown. A recent study reported that LD-derived lipolysis provides acetyl-CoA for the epigenetic regulation of gene transcription. We observed that ACSL4 can also promote FAO, which can generate acetyl-CoA for the epigenetic regulation. It is likely that ACSL4 regulates the ZEB2 mRNA level via a lipid-epigenetic reprogramming mechanism, which is worth studying in the future.

      6) In Fig. 2F, the silencing efficiency for ACSL4 and ZEB2 should be shown. In addition, the protein level of ZEB2 or ACSL4 in shZEB2 and shZEB2+ACSL4 groups should also be addressed.

      We appreciate the reviewer's suggestions. We have added the protein levels in Fig 2F and 2H as follows in Author response image 18:

      Author response image 18.

      7) What is the survival status of patients with both high expression of ACSL4 and ZEB2 in TCGA? In addition, more survival data from databases, especially for patients with both high expression of ACSL4 and ZEB2, need to be analyzed to support the finding.

      We thank the reviewer for the constructive suggestion. We repeated the Kaplan-Meier survival analysis of TCGA RNA-seq data using R 4.3.0. The survival data show that the patients with high expression of both ACSL4 and ZEB2 have the worst prognosis of the four groups (P<0.05) (New Fig. 1D-F).

      Reviewer #1 (Recommendations For The Authors):

      10) Only one siRNA/shRNA was used for knockdown in one cell line. Different siRNAs/shRNAs and multiple cell lines should be used to rule out off-target effects.

      We appreciate the reviewer’s suggestion. We tested three siRNAs and shRNAs for knockdown efficiency (the negative control siRNA and the ACSL4 and ZEB2 siRNAs were from GenePharma) and chose the sequence with the best knockdown effect.

      Author response image 19.

      11) Western blot data are required to justify the overexpression or knockdown efficiency of ACSL4 in cells in Figure 2 A-C.

      We thank the reviewer for the suggestion. We have added the following western blot data to Figure 2:

      Author response image 20.

      12) In Figure 1G, there is a huge variation of the protein input, which makes the results not justified. The authors should repeat the experiments to ensure consistency and reproducibility of the results.

      We thank the reviewer for pointing out this problem. These are tissue samples from breast cancer patients. The results are affected by differences in tumor tissue composition between patient samples, and it is difficult to obtain fresh tissues. In our paper, paraffin specimens were used for IHC staining, and the results confirmed that ACSL4 and ZEB2 are positively correlated. We have put the results in the supplementary data.

      Reviewer #2 (Recommendations For The Authors):

      1) Data from Figure 1A showed the EMT transcription factor SNAIL was also among the top upregulated genes. Please explain why the association between ACSL4 and ZEB2 was studied instead of ACSL4 and SNAIL.

      We appreciate the reviewer’s question. We calculated the correlation between ACSL4 and SNAIL using Pearson’s correlation test. The correlation between ACSL4 and SNAIL is 0.33, lower than that between ACSL4 and ZEB2. Besides, the binding motif analysis reveals that the consensus sequence of the ZEB transcription factor family is within the ACSL4 promoter. Thus, we investigated the relationship between ACSL4 and ZEB2 in breast cancer cells.

      Author response image 21.
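
      For readers less familiar with the statistic, Pearson's correlation coefficient can be computed as in the minimal Python sketch below. The function name and toy values are hypothetical and are not the expression data from the study; the sketch returns only the coefficient and does not compute a P value.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy vectors for illustration only.
perfect_positive = pearson_r([1, 2, 3], [2, 4, 6])
perfect_negative = pearson_r([1, 2, 3], [3, 2, 1])
print(perfect_positive, perfect_negative)
```

      The coefficient ranges from -1 to 1, so a value of 0.33 corresponds to a modest positive linear association.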

      2) What is the limitation of your study? Please add some relevant description in the part of discussion.

      We appreciate the reviewer’s suggestion. We have added the description of limitation of our study in the last paragraph of discussion section as follows:

      A limitation of this study is that only 45 clinical samples were analyzed. Future studies should expand the number of clinical samples and cases to provide more clinical evidence for the crucial role of ACSL4 in breast cancer metastasis.

      3) In the Figure 3 legend, the authors used the word "knockout", which is a description error.

      We appreciate the reviewer’s advice. We have corrected "knockout" into "knockdown".

      Reviewer #3 (Recommendations For The Authors):

      Minor concerns:

      1) In line 352-353, the statement about whether the high or low expression of ACSL4 and ZEB2 or the advanced breast cancer affects prognosis is inaccurate.

      We thank the reviewer for pointing out this problem. We have corrected the statement to “We found that patients with higher ACSL4 or ZEB2 expression, especially those with simultaneous high expression, had worse prognosis than those with lower expression”.

      2) The title of the seventh part of your results contains a logical error that is opposite to the experimental conclusion.

      We thank the reviewer for pointing out this problem. We have changed the title of the seventh part of results to “ACSL4 regulates ZEB2 mRNA expression and protein stabilization”.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      1) Overall, this is a useful tool, the data is well-presented, and the paper is well-written. However, the predictions are only compared with two existing reconstruction tools though more have been recently published.

      The aim of this work was to facilitate high-throughput generation of strain-specific metabolic models, e.g. at the scale of 100s-1000s, as indicated throughout the introduction (see lines 74-82, 91-94), and therefore we only compared tools which were capable of high-throughput analysis via command line and excluded others (e.g. merlin). We have now tested this against the other recent command line tool, gapseq, which had escaped our gaze. Thank you for bringing this to our attention. Additionally, we have included KBase (ModelSEED, a web-based app that does not support high-throughput analysis) to allow readers to interpret the results in the context of community standard approaches, since KBase is a popular tool.

      We have added an explicit statement about the choice of approaches, now at lines 194-199 as follows:

      “We compared the output and performance of Bactabolize to the two previously published tools that can support high-throughput analyses i.e. CarveMe (30) and gapseq (31). To aid interpretation in the context of community standard approaches, we also include a comparison to the popular web-based reconstruction tool, KBase (ModelSEED), and a manually curated metabolic reconstruction of K. pneumoniae strain KPPR1 (also known as VK055 and ATCC 43816, metabolic model named iKp1289) (15).”

      The methods section was updated accordingly (now lines 552-558):

      “A draft model was generated using gapseq version 1.2 with the ‘doall’ command using the unannotated genome (as gapseq does not take annotated input files). Gap-filling was subsequently performed using the ‘fill’ command and a custom M9 media file to match the nutrient list found in Bactabolize (https://github.com/kelwyres/Bactabolize/blob/main/data/media_definitions/m9_media.json).

      Finally, a draft model was constructed using the annotated genbank K. pneumoniae KPPR1 file and the KBase narrative (15) (https://narrative.kbase.us/narrative/ws.14145.obj.1).”

      The results section (now lines 193-308) and Figure 2 have also been updated / restructured to reflect the new analyses, and include a comparison of the relative compute times for the construction of models (lines 281-291) as follows:

      “While model features and accuracy are essential metrics for comparison, computation time is also a key consideration for high-throughput analyses. We recorded the time required for each tool to build draft models for 10 of the completed KpSC genomes used in the quality control framework (see below) on a high-performance computing cluster (Intel Xeon Gold 6150 CPU @ 2.70GHz and 155 GB of requested memory on a CentOS Linux release 7.9.2009 environment). CarveMe KpSC pan was the fastest with a mean of 20.04 (range 19.90 - 20.18) seconds, followed by CarveMe universal at 30.28 (range 29.20 - 31.80) seconds, then Bactabolize KpSC pan at 98.05 (range 92.19 - 100.4) seconds. KBase took 183.50 (range 120.00 - 338.00) seconds per genome via batch analysis, including genome upload time and queuing. gapseq took 5.46 (range 4.55 - 6.28) hours to produce draft models (not including the required gap-filling), consistent with previous reports (37).”
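
      To put these per-genome means in the context of a 1000-genome dataset, the short Python sketch below extrapolates them to total run time. The extrapolation is a back-of-the-envelope illustration only: it assumes strictly serial execution and linear scaling, neither of which was measured, and the dictionary and function names are hypothetical.

```python
# Mean per-genome build times quoted above, converted to seconds.
MEAN_SECONDS_PER_GENOME = {
    "CarveMe KpSC pan": 20.04,
    "CarveMe universal": 30.28,
    "Bactabolize KpSC pan": 98.05,
    "KBase": 183.50,
    "gapseq": 5.46 * 3600,  # reported in hours, excluding gap-filling
}

def serial_hours(tool, n_genomes):
    """Hours needed to build n_genomes models one after another (assumed linear scaling)."""
    return MEAN_SECONDS_PER_GENOME[tool] * n_genomes / 3600

for tool in MEAN_SECONDS_PER_GENOME:
    print(f"{tool}: {serial_hours(tool, 1000):,.1f} h for 1000 genomes")
```

      Under these assumptions the difference between tools grows from minutes per genome to weeks or months per dataset, which is why compute time matters at this scale.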

      Finally, the whole discussion has been updated and substantially restructured (lines 472-474, 475-493, 494-512). Specific mentions to the new analyses are at lines:

      472-474: “Consistent with this assertion, our draft KPPR1 model constructed with KBase (without manual curation) was an outlier in terms of the very low number of genes, reactions and metabolites that were included.”

      475-493: “CarveMe with universal model (30) and gapseq (31) are the current gold standard automated approaches for model reconstruction, and we show that a draft KpSC model generated by Bactabolize with the KpSC pan v1 reference resulted in similar or better accuracy for phenotype prediction (Figure 2). Both the CarveMe universal and gapseq models resulted in high numbers of true-positive and true-negative growth predictions. However, these were also accompanied by comparatively higher numbers of false-positive predictions that resulted in a lower overall accuracy for substrate usage analysis compared to Bactabolize with the KpSC-pan v1 reference (Figure 2), and comparatively lower precision and specificity for the gene essentiality analysis. False-positive predictions may indicate that the relevant metabolic machinery are present in the cell but were not active during the growth experiments (e.g. due to lack of gene expression). In this regard, false-positives are not always a sign of model inaccuracy. However, false-positive predictions can also occur from incorrect gene annotations e.g. due to reduced specificity of ortholog assignment resulting from the use of the universal model without manual curation. Given a key objective here is to facilitate high-throughput analysis for large numbers of genomes, it is not feasible to expect that all models will be manually curated, and therefore we believe that identifying fewer genes with lower overall error rates provides greater confidence in the resulting draft models. We also note that the BiGG universal reference model which CarveMe leverages is no longer being actively maintained. In contrast, user defined reference models can be iteratively curated and updated to incorporate new knowledge and data as they become available.”
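
      The relationship between false-positive predictions and the accuracy, precision and specificity values discussed above can be made concrete with a minimal Python sketch. The function names and toy predictions are hypothetical and are not the benchmarking data or code from the study.

```python
def confusion_counts(predicted, observed):
    """Count true/false positives/negatives from paired boolean growth calls."""
    tp = sum(1 for p, o in zip(predicted, observed) if p and o)
    fp = sum(1 for p, o in zip(predicted, observed) if p and not o)
    tn = sum(1 for p, o in zip(predicted, observed) if not p and not o)
    fn = sum(1 for p, o in zip(predicted, observed) if not p and o)
    return tp, fp, tn, fn

def ratio(numerator, denominator):
    """Safe division that returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")

def summary_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics."""
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "specificity": ratio(tn, tn + fp),
        "sensitivity": ratio(tp, tp + fn),
    }

# Hypothetical toy calls: True means growth predicted or observed.
predicted = [True, True, False, False, True]
observed = [True, False, False, True, True]
counts = confusion_counts(predicted, observed)
metrics = summary_metrics(*counts)
print(counts, metrics)
```

      Because false positives appear in the denominators of both precision and specificity, a model that over-predicts growth can score high on sensitivity while losing on the other measures.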

      510-512: “However, gapseq’s long compute time makes it inappropriate for application to datasets comprising 100s-1000s of genomes (such as have become increasingly common in the bacterial population biology literature).”

      2) My understanding is that the tool requires a set of reference reconstructions for other strains of the target species. If no reference reconstruction is available for another strain of the target species, can this species not be reconstructed?

      Any input reference can be used to generate models; however, single-strain models matching the target species, or ideally a species-specific pan-reference, are recommended for best results. We have added a discussion on these points at lines 128-133:

      “For optimum results we suggest using a pan-model that captures as much diversity as possible for the target species or group of interest, because Bactabolize’s reconstruction method is reductive i.e. each output strain-specific model will include only genes, reactions and metabolites that are present in the reference or a subset thereof (although novel genes, reactions and metabolites can be added via manual curation).”

      We expand on these points further in the discussion:

      494-512: “Bactabolize’s reference-based reconstruction approach is reductive, meaning the resultant draft models will comprise only the genes, reactions and metabolites present in the reference, or a subset thereof, and will not include novel reactions unless they are manually identified and curated by the user. This is an important caveat that should be considered carefully for application of Bactabolize to large genome data sets, particularly for genetically diverse organisms such as those in the KpSC. For optimum results we suggest using a curated pan-model that captures as much diversity as possible for the target species or group of interest. While we acknowledge that a reasonable resource investment is required to generate a high-quality reference, we have shown that a pan-model derived from just 37 representative strains can be sufficient to support the generation of highly accurate draft models (Figures 2 and 5). Additionally, we note that it is possible to use a single strain reference model, which should ideally represent the same or closely related species to that of the input genome assemblies, in order to facilitate accurate identification of gene orthologs. It is technically possible to use an unrelated reference model, but this is expected to result in inaccurate and/or incomplete outputs and has not been tested in this study. In circumstances where no high quality closely-related reference model is available, we recommend alternative reconstruction approaches that leverage universal databases e.g. CarveMe (30) or gapseq (31). However, gapseq’s long compute time makes it inappropriate for application to datasets comprising 100s-1000s of genomes (such as have become increasingly common in the bacterial population biology literature).”

      3) How do the reconstructions generated by Bactabolize compare to those generated by other reconstruction tools besides CarveMe and ModelSEED, e.g., gapseq (Zimmermann et al, Genome Biology 2021. 22:81) or merlin (Capela et al, Nucleic Acids Res 2022, 50(11):6052-6066)?

      See response to rev 1 point 1.

      4) How are the accuracy, specificity, and sensitivity of the pan-models calculated? Is the compared experimental data on the species level?

      We used the pan-model as a reference from which we generated a strain-specific model for K. pneumoniae KPPR1 (using Bactabolize and CarveMe). This strain-specific metabolic model was then used to simulate growth phenotypes and compared to published experimental data for KPPR1. This was described in the methods section, including the calculations for the metrics (lines 589-593); however, we have also expanded the description within the results section to clarify the approach (lines 201-209):

      “De novo draft models for strain KPPR1 were built using; i) Bactabolize with the KpSC pan v1 reference; ii) CarveMe, with its universal reference model (CarveMe universal); iii) CarveMe, with KpSC-pan v1 reference (CarveMe KpSC pan); iv) gapseq; and v) KBase (ModelSEED). ….. Subsequently, each model was used to predict growth phenotypes; i) in M9 minimal media with different sole sources of carbon, nitrogen, phosphorus and sulfur; and ii) for all possible single gene knockouts in LB under aerobic conditions. The predicted phenotypes were compared directly to the published phenotype data.” [Note the published data are cited in the previous manuscript sentence, not shown here].
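
      The direct comparison of predicted and published phenotypes described above can be sketched as a simple tally of outcome classes. This is a minimal illustration, not the Bactabolize implementation: the function name, the dictionary layout and the substrate names are assumptions made for the example, and only conditions with both a prediction and an experimental observation are scored.

```python
from collections import Counter


def tally_outcomes(predicted, observed):
    """Count TP/FP/TN/FN for conditions present in both inputs.

    `predicted` and `observed` map a condition name (e.g. a growth
    substrate) to True (growth) or False (no growth). Both the names
    and the layout are hypothetical, chosen for illustration only.
    """
    counts = Counter(TP=0, FP=0, TN=0, FN=0)
    # Only conditions with matched experimental data can be scored.
    for condition in predicted.keys() & observed.keys():
        pred, obs = predicted[condition], observed[condition]
        if pred and obs:
            counts["TP"] += 1
        elif pred and not obs:
            counts["FP"] += 1
        elif not pred and not obs:
            counts["TN"] += 1
        else:
            counts["FN"] += 1
    return dict(counts)


# Hypothetical toy data, not taken from the study.
predicted = {"glucose": True, "citrate": True, "tartrate": False,
             "xylitol": False, "unmatched_substrate": True}
observed = {"glucose": True, "citrate": False, "tartrate": False,
            "xylitol": True}
outcomes = tally_outcomes(predicted, observed)
```

      Restricting the tally to shared conditions mirrors the point that the set of phenotypes a model can simulate is model-dependent, so total prediction counts differ between models.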

      5) The link https://github.com/rrwick/GFA-dead-end-counter, in line 286 does not work.

      Link regenerated – now at lines 451-452 and 604.

      Reviewer #2

      1) KpSC pan-metabolic reference model is provided. Are they required as input for Bactabolize? Are the gene, metabolite information open accessible by users?

      See response to reviewer 1 point 2 above and:

      All data for the KpSC pan-model described in this work are accessible in the model files and amino acid + nucleotide files + data table at https://github.com/kelwyres/KpSC-pan-metabolic-model. This is also linked in the manuscript at line 631 and in the Data availability statement at line 661.

      2) In the results section "description of Bactabolize", the authors present technical details on how to generate a metabolic model. For the input and output, please provide concrete examples to show the functionality of Bactabolize.

      Detailed instructions, example code and example input/output files are available via the Bactabolize GitHub repository: https://github.com/kelwyres/Bactabolize

      Instructions and example code can be found on the wiki: https://github.com/kelwyres/Bactabolize/wiki

      Test data and example files are at: https://github.com/kelwyres/Bactabolize/tree/main/data/test_data

      The GitHub repository is linked in the manuscript at lines 95, 124, 552, and 667, and we have added a further reference at line 124, which mentions the example code/data: “Full documentation, including example code and test data are available at the Bactabolize code repository (https://github.com/kelwyres/Bactabolize).”

      3) To generate metabolic models, the authors present comparison results with other methods. However, the authors only present the numbers in genes, metabolites and substrates. Since the interactions between gene, metabolite, and substrate are also critical, if possible, please provide the coverage details about these interactions. Venn diagram is recommended to compare these coverage differences.

      Two additional supplementary figures have been generated (Figures S5 and S6) showing Venn diagrams of metabolites and reactions for the high-throughput analysis approaches that are most relevant to this work (see also response to rev 1, point 1). These are discussed at lines 224-237:

      “Figures S5 and S6 show the overlaps of metabolites and reactions between the high-throughput reconstruction methods after processing with MetaNetX (59) to standardise the reaction and metabolite nomenclatures (excluding CarveMe pan for simplicity and given the likely problems of reaction oversubscription). The majority of the reactions included in the Bactabolize model were conserved in either the CarveMe universal model (n = 1225, 53.2%), gapseq model (n = 54, 2.3%) or both (n = 665, 28.9%). The reaction overlap was skewed to the CarveMe universal model which shared 1225 reactions that were conserved in the Bactabolize model but absent from the gapseq model. Notably, the gapseq model contained a large number (2200) of unique reactions (70.4% of those in the model). Similarly, the vast majority of metabolites in the Bactabolize model were conserved in one or both of the other models (n = 917, 85.6%). However, it is likely that true overlaps between methods are underrepresented due to the different reaction identifiers and chemical synonyms used within the BiGG (Bactabolize, CarveMe) vs ModelSEED nomenclatures (gapseq), which are difficult to harmonise in an automated manner even after the application of MetaNetX.”
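
      The overlap categories reported above (shared with one other model only, shared with both, or unique) can be derived from plain set operations once identifiers have been mapped to a common namespace. The sketch below is illustrative only: the function name and the reaction identifiers are hypothetical, and it assumes that identifier harmonisation (done with MetaNetX in the manuscript) has already taken place.

```python
def overlap_with_reference(reference, model_a, model_b):
    """Partition the reference model's identifiers by where else they occur.

    Returns counts and percentages relative to the reference set.
    All inputs are sets of already-harmonised identifiers.
    """
    partitions = {
        "a_only": (reference & model_a) - model_b,
        "b_only": (reference & model_b) - model_a,
        "both": reference & model_a & model_b,
        "unique": reference - model_a - model_b,
    }
    total = len(reference)
    return {
        name: (len(ids), round(100.0 * len(ids) / total, 1))
        for name, ids in partitions.items()
    }


# Hypothetical reaction identifiers, for illustration only.
bactabolize_like = {"R1", "R2", "R3", "R4", "R5"}
carveme_like = {"R1", "R2", "R3", "R9"}
gapseq_like = {"R3", "R4", "R8"}
summary = overlap_with_reference(bactabolize_like, carveme_like, gapseq_like)
```

      Because the four partitions are disjoint and cover the reference set, their counts always sum to the size of the reference model, which is a quick sanity check on any reported Venn breakdown.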

      Figure 2 shows not only the model numbers but also includes benchmarking to real phenotypic data in 2DEFG as the key mode of comparison between models. This encompasses meaningful interactions between gene, metabolite and substrate. The results are discussed at length in the text at lines 253-271:

      “We assessed the performance of each model for in silico prediction of growth phenotypes compared to the previously published experimental data (15). Accuracy, sensitivity, specificity, precision and F1 scores were calculated (60). Note that the specific set of growth substrates and gene knockouts that can be simulated is determined by the sets of genes and metabolites captured by each model and is therefore model-dependent (Data S1 and S2). Among those with matched experimental phenotype data, the Bactabolize and CarveMe universal models were able to predict growth for a greater number of carbon, nitrogen, phosphorus and sulfur substrates than gapseq, CarveMe KpSC pan, KBase and iKp1289 models (Figure 2C, Data S1). While the CarveMe universal model had the highest number of true-positive growth predictions overall (n = 132 of 617 total predictions), it also had a comparably high number of false-positive predictions (n = 39 of 617 total predictions, Figure 2D). Similarly, the gapseq and iKp1289 models resulted in 31 (262 total predictions) and 50 (513 total predictions) false-positive predictions, respectively. In contrast, the Bactabolize model had fewer false-positive predictions (n = 21 of 505 total predictions) alongside a high number of true-positive predictions (n = 117 of 505 total predictions), resulting in the highest overall accuracy metrics (Figure 2E, Data S1). The KBase model was a notable outlier, associated with a high number of false-negative predictions (n = 31 of 103 total predictions) and low false-positive predictions (n = 3 of 103 total predictions), presumably resulting from the very low number of genes and reactions included in the model, driving low sensitivity and accuracy.”

      Lines 272-280:

      “The gene essentiality results showed that gapseq produced the highest absolute number of true-positive gene essentiality predictions (n = 79 of), followed by Bactabolize KpSC pan (n = 44 of 1220 total predictions), then CarveMe universal (n = 39 of 1951 total predictions). CarveMe universal had the largest number of true-negatives by a wide margin (n = 1599 of 1951 total predictions), followed by gapseq (n = 1085 of 1403 total predictions), then Bactabolize KpSC pan (n = 939 of 1220 total predictions), driving their high accuracies (83.96%, 82.96% and 80.57%, respectively). The Bactabolize model was associated with the greatest overall precision and specificity (Figures 2F & 2G) while the gapseq model resulted in the highest F1-score and sensitivity.”
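
      As a worked check on the figures quoted above, accuracy can be recomputed as the share of true-positive plus true-negative calls among all predictions. The sketch below uses only the two models for which the quoted passage gives all three counts; the function name is an assumption made for illustration.

```python
def accuracy_percent(true_pos, true_neg, total):
    """Accuracy as a percentage: (TP + TN) / total predictions."""
    return 100.0 * (true_pos + true_neg) / total


# (true positives, true negatives, total predictions) as quoted in the
# gene essentiality passage above.
quoted_counts = {
    "CarveMe universal": (39, 1599, 1951),
    "Bactabolize KpSC pan": (44, 939, 1220),
}
recomputed = {
    name: round(accuracy_percent(*counts), 2)
    for name, counts in quoted_counts.items()
}
```

      The recomputed values agree with the accuracies stated in the passage for these two models (83.96% and 80.57%). Sensitivity, specificity, precision and F1 additionally require the false-positive and false-negative counts, which are not given in the quoted text.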

      4) Are quality control and gap-filling needed to be processed when constructing a new metabolic model?

      Our goal here was to implement an approach to support high-throughput analyses (see response to rev 1 point 1), including leveraging draft genome assemblies as the basis for the construction of strain-specific metabolic models. As part of this work, we have described a robust quality control (QC) framework for screening draft K. pneumoniae genomes i.e. to identify genome assemblies that should not be used. We developed this framework by comparison to models generated for matched completed genomes. Our analyses demonstrate the importance of applying QC to the input draft genome assemblies. When appropriate QC is applied to the input genomes, the resultant draft models show a high degree of completeness compared to the matched models derived from complete genomes. The draft models can also be used to simulate growth phenotypes with high accuracy as compared to those simulated for the matched complete genome models.

      No specific QC was applied to the draft models themselves, other than confirmation of positive growth prediction in M9 minimal media plus glucose (which is expected to support growth of all K. pneumoniae). In cases where the input assembly passed our QC criteria but the resultant model was unable to simulate growth in M9 minimal media plus glucose, gap-filling may be optionally applied. Again, by comparison to the simulated phenotypes from matched complete genome models, we show that these gap-filled draft models can produce accurate phenotype predictions. See lines 396-404:

      “Of the 901 draft genome assemblies which passed our QC criteria (≤200 assembly graph dead ends), 23 of the resulting draft models failed to simulate growth in M9 minimal media with glucose (despite capturing ≥99% of the genes and reactions in the corresponding complete models). It is expected that all KpSC models should be able to simulate growth on M9 media with glucose as a sole carbon source, as this central metabolism is universal amongst KpSC. To replace missing, critical reactions required for growth on M9 with glucose, we investigated model gap-filling using the patch_model command of Bactabolize. We then assessed the accuracy of the gap-filled models for prediction of growth on the full range of substrates, as compared to the predictions from the corresponding complete models.”

      Lines 409-413:

      “Substrate usage predictions from the 21 successfully gap-filled models were highly accurate, with 18/21 having a prediction concordance of ≥99% across all 846 growth conditions (12/21 had 100% concordance) (Figure S9). We therefore conclude that models generated for genome assemblies passing our QC criteria, which have been gap-filled to successfully simulate growth on minimal media plus glucose, are suitable for the prediction of growth across a range of substrates.”
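
      To make the ≥99% concordance threshold concrete, the sketch below computes percentage agreement between two sets of binary growth predictions and the largest number of disagreements that still meets a given threshold. It is an illustration under stated assumptions: the function names are hypothetical, and the prediction vectors are synthetic rather than taken from the study.

```python
def concordance_percent(draft_calls, complete_calls):
    """Percentage of conditions where both models make the same call."""
    if len(draft_calls) != len(complete_calls):
        raise ValueError("prediction lists must cover the same conditions")
    matches = sum(d == c for d, c in zip(draft_calls, complete_calls))
    return 100.0 * matches / len(draft_calls)


def max_mismatches(n_conditions, threshold_percent):
    """Largest mismatch count that keeps concordance >= threshold (integers)."""
    return (n_conditions * (100 - threshold_percent)) // 100


N_CONDITIONS = 846  # number of growth conditions stated in the passage above
allowed = max_mismatches(N_CONDITIONS, 99)

# Synthetic example: a draft model that disagrees on exactly `allowed` calls.
complete = [True] * N_CONDITIONS
draft = [False] * allowed + [True] * (N_CONDITIONS - allowed)
score = concordance_percent(draft, complete)
```

      With 846 conditions, a gap-filled model can therefore differ from its matched complete model on at most 8 growth calls and still reach 99% concordance.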

      5) Are there any visualization results to check the status of the generated draft model?

      No. This is a tool for large-scale and rapid production of metabolic models and phenotype prediction, and we have not included visualisation tools. Third party tools are available e.g. https://fluxer.umbc.edu/. We do provide optional generation of MEMOTE reports, as noted at lines 136-138:

      “Draft genome-scale metabolic models are output in both SBML v3.1 (41) and JSON formats (one pair of files for each independent strain-specific model), along with an optional MEMOTE quality report (42)”.

      Reviewer #3

      1) The justification and evaluation of the generated models are inadequate and onedimensional. The authors only focus on statistics such as the number of reactions and genes in the models, which does not accurately depict the completeness of the model.

      The reviewer has misunderstood how we have used ‘completeness’ in this manuscript. In the section describing our novel QC framework, we use this term to refer to the relative completeness of draft models generated from draft genome assemblies as compared to curated models generated from complete genome assemblies for the same strains. The latter were considered as the ‘complete’ models for this purpose. We are not referring to any measure of network or metabolic pathway completeness. We specifically refer to gene and reaction capture compared to the ‘complete’ models because these features directly reflect the problem we are trying to address i.e. that draft genome assemblies may not contain the complete set of genes that are truly present in the underlying genome. We have updated the manuscript text to further clarify the problem we aim to address in this section and justify the use of gene and reaction capture metrics:

      Lines 310-319: “There are now thousands of bacterial genomes available in public databases, the majority of which are in draft form, comprising 10s to 1000s of assembly contigs. This fragmentation of the genome is caused by repetitive sequences that cannot be resolved by the assembly algorithm and/or sequence drop-out. The latter can result in the loss of genetic information such that some portion of genes present in the underlying genome are lost from the genome assembly (either completely or partially). This in turn, poses a limitation for the reconstruction of metabolic models using these assemblies, since most published approaches use sequence searches to predict the presence/absence of genes and their associated enzymatic reactions. Therefore, if we are to use public genome data for high-throughput metabolic modelling studies, it is essential to evaluate the expected model accuracies and understand the minimum input genome quality requirements.”

      The biological accuracy of the curated ‘complete’ models has been described previously, and this is now noted in the text at lines 320-324:

      “Here we performed a systematic analysis leveraging our published curated KpSC models (n=37, (14)), which were generated using completed genome sequences and were therefore considered to represent ‘complete’ models for which the underlying genome sequence contains all genes that are truly present in the genome (note the biological accuracy of these models was reported previously (14) and is not the subject of the current study).”

      Throughout the manuscript we not only compare models in terms of the numbers of genes and reactions, but also through comparison of binary growth predictions. Specifically, in the Performance Comparison section (Bactabolize vs other approaches) we use comparison of predicted to experimental phenotypes for strain KPPR1 (see response to rev 1 point 4 for details). In the QC Framework section we compare the predictions derived from draft models generated from draft genome assemblies to those derived from the matched ‘complete’ models, and report the concordance as a measure of impact of input assembly quality (lines 309-394). In the final results section (Predictive accuracy of draft models), we generate 10 additional models and compare the growth predictions to matched experimental data (lines 414-433). We view these phenotype prediction comparisons as the ultimate measure of ‘completeness’ with which to assess our models, because these data have direct biological meaning.

      2) The authors have not provided evidence or discussion on the accuracy of any metabolic fluxes, which are considered to be crucial for reconstructing metabolic models. Additionally, the authors have not mentioned the importance of non-growth associated maintenance and the criticality of biomass composition analysis, both of which significantly determine the fluxes in the system.

      We acknowledge the importance of flux calculations and accurate biomass compositions when using genome-scale models to quantitatively predict growth rates. However, at this stage, the reconstructions developed using Bactabolize are intended for binary predictions and comparisons of growth capabilities on various substrates. The accuracies we report are based on measures of network completion (presence/absence of relevant reactions leading to growth or no-growth phenotypes) rather than specific growth rates. Thus, the models generated by Bactabolize can be used to explore diversity at the strain level in terms of growth capabilities and can serve as a scaffold for building detailed (customized biomass), strain-specific models. Measuring biomass composition and metabolic flux analysis require significant experimental comparisons that are outside the scope of the current study but could be performed for target strains based on reconstructions developed using Bactabolize.

      3) It would be interesting to compare the accuracy of the models generated using Bactabolize with those manually curated.

      We did exactly this. We compared the manually curated model iKp1289 as part of our benchmarking. Lines 194 – 199:

      “We compared the output and performance of Bactabolize to the two previously published tools that can support high-throughput analyses i.e. CarveMe (30) and gapseq (31). To aid interpretation in the context of community standard approaches, we also include a comparison to the popular web-based reconstruction tool, KBase (ModelSEED), and a manually curated metabolic reconstruction of K. pneumoniae strain KPPR1 (also known as VK055 and ATCC 43816, metabolic model named iKp1289) (15).”

      Unfortunately, as far as we are aware there are currently no other published manually curated models for strains with matched phenotype data that are also not included as part of our pan-reference model (the latter is a key point to ensure a fair comparison of models generated using our pan-reference vs those generated with a universal reference).

      4) The authors have not provided evidence or discussion on the accuracy of any metabolic fluxes, which are considered to be crucial for reconstructing metabolic models.

      See response to rev 3, point 2.

      5) The justification regarding the completeness of the models requires further discussion.

      See response to rev 3, point 1.

      6) A detailed discussion on the importance of manually curated models would significantly enhance the quality of the manuscript.

      This has been added at lines 458-474:

      “Traditionally, genome-scale metabolic reconstruction approaches have relied upon significant manual curation efforts. While there will always remain a need for high quality curated models, such resource intensive approaches preclude their application at scale, and have therefore limited analyses to small numbers of individual strains (15, 16). However, automated reconstruction approaches can support the generation and comparison of multiple strain-specific draft models from which meaningful biological insights can be derived (61). Additionally, the quality of curated models is likely to vary depending on their age, level and type of curation, as well as the approach used for preliminary drafting. Indeed it is possible for automated approaches to outperform manually curated models; a draft model for K. pneumoniae KPPR1 generated using Bactabolize with the KpSC pan-v1 reference model outperformed the manually curated iKp1289 model representing the same strain (15). iKp1289 was published in 2017 (6 years prior to this study) and was initially drafted via the KBase pipeline (33), which uses RAST to annotate the sequences with Enzyme Commission numbers. It has been demonstrated several times that the Enzyme Commission scheme has systematic errors (62, 63), leading to a loss in accuracy when compared to the ortholog identification methods used by automated approaches. Consistent with this assertion, our draft KPPR1 model constructed with KBase (without manual curation) was an outlier in terms of the very low number of genes, reactions and metabolites that were included.”

    1. Author Response

      The following is the authors’ response to the previous reviews

      Reviewer #2 (Recommendations For The Authors):

      1) Overall, the novel phylogenetic analyses presented are satisfactory. With this new piece of information in hand, I would suggest using maximum-likelihood analyses as the major evidence supporting ortholog annotations. In fact, it would be best advised to add the bootstrap support analyses (perhaps over new trees) to the phylogenies presented in the supplement.

      Thank you for the suggestion. Although it would make sense to present phylogenetic trees constructed by maximum-likelihood analyses, we decided to keep the original trees (for CDCA7 and HELLS) in the supplemental figures for an aesthetic reason. For example, for the CDCA7/zf-4CXXC_R tree made by the maximum-likelihood method (Hif2_data2_zf4CXXC_R1_iqtree.txt), it would have been easier to visualize if the plant CDCA7 clade was positioned at the bottom, not the top, of the tree, as the topology was identical in both cases. Unfortunately, as the calculated result randomly put the plant CDCA7 clade at the top, the plant CDCA7 clade appears to be separated from the clades representing the rest of the CDCA7 homologs. While we could manually adjust this in the final drawing, we wanted to avoid that.

      2) There are still a few places in the main text where RBH - and its associated E-value - is used as evidence of orthology. As mentioned in my original review, this is evidence for homology, not orthology. Please make sure to amend the final text (for example in the first paragraph of the result section).

      We concurred and amended the manuscript following this recommendation.

      3) We agree with reviewer 1 that part of the functional considerations outside of the human and frog example should be softened, or clearly labelled as an hypothesis - which is now supported by this interesting study

      I assume that this is related to the introduction of CDCA7. As this study defined CDCA7 homologs in the Results section, we believe that this point has been addressed in our last revision.

      4) In addition, make sure to state in the main text the point about DNMT3 nomenclature (w.r.t. DRM).

      On page 10, we added the sentence below to clarify this point.

      “In this report, we call a protein DNMT3 if it clusters into the clade including metazoan DNMT3, plant DNMT3, and DRM.”

    2. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important manuscript reveals signatures of co-evolution of two nucleosome remodeling factors, Lsh/HELLS and CDCA7, which are involved in the regulation of eukaryotic DNA methylation. The results suggest that the roles for the two factors in DNA methylation maintenance pathways can be traced back to the last eukaryotic common ancestor and that the CDCA7-HELLS-DNMT axis shaped the evolutionary retention of DNA methylation in eukaryotes. The evolutionary analyses are solid, although more refined phylogenetic approaches could have strengthened some of the claims. Overall, this study should be useful for researchers studying DNA methylation pathways in different organisms, and it should be of general interest to colleagues in the fields of evolutionary biology, chromatin biology and genome biology.

      We sincerely appreciate constructive comments and suggestions by the reviewers and a fair and accurate summary by the monitoring editor. Below we made point-by-point responses to reviewers’ comments.

      Reviewer #1 (Public Review):

      Overall, I find the work performed by the authors very interesting. However, the authors have not always included literature that seems relevant to their study. For instance, I do not understand why two papers Dunican et al 2013 and Dunican et al 2015, which provide important insight into Lsh/HELLS function in mouse, frog and fish were not cited. It is also important that the authors are specific about what is known and in particular about what is not known about CDCA7 function in DNA methylation regulation. Unless I am mistaken, there is currently only one study (Velasco et al 2018) investigating the effect of CDCA7 disruption on DNA methylation levels (in ICF3 patient lymphoblastoid cell lines) on a genome-wide scale (Illumina 450K arrays). Unoki et al 2019 report that CDCA7 and HELLS gene knockout in human HEK293T cells moderately and extremely reduces DNA methylation levels at pericentromeric satellite-2 and centromeric alpha-satellite repeats, respectively. No other loci were investigated, and it is therefore not known whether a CDCA7-associated maintenance methylation phenotype extends beyond (peri)centromeric satellites. Thijssen et al performed siRNA-mediated knockdown experiments in mouse embryonic fibroblasts (differentiated cells) and showed that lower levels of Zbtb24, Cdca7 and Hells protein correlate with reduced minor satellite repeat methylation, thereby implicating these factors in mouse minor satellite repeat DNA methylation maintenance. Furthermore, studies that demonstrate a HELLS-CDCA7 interaction are currently limited to Xenopus egg extract (Jenness et al 2018) and the human HEK293 cell line (Unoki et al 2019). Whether such an interaction exists in any other organism and is of relevance to DNA methylation mechanisms remains to be determined.
Therefore, in my opinion, the conclusion that "Our co-evolution analysis suggests that DNA methylation-related functionalities of CDCA7 and HELLS are inherited from LECA" should be softened, as the evidence for this scenario is not very compelling and seems premature in the absence of molecular data from more species.

      We appreciate this reviewer’s thorough reading of our manuscript.

      Regarding the citation issues, we will cite Dunican 2013 and Dunican 2015. In addition, we went through the manuscript to update the citations.

      As pointed out by the reviewer, the role of CDCA7 in genome DNA methylation was extensively studied in Velasco et al 2018. The result, together with Thijssen et al (2015), and Unoki et al. (2018), supports the idea that ZBTB24, CDCA7 and HELLS act within the same pathway to promote DNA methylation, the pattern of which is overlapping but distinct from DNMT3B-mediated methylation. This observation suggests that a ZBTB24-CDCA7-HELLS mechanism for DNA methylation may involve an alternative DNMT. Interestingly, our analysis of the gene presence-absence pattern revealed that the presence of CDCA7 coincides with DNMT1 more than DNMT3 genes. Indeed, while CDCA7 is lost from diverse branches of eukaryote species, genomes encoding CDCA7 always encode HELLS, and almost always encode DNMT1. Based on this observation, we speculate the role of CDCA7 is tightly linked to HELLS and DNA methylation throughout evolution.
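
      A statement of the form "genomes encoding CDCA7 always encode HELLS" can be checked against a presence-absence table with a simple conditional tally. The sketch below is a naive illustration only: it ignores phylogenetic relationships (which CoPAP accounts for), and the function name, species labels and gene sets are all hypothetical placeholders rather than data from this study.

```python
def fraction_also_encoding(table, given_gene, target_gene):
    """Among genomes encoding `given_gene`, the fraction also encoding `target_gene`.

    `table` maps a species label to the set of genes detected in its genome.
    Returns None when no genome encodes `given_gene`.
    """
    with_given = [genes for genes in table.values() if given_gene in genes]
    if not with_given:
        return None
    return sum(target_gene in genes for genes in with_given) / len(with_given)


# Hypothetical presence-absence table, for illustration only.
presence = {
    "species_A": {"CDCA7", "HELLS", "DNMT1"},
    "species_B": {"CDCA7", "HELLS"},
    "species_C": {"HELLS", "DNMT1"},
    "species_D": set(),
}
cdca7_with_hells = fraction_also_encoding(presence, "CDCA7", "HELLS")
cdca7_with_dnmt1 = fraction_also_encoding(presence, "CDCA7", "DNMT1")
```

      In this toy table every CDCA7-encoding genome also encodes HELLS, whereas only half also encode DNMT1. Note that the relationship is directional: species_C encodes HELLS without CDCA7.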

      As pointed out by Reviewer 1, the link between CDCA7, HELLS and DNA methylation has not been determined experimentally across these species. However, based on our previously published and unpublished data, we are confident about the functional interaction between CDCA7 and HELLS in Xenopus laevis and Homo sapiens.

      Furthermore, the importance of HELLS homologs in DNA methylation has been extensively studied in humans, mice and plants. We hope our current study will motivate the field to experimentally test the evolutionary conservation of the HELLS-CDCA7 interaction, as well as its importance in DNA methylation, in other species.

      The authors used BLAST searches to characterize the evolutionary conservation of CDCA7 family proteins in vertebrates. From Figure 2A, it seems that they identify a LEDGF binding motif in CDCA7/JPO1. Is this correct and if yes, could you please elaborate and show this result? This is interesting and important to clarify because previous literature (Tesina et al 2015) reports a LEDGF binding motif only in CDCA7L/JPO2.

      We searched for a LEDGF binding motif ({E/D}-X-E-X-F-X-G-F, also known as IBM described in Tesina et al 2015) in vertebrate CDCA7 proteins, and reported their positions in Figure 2A. Examples of identified LEDGF-binding motifs are now presented in Fig. 2C.
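
      A motif of the form {E/D}-X-E-X-F-X-G-F translates directly into a regular expression, which is one way such a search could be run. The sketch below is an assumption about tooling rather than a description of how the search was actually performed; the function name and the test sequence are hypothetical.

```python
import re

# {E/D}-X-E-X-F-X-G-F, where X is any residue (single-letter amino acid code).
# The lookahead lets overlapping occurrences be reported as well.
LEDGF_MOTIF = re.compile(r"(?=([ED][A-Z]E[A-Z]F[A-Z]GF))")


def find_ledgf_motifs(protein_sequence):
    """Return (1-based start position, matched residues) for each motif hit."""
    sequence = protein_sequence.upper()
    return [
        (match.start() + 1, match.group(1))
        for match in LEDGF_MOTIF.finditer(sequence)
    ]


# Hypothetical sequence constructed to contain one motif; not a real protein.
hits = find_ledgf_motifs("MKTAEAEQFSGFLLDK")
```

      Reporting 1-based start positions matches the usual convention for residue numbering, which makes the hits easy to place on a domain diagram.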

      To provide evidence for a potential evolutionary co-selection of CDCA7, HELLS and the DNA methyltransferases (DNMTs) the authors performed CoPAP analysis. Throughout the manuscript, it is unclear to me what the authors mean when referring to "DNMT3". In the Material and Methods section, the authors mention that human DNMT3A was used in BLAST searches to identify proteins with DNA methyltransferase domains. Does this mean that "DNMT3" should be DNMT3A? And if yes, should "DNMT3" be corrected to "DNMT3A"? Is there a reason that "DNMT3A" was chosen for the BLAST searches?

      As described in the Methods section, both human DNMT1 and DNMT3A were used to initially identify any proteins containing a domain homologous to the DNA methyltransferase catalytic domain. Within Metazoa, if their orthologs exist, the top hits from BLAST searches using human DNMT1 and DNMT3A show an E-value of 0.0, and thus their orthology is robust. This is even true for DNMT1 and DNMT3 homologs in the sponge Amphimedon queenslandica, which is one of the earliest-branching metazoan species. For other DNMTs, such as DNMT2, DNMT4, DNMT5, DNMT6, we conducted separate BLAST searches using those proteins as baits as described in Methods. The methyltransferase domain was then isolated using the NCBI conserved domains search. The selected DNMT domain sequences were aligned with CLUSTALW to generate a phylogenetic tree to further classify DNMTs. In response to reviewer #2’s comments, we also generated another multi-sequence alignment of DNMTs using MUSCLE v5 and conducted maximum-likelihood-based phylogenetic tree assembly using IQ-TREE (new Fig. S6). The overall topology of these trees is consistent except for orphan DNMTs. It has been suggested that vertebrate DNMT3A and DNMT3B are derived from duplication of a DNMT3 gene of the chordate ancestor (e.g., Liu et al 2020, PMID 31969623). As such, many invertebrates encode only one DNMT3. As previously shown (Yaari et al., 2019, PMID 30962443), plants have two distinct DNMT3-like protein families, the ‘true DNMT3’ and DRM, the plant-specific de novo DNMT that is often considered to be a DNMT3 homolog (see Reviewer 2’s comment). Our phylogenetic analysis successfully separated the clade of DNMT3 and DRM from the rest of the DNMTs (Figure S6). Yaari et al noted that PpDNMT3a and PpDNMT3b, the two DNMT3 orthologs encoded by the basal plant Physcomitrella patens, are not orthologs of mammalian DNMT3A and DNMT3B, respectively.
Therefore, to minimize such nomenclature confusions, any DNMTs that belong to either the DNMT3 or DRM clades indicated in Figure S6 are collectively referred to as ‘DNMT3’ throughout the paper (see Figure S2 for overview).

      CoPAP analysis revealed that CDCA7 and HELLS are dynamically lost in the Hymenoptera clade and either co-occurs with DNMT3 or DNMT1/UHRF1 loss, which seems important. Unfortunately, the authors do not provide sufficient information in their figures or supplementary data about what is already known regarding DNA methylation levels in the different Hymenoptera species to further consider a potential impact of this observation. What is "the DNA methylation status" of all these organisms? This information cannot be easily retrieved from Table S2. A clearer presentation of what is actually known already would improve this paragraph.

      As the DNA methylation status of the species in the Hymenoptera clade has not been comprehensively tested, we initially did not include this information in Figure 7. However, during the course of the revision, we realized that Bewick et al. 2017 (PMID 28025279) reported that DNA methylation is absent from the braconid wasp Aphidius ervi. We originally conducted synteny analysis on Aphidius gifuensis, which has a chromosome-level genome assembly with annotated proteins available in NCBI, whereas annotated proteins for Aphidius ervi are not available in NCBI. By conducting a tBLASTn search against the Aphidius ervi genome, we have now found that the presence/absence pattern of CDCA7, HELLS, DNMT1, DNMT3 and UHRF1 in Aphidius ervi is identical to that of Aphidius gifuensis, with the caveat that the genome assembly of Aphidius ervi is at scaffold level. In other words, DNMT1 and CDCA7 are both absent in Aphidius ervi, in which 5mC is undetectable. Additionally, we realized that the DNA methylation status reported for some species in Bewick et al. 2017 was inferred from CpG frequency rather than from direct experimental detection of methylated cytosines. Therefore, we have amended Table S3 to indicate the presence of 5mC only for those species where this was experimentally tested. As such, we now consider the DNA methylation status of Fopius arisanus, which lacks DNMT1 and CDCA7, to be unknown.

      Altogether, among the 17 Hymenoptera species that we analyzed (listed in the amended Table S3), the 8 species that have detectable DNA methylation all encode CDCA7, whereas the 2 species that do not have detectable DNA methylation lack CDCA7. We will note this finding in the revised text, and include the known 5mC status in the new Figure 7.
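
      As a rough illustration of how strong this association is among the 10 species with known 5mC status, the counts above can be placed in a 2x2 table and evaluated with Fisher's exact test. This is only a sketch and not an analysis reported in the manuscript: it treats species as independent observations and therefore ignores shared ancestry, which is the reason a phylogeny-aware method such as CoPAP is used for the actual inference. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over all tables that are no more likely than the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Rows: 5mC detected / not detected. Columns: CDCA7 present / absent.
# Counts are the ones stated in the response: 8 species and 2 species.
p_value = fisher_exact_two_sided(8, 0, 0, 2)
print(f"two-sided p = {p_value:.4f}")
```

      With only 10 informative species and perfect separation, the result mainly shows that the pattern is unlikely under independence; it does not replace the phylogenetic analysis.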

      Furthermore, A. thaliana DDM1, and mouse and human Lsh/Hells are known to preferably promote DNA methylation at satellite repeats, transposable elements and repetitive regions of the genome. On the other hand, DNA methylation in insects and other invertebrates occurs in genic rather than intergenic regions and transposable elements (e.g. Bewick et al 2017; Werren JH PlosGenetics 2013). It would be helpful to elaborate on these differences.

      We were aware of this interesting point, which was discussed in the third paragraph of the Discussion. To better illustrate this point, we have now expanded the Discussion (page 14) to speculate about the role of DNA methylation in insects, where emerging evidence indicates the importance of DNMT1 in meiosis. It should be noted that, in the Arabidopsis ddm1 mutant, reduction of CG methylation of gene bodies is common (50% of all methylated euchromatic genes) (Zemach et al, 2013). In addition, hypomethylation is not limited to satellite repeats and transposable elements in ICF patients defective in HELLS or CDCA7 (Velasco et al., 2018).

      Reviewer #2 (Public Review):

      In this manuscript, Funabiki and colleagues investigated the co-evolution of DNA methylation and nucleosome remodeling in eukaryotes. This study is motivated by several observations: (1) despite being ancestrally derived, many eukaryotes lost DNA methylation and/or DNA methyltransferases; (2) over many genomic loci, the establishment and maintenance of DNA methylation relies on a conserved nucleosome remodeling complex composed of CDCA7 and HELLS; (3) it remains unknown if/how this functional link influenced the evolution of DNA methylation. The authors hypothesize that if CDCA7-HELLS function was required for DNA methylation in the last eukaryote common ancestor, this should be accompanied by signatures of co-evolution during eukaryote radiation.

      To test this hypothesis, they first set out to investigate the presence/absence of putative functional orthologs of CDCA7, HELLS and DNMTs across major eukaryotic clades. They succeed in identifying homologs of these genes in all clades spanning 180 species. To annotate putative functional orthologs, they use similarity over key functional domains and residues such as ICF related mutations for CDCA7 and SNF2 domains for HELLS. Using established eukaryote phylogenies, the authors conclude that the CDCA7-HELLS-DNMT axis arose in the last common ancestor to all eukaryotes. Importantly, they found recurrent loss events of CDCA7-HELLS-DNMT in at least 40 eukaryotic species, most of them lacking DNA methylation.

      Having identified these factors, they successfully identify signatures of co-evolution between DNMTs, CDCA7 and HELLS using CoPAP analysis - a probabilistic model inferring the likelihood of interactions between genes given a set of presence/absence patterns. As a control, such interactions are not detected with other remodelers or chromatin modifying pathways also found across eukaryotes. Expanding on this analysis, the authors found that CDCA7 was more likely to be lost in species without DNA methylation.

      In conclusion, the authors suggest that the CDCA7-HELLS-DNMT axis is ancestral in eukaryotes and raise the hypothesis that CDCA7 becomes quickly dispensable upon the loss of DNA methylation and/or that CDCA7 might be the first step toward the switch from DNA methylation-based genome regulation to other modes.

      The data and analyses reported are significant and solid. However, using more refined phylogenetic approaches could have strengthened the orthologous relationships presented. Overall, this work is a conceptual advance in our understanding of the evolutionary coupling between nucleosome remodeling and DNA methylation. It also provides a useful resource to study the early origins of DNA methylation related molecular processes. Finally, it brings forward the interesting hypothesis that since eukaryotes are faced with the challenge of performing DNA methylation in the context of nucleosome packed DNA, losing factors such as CDCA7-HELLS likely led to recurrent innovations in chromatin-based genome regulation.

      Strengths:

      • The hypothesis linking nucleosome remodeling and the evolution of DNA methylation.

      • Deep mapping of DNA methylation related processes in eukaryotes.

      • Identification and evolutionary trajectories of novel homologs/orthologs of CDCA7.

      • Identification of CDCA7-HELLS-DNMT co-evolution across eukaryotes.

      Weaknesses:

      • Orthology assignment based on protein similarity.

      • No statistical support for the topologies of gene/protein trees (figure S1, S3, S4, S6) which could have strengthened the hypothesis of shared ancestry.

      We appreciate the reviewers’ accurate summary, which nicely emphasizes the importance of our study. We agree that better phylogenetic analysis for orthology assignment will strengthen our conclusion. Having anticipated this weakness, however, we specifically conducted a CoPAP analysis exclusively for Ecdysozoa species, in which orthology assignment is straightforward, and this analysis supported our major conclusion. For example, if we conduct a BLAST search against the clonal raider ant Ooceraea biroi protein dataset using human HELLS as a query, the top hit is a protein sequence annotated as one of three isoforms of ‘lymphoid-specific helicase’ (i.e., HELLS), with an E-value of 0.0. Similarly, a BLAST search of the Ooceraea biroi dataset using human DNMT1 as a query returns isoforms of DNMT1 as the top hits, with an E-value of 0.0. As such, there is little dispute about orthology assignment in Ecdysozoa. Outside of Chordata, classification of DNMTs, particularly in Excavata and SAR, requires more extensive identification in these supergroups. Our current orthology assignment for the major targets in this study (HELLS, DNMT1, DNMT3, DNMT5) is largely consistent with published results (Ponger et al., 2005 PMID 15689527; Huff et al, 2014 PMID 24630728; Yaari et al., 2019 PMID 30962443; Bewick et al., 2019 PMID 30778188). However, while we were preparing this response and cross-checking our assignments against these references, we realized that we had erroneously missed DNMT5 orthologs in Leucosporidium creatinivorum, Postia placenta, Armillaria gallica and Saitoella complicata, and a DNMT6 ortholog in Fragilariopsis cylindrus. We also recognized that DNMT4 orthologs were identified in Fragilariopsis cylindrus and Thalassiosira pseudonana in Huff et al 2014 (PMID 24630728), but in our phylogenetic analysis, these proteins form a distinct clade between DNMT1/Dim-2 and DNMT4 (original Figure S6), although the confidence level of this classification by Huff et al was not strong.
To resolve this potential confusion in DNMT annotations, we generated new multiple sequence alignments with MUSCLE v5 and built phylogenetic trees with IQ-TREE 2 (a maximum-likelihood-based method, coupled with selection of the optimal substitution model and bootstrapping). The tree topology was not significantly altered between the two methods, except for the unambiguous location of orphan DNMTs and DNMT4-related proteins. To avoid unnecessary confusion in the DNMT annotations, we decided to present the MUSCLE-IQ-TREE result for the DNMT phylogenetic tree and classification (new Fig. S6). The raw results of IQ-TREE analysis for CDCA7/zf-4CXXC_R1, HELLS SNF2 domain, and DNMTs are included as Dataset S1-S3. We then conducted CoPAP analysis using the corrected classification. As it is not clear a priori if fungal-specific CDCA7-like proteins (now referred to as CDCA7F with class II zf-4CXXC_R1) should be considered CDCA7 orthologs, we conducted CoPAP against two lists; the first list includes CDCA7F in the CDCA7 group, whereas the second list includes a separate category of class II zf-4CXXC_R1, which includes CDCA7F. The two results show slightly different topologies in the coevolutionary linkages, but both support our major conclusion that CDCA7 coevolved with DNMT1-UHRF1 and HELLS. These new CoPAP results are shown in Fig. S7.

      Reviewer #1 (Recommendations For The Authors):

      Summary

      Last sentence: "...a unique specialized role of CDCA7 in HELLS-dependent DNA methylation maintenance...". What do the authors mean?

      Our analysis strongly indicates that CDCA7 is dispensable in systems lacking HELLS and DNMT (particularly DNMT1). In other words, a species preserves CDCA7 only if it has both HELLS and DNMT1 (or in some cases DNMT5). The importance of HELLS homologs in DNA methylation has been extensively studied in human, mouse and plants. However, in these studies, substantial DNA methylation remains despite defective HELLS/DDM1 (especially in euchromatic regions). Additionally, there are species (e.g., Bombyx mori) that have DNMT1 and detectable DNA methylation but lack HELLS and CDCA7. These observations suggest that the role of CDCA7 must be unique and specialized in a way that it is strongly coupled to HELLS-dependent DNA methylation (but not HELLS-independent DNA methylation), and that this function of CDCA7 seems to be inherited from the last eukaryotic common ancestor.

      Introduction

      • page 3: "DNMTs are largely subdivided into maintenance and de novo DNMTs" - Which species are the authors referring to?

      As described in the cited reference (Lyko 2018), maintenance DNA methylation and de novo DNA methylation are well-accepted functional classifications of DNA methylation. It is also currently accepted that distinct DNMTs execute maintenance DNA methylation or de novo DNA methylation, although crosstalk between these processes has been reported. Therefore, we stated, “DNMTs are largely subdivided into maintenance DNMTs and de novo DNMTs”, and this subdivision is species independent.

      • page 3: "Maintenance DNMTs recognize hemimethylated CpGs." - Can the authors please define the species and/or literature they are referring to? This seems important to clarify. For instance, mammalian DNMT1 requires a co-factor, UHRF1, which recognizes hemimethylated DNA and H3K9me3 (Bostick et al 2007).

      We meant to describe, “Maintenance DNMTs directly or indirectly recognize hemimethylated CpGs…”. The specific requirement of UHRF1 for DNMT1-mediated maintenance DNA methylation is explained in the subsequent sentence “In animals…”. In the case of Cryptococcus neoformans, DNMT5 recognizes hemimethylated DNA independently of UHRF1 in vitro to execute maintenance methylation.

      • page 3: The authors may want to mention that A. thaliana also has a de novo DNA methyltransferase, DRM2, a homolog of the mammalian DNMT3 methyltransferases. This seems important, since they show in Figure 1 that a de novo methyltransferase is found in A. thaliana. Also, later in their manuscript they mention plant de novo DNA methylation.

      Thanks for pointing this out. As shown in Figure 5, we classified plant DRMs as DNMT3-like proteins, but we now note this in the Introduction.

      • page 3: Sentence starting "In about 50% of ICF patients,..." - Why is DNMT3B referred to as "de novo", is it not a de novo DNA methyltransferase?

      You are correct. Quotation marks are now removed to avoid unnecessary confusion.

      • page 4: Sentence starting "Indeed, the importance of HELLS/CDCA7 in DNA methylation maintenance...", - Which references (Han et al., 2020; Ming et al., 2021; Unoki, 2021; Unoki et al., 2020) provide experimental evidence for a role of CDCA7 in DNA methylation maintenance by DNMT1?

      Thanks for pointing out the typo. “/CDCA7” is now removed.

      • page 5: Sentence starting "Indeed, it has been shown that DNMT3A..." - Should DNMTB be DNMT3B?

      Yes. This is now corrected.

      Results

      • Page 5: Sentence starting "However, we identified a protein..." - No A. thaliana reference?

      We added Zemach et al 2010, and Chan et al 2005.

      • Figure 2B: "ICF4 mutations" should this be "ICF3 mutations"?

      • Figure 3: "ICF4 mutations" should this be "ICF3 mutations"?

      • Figure 4: "ICF4 mutations" should this be "ICF3 mutations"?

      • Figure S1: Orange colored "CDC7L (fish), CDC7e, CDC7, CDC7L" is there an "A" missing?

      • Figure S5: "ICF4 mutations" should this be "ICF3 mutations"?

      These typos are now corrected. Thank you.

      • Figure S7: What is "CDCA7(II)" referring to, "zf-4CXXC_R1 class II (plants)"?

      The original CDCA7 (II) included proteins with class II zf-4CXXC_R1, which are found in plants, fungi, Acanthamoeba castellanii and Amphimedon. Among those species, the prototypical CDCA7 orthologs are absent only in fungi. It has been a priori unclear if fungal proteins with class II zf-4CXXC_R1 (which we now term CDCA7F) should be included in CDCA7 for CoPAP analysis. Although we originally included CDCA7F in CDCA7, we now show the results of two analyses. In the first one (Fig. S7A) CDCA7F was included in CDCA7, whereas in the second one (Fig. S7B) CDCA7F was included in the separate category of class II zf-4CXXC_R1. The topologies of the two results are slightly different, but they both show coevolutionary linkage between CDCA7 and the DNMT1-UHRF1 cluster.

      • Figure 4 and 5: In the case of preliminary genome assemblies what is the difference between empty squares with dotted lines and filled squares without dotted lines?

      As it is difficult to be certain of a gene’s absence (did the species lose the gene or is it simply not annotated due to incomplete genome coverage?), we illustrated the absence of a gene in preliminary genome assemblies with an empty square with dotted outline. Since the presence of a gene is evident regardless of the level of genome assembly, the presence of a gene is represented with filled squares with solid lines, even for preliminary genome assemblies.
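
      The plotting convention described above can be summarized as a small lookup. The sketch below is purely illustrative; the function name is hypothetical, and the solid outline for an absent gene in a non-preliminary assembly is an assumption that the response does not state explicitly.

```python
def square_style(gene_present: bool, preliminary_assembly: bool) -> str:
    """Return the square style used to plot one gene in one species."""
    if gene_present:
        # Presence is unambiguous at any assembly level.
        return "filled, solid outline"
    if preliminary_assembly:
        # Absence may reflect incomplete genome coverage rather than gene loss.
        return "empty, dotted outline"
    # Assumed style for absence in a more complete assembly.
    return "empty, solid outline"


print(square_style(gene_present=False, preliminary_assembly=True))
```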

      • Figure 1: Why was Mus musculus - one of the main model organisms used for many DNA methylation studies not included? Also what are empty and filled squares?

      Filled and empty squares indicate the presence and absence of the indicated genes, respectively. A clarifying statement is now added to the figure legends. Mus musculus is now included in the figure.

      • Figure S2: Adding the existence of DNA methylation and DNMT3 in the bottom right part of the figure (overall no of species) would make this panel more informative

      We included this overview to summarize the co-retention of CDCA7, HELLS and maintenance DNMTs across the analyzed species. We decided not to include DNA methylation, since DNA methylation status is known for only a fraction of the listed species. Inclusion of DNMT3 will introduce too many possible gene presence-absence combinations to convey a clear message. However, we now mention in the revised text (page 11, second paragraph) that unlike the prevalent co-retention of DNMT1 in species with CDCA7, we identified several species that possess CDCA7, HELLS and DNMT1 but lack DNMT3. These examples include insects such as the bed bug Cimex lectularius and the red paper wasp Polistes canadensis.

      • Page 6: Sentence starting "This leucine zipper sequence is highly conserved..." - Figure/Reference missing?

      The sequence alignment of the leucine zipper is now shown in Fig. 2C.

      • page 6: Sentence starting "In contrast to zf-4CXXC_R1 motif-containing proteins..." - The authors may want to mention the role of the CXXC zf domain in KDM2A/B, DNMT1, MLL1/2 and TET1/3 and what the CDCA7 CXXC zf domain is/could be required for.

      The notion that zf-CXXC binds to nonmethylated CpG is now included. Due to the substantial difference between zf-CXXC and zf-4CXXC_R1, we hesitated to relate the function of zf-4CXXC_R1 with zf-CXXC, but we now discuss a potential role of zf-4CXXC_R1 in sensing DNA methylation status in Discussion (Page 13).

      • page 7: Sentence starting "Second, the fifth cysteine is replaced..."- Zoopagomycota" - Figure 4A does not have this labeling, one has to deduce this from Figure 4B.

      We fixed this by including the list of Zoopagomycota species in the main text.

      • page 7: Sentence containing "Neurospora crassa DMM-1 does not directly regulate DNA methylation or demethylation but rather..." - How does the information about DMM-1 relate to what is shown in Figure 4B, to CDCA7, HELLS and DNMTs? Please clarify.

      Both Neurospora DMM-1 and Arabidopsis IBM1 contain the JmjC domain and are implicated in an indirect control mechanism of DNA methylation. Since it has never been pointed out that they have a divergent zf-4CXXC_R1 domain, which clearly shares the origin with CDCA7 proteins, we thought that this is important to note. We realized that we did not clearly mark Neurospora XP-956257 as DMM-1 in Fig. 4B. This is now fixed.

      • Heading "Systematic identification of CDCA7, HELLS and DNMT homologs in eukaryotes". When mentioning CDCA7 the authors may want to decide on the use of one consistent definition of "prototypical (Class I) CDCA7-like proteins (i.e. CDCA7 orthologs)" "Class I CDCA7 proteins". Constantly changing the way how they refer to these proteins is very confusing.

      We now make it clear that we call proteins with the class I zf-4CXXC_R1 motif CDCA7 orthologs. We also define class II zf-4CXXC_R1 (as those with a substitution at the ICF-associated glycine residue). Since no clear CDCA7 orthologs can be found in fungi, we now call fungal proteins with class II zf-4CXXC_R1 “CDCA7F”, implying their ambiguous orthology assignment.

      Under this heading there is also no mention of DNMTs. Instead, the authors introduce DNMTs under the heading "Classification of DNMTs in eukaryotes" - Please clarify.

      This is now corrected.

      • page 9: Sentence containing "... presence of DNMT1, UHRF1 and CDCA7 outside of Viridiplantae and Opisthokonta is rare". What does "rare" mean? How is UHRF1 relevant here?

      Among the 32 species outside of Viridiplantae and Opisthokonta, only the Acanthamoeba castellanii genome encodes clear orthologs of DNMT1, UHRF1 and CDCA7. Although it is often difficult to deduce if the selected panel of species is a reasonable representation, we think that it is not unreasonable to state that Acanthamoeba is a rare case to encode this set of proteins outside of Viridiplantae and Opisthokonta. We include UHRF1 since it is a well-established activator of DNMT1, and indeed our CoPAP analysis showed a tight coevolution of UHRF1 with DNMT1. Outside of Viridiplantae and Opisthokonta, only Acanthamoeba castellanii and Naegleria gruberi encode UHRF1. Interestingly, these two species also encode CDCA7 and HELLS.

      Having said that, we rephrased this sentence, which reads; “Species that encode a set of DNMT1, UHRF1, CDCA7 and HELLS are particularly enriched in Viridiplantae and Metazoa.”

      • page 11: Sentence containing "..., that the function of CDCA7-like proteins is strongly linked to HELLS and DNMT1,..." What do the authors mean with "the function of CDCA7-like proteins"? And what happened to DNMT3?

      Our observation that almost all species that contain CDCA7 (including fungal CDCA7F) also have DNMT1 and HELLS, despite the frequent loss of these genes in species that do not contain CDCA7, indicates “that the function of CDCA7-like proteins is strongly linked to HELLS and DNMT1”. We found only 2 species that possess CDCA7 (class I or class II) but not DNMT1 among the panel of 180 species. These 2 exceptional species, Naegleria gruberi and Taphrina deformans, do encode UHRF1-like proteins and a DNMT (an orphan DNMT in N. gruberi and DNMT4 in T. deformans). In contrast, we found 26 species that possess CDCA7 (or CDCA7F) but not DNMT3 (Table S1), so the linkage between CDCA7 and DNMT3 is weaker.

      • page 11: Sentence containing "..., CDCA7 is lost from this gene cluster in parasitoid wasps, including Ichneumonoidea wasps and chalcid wasps". This sentence is confusing because already in an earlier paragraph the authors say that "Microplitis demolitor lost CDCA7" and in the following sentence they say "...among Ichneumonoidea wasps, CDCA7 appears to be lost in the Braconidae clade, ...". It would greatly help this reader if the authors could streamline these sentences and also decide on whether CDCA7 is lost in M. demolitor or CDCA7 appears to be lost in M.demolitor.

      The confusion was in part due to the difficulty in differentiating between the true loss of a gene and its apparent absence in a species due to an incomplete genome assembly, including that of M. demolitor. To verify that the loss of CDCA7 was not due to gaps in the genome assembly, we executed the synteny analysis. We have also edited this section to improve its readability (Page 12-13).

      What could be the role for HELLS/CDCA7 in insect DNA methylation? In several cases, the authors' analyses reveal co-evolutionary links between DNMT3 (DNMT3A?) and CDCA7/HELLS. I do not understand why this finding is not really discussed by the authors. Instead there is a strong focus on replication-uncoupled DNA methylation maintenance. Could the authors elaborate why?

      The role of DNA methylation in insects is largely unclear, so any discussion must be highly speculative. A recent finding in the clonal raider ant, showing that DNMT1 is not essential for development but is critical for oogenesis, points toward a possibly more universal role of DNA methylation in meiosis. Prompted by a finding in Neurospora, where DNA methylation is required for homolog pairing during meiosis, we discuss a speculative model in which DNA methylation status acts as a hallmark to distinguish between healthy/young DNA and old/mutated (or competitive/pathogenic) DNA at homolog pairing during meiosis (page 14).

      Regarding the cases where CDCA7 and DNMT3 are co-lost, we discussed this phenomenon in the last section of the Results, stating, “This co-loss of CDCA7 and DNA methylation (together with either DNMT1-UHRF1 or DNMT3) in braconid wasps suggests that evolutionary preservation of CDCA7 is more sensitive to DNA methylation status per se than to the presence or absence of a particular DNMT subtype.” Please note that we found several lineages that lack CDCA7 but have DNMT1 (and DNMT3), whereas almost all species that have CDCA7 also have DNMT1 (but not necessarily DNMT3). Supported by our CoPAP analyses, our results indicate a tight functional link between CDCA7 and DNMT1, but this does not necessarily mean that CDCA7 plays no role related to DNMT3 or de novo methylation. Clarification of this point and our speculation on how CDCA7 loss is linked to a reduced requirement for DNA methylation are discussed on pages 13 and 14 with additional text.

      Discussion

      • page 12: Where is the data supporting. "... the red flour beetle Tribolium castaneum possesses DNMT1 and HELLS, but lost DNMT3 and CDCA7"?

      Figure 5, Figure S2 and Table S1. This is now noted in the text.

      • page 14: Based on which parts of their analyses or evidence from the literature can the authors speculate that "...the evolutionary arrival of HELLS-CDCA7 in eukaryotes might have been required to transmit the original immunity-related role of DNA methylation from prokaryotes to nucleosome-containing (eukaryotic) genomes"? Please clarify.

      This is inferred from the well-known role of DNA methylation in bacteria for defending against phage viruses. However, it was not correct to state that such a function was inherited from prokaryotes. It should be stated that it was inherited from the last universal common ancestor (LUCA). We also admit that it is not clear if such an immunity-related role was inherited from LUCA, or if it emerged through convergent evolution. Therefore, we amended this description to emphasize our hypothesis that the advent of CDCA7 was “a key step to transmit the DNA methylation system from the LUCA to the eukaryotic ancestor with nucleosome-containing genomes”.

      Supplementary Figures/Tables

      • page 26: Table S2 and Table S3, it seems that these tables show data that supports what is shown in Figure 7 and not Figure 5.

      You are correct. Thank you for pointing out the typos.

      Has the methylation status been assessed in C. glomerata, C. typhae, Chelonus insularis, Diachasma alloeum or Aphidius gifuensis? Please clarify in Table S2.

      Not to our knowledge. However, as we realized that the absence of DNA methylation in Aphidius ervi was previously reported (Bewick et al 2017), we have now included these data together with the presence/absence analysis of DNMT1, UHRF1, DNMT3, CDCA7 and HELLS. Known presence/absence of DNA methylation is now shown in Fig. 7.

      Reviewer #2 (Recommendations For The Authors):

      Recommendation to strengthen the paper:

      1) Phylogenetics:

      • Test and report the appropriateness of the substitution model used in protein alignments/trees.

      • Use Maximum likelihood methods and/or MCMC Bayesian inference to build and report trees with well supported topologies. This is required to properly assign orthology (shared ancestry). This will avoid false interpretation due to technical limitation of similarity-based phylogenies (without statistical support). Figure S1, S3, S4 and S6.

      To address these points, we made new multiple sequence alignments using MUSCLE v5 and generated phylogenetic trees using the maximum likelihood-based IQ-TREE 2, where multiple models were screened. A consensus tree was generated after 1000 bootstrap replicates from the best alignment and model. The topology and assignment of these new trees were largely consistent with the original trees, except for some corrections in DNMT assignment as discussed below.

      1. We realized that we erroneously missed DNMT5 orthologs of Leucosporidium creatinivorum, Postia placenta, Armillaria gallica and Saitoella complicata, and DNMT6 orthologs from Fragilariopsis cylindrus reported in Huff et al 2014 (PMID 24630728). They are now included in the new list and CoPAP analysis.

      2. DNMT4 orthologs were identified in Fragilariopsis cylindrus and Thalassiosira pseudonana by Huff et al 2014 (PMID 24630728), but in our original phylogenetic analysis, these proteins form a distinct clade between DNMT1/Dim-2 and DNMT4. The new tree and classification are more consistent with Huff et al, so we present the new tree in Fig. S6 and conducted the classification based on this tree.

      Beside Fig. S6, we decided to maintain original Fig. S1, S3 and S4 (with a few adjustments) for better visibility, but we included the results of IQ-TREE analysis as Dataset S1-S3.

      The CoPAP analysis based on the revised assignment slightly changed the topology of coevolutionary linkages. In addition, we obtained a slightly different result depending on whether fungal-specific CDCA7 with class II zf-4CXXC_R1 (now referred to as CDCA7F) is included as a CDCA7 ortholog or not. Despite this difference, we reproducibly observed the coevolutionary linkage between CDCA7 and DNMT1-UHRF1.
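
      For readers who wish to reproduce the alignment and tree-building steps described in this response, the two programs could be invoked roughly as sketched below. This is a minimal sketch and not the exact commands used for the manuscript: the file names and the helper function are hypothetical, `-m MFP` asks IQ-TREE 2 to select a substitution model before inferring the tree, and `-b 1000` requests 1000 standard bootstrap replicates. The response does not state whether standard or ultrafast (`-B`) bootstrapping was used, so that choice is an assumption. The script only assembles and prints the command lines; it does not run them.

```python
import shlex


def alignment_and_tree_commands(fasta, prefix, bootstraps=1000):
    """Build command lines for a MUSCLE v5 alignment and an IQ-TREE 2 run."""
    aligned = f"{prefix}.afa"
    # MUSCLE v5 syntax: muscle -align <input> -output <alignment>
    muscle = ["muscle", "-align", fasta, "-output", aligned]
    # IQ-TREE 2: model selection (MFP) plus standard bootstrap replicates.
    iqtree = [
        "iqtree2", "-s", aligned, "-m", "MFP",
        "-b", str(bootstraps), "--prefix", prefix,
    ]
    return muscle, iqtree


# Hypothetical input file holding the isolated DNMT domain sequences.
muscle_cmd, iqtree_cmd = alignment_and_tree_commands("dnmt_domains.fasta", "dnmt")
for command in (muscle_cmd, iqtree_cmd):
    print(shlex.join(command))
```

      Each printed line can be pasted into a shell, or passed to `subprocess.run` as a list, on a machine where both programs are installed.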

      • Be more careful with wording: RBH is not sufficient to call gene/proteins orthologs (e.g. Page 8). The above mentioned method will help you support this claim (+ synteny when you can).

      We were aware of this issue. This is why we conducted phylogenetic tree building based on sequence alignment of full-length HELLS (Fig. S3) and SNF2 domain only (Fig. S4), as explained in the text. We found that the RBH criterion is robust in Metazoa; orthologs are easily recognizable with very low E-value (0.0) and extensive homology over the full length of the protein, while synteny is not practical to employ in the diverse set of species.
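
      For clarity, the reciprocal best hit (RBH) criterion mentioned here can be sketched as follows, assuming that the best hit of each query has already been extracted from the BLAST output in both directions. The identifiers are hypothetical placeholders and do not correspond to real accessions; as the reviewer notes, RBH alone does not establish orthology, which is why it is combined with tree building.

```python
def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Return sorted (a, b) pairs in which a and b are each other's best hit."""
    return sorted(
        (a, b)
        for a, b in best_a_to_b.items()
        if best_b_to_a.get(b) == a
    )


# Hypothetical best hits from species A queries against species B proteins.
a_to_b = {"HELLS": "b_protein_1", "DNMT1": "b_protein_2", "other_SNF2": "b_protein_1"}
# Hypothetical best hits from the reverse search.
b_to_a = {"b_protein_1": "HELLS", "b_protein_2": "DNMT1"}

pairs = reciprocal_best_hits(a_to_b, b_to_a)
print(pairs)
```

      In this toy input, the `other_SNF2` query also hits `b_protein_1`, but it is not retained because the reverse search returns HELLS. This is the kind of misassignment among SNF2 family proteins that the criterion is meant to filter out.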

      • Also, use "co-retention" or "co-evolution" but not "co-selection" when describing CoPAP results - as CoPAP does not test for signature of natural selection.

      This is a good point and is now corrected.

      • The statistics (p-val...) underlying the CoPAP analyses should be explained.

      The explanation is now added in Methods section.

      “A method to calculate p-value for CoPAP was described previously (Cohen et al., 2012, PMID 22962457). Briefly, for each pair of tested genes, Pearson's correlation coefficient was computed. Parametric bootstrapping was used to compute a p-value by comparing it with a simulated correlation coefficient calculated based on a null distribution of independently evolving pairs with a comparable exchangeability (a value reporting the likelihood of gene gain and loss events across the tree).”
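
      To make the logic of this test concrete, the sketch below computes Pearson's correlation between two presence/absence profiles and compares it with a null distribution. It is a deliberately simplified stand-in and not the CoPAP procedure: the null here is produced by shuffling one profile across species, which ignores the phylogeny, whereas CoPAP simulates independently evolving gene pairs along the tree by parametric bootstrapping. The profiles are made-up toy vectors and the function names are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def permutation_p_value(x, y, n_perm=10000, seed=0):
    """One-sided p-value: how often a shuffled profile correlates as strongly."""
    rng = random.Random(seed)
    observed = pearson(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties.
        if pearson(x, shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy presence (1) / absence (0) profiles across 12 imaginary species.
gene_a = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1]
gene_b = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1]

r, p = permutation_p_value(gene_a, gene_b)
print(f"r = {r:.3f}, permutation p = {p:.4f}")
```

      Because closely related species tend to share gene content, a shuffle-based null generally overstates significance, which is the motivation for a tree-aware null of the kind described above.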

      2) Figure S2 and S3 could be improved for readability

      After consideration of this criticism, we decided to keep their original formats for the following reasons.

      Figure S2. The purpose of this list is to better visualize the comprehensive list shown in Table S2. A consolidated list is already shown in Figure 5. An alternative choice is to make a diagram where individual species names are unreadable. This kind of presentation is seen in many published papers, but we found that such diagrams are not helpful for checking the details. As this is a supplementary figure, we prefer to show the detailed data in a form that is readable without specialized software.

      Figure S3. This figure is included to show which SNF2 family proteins are more likely to be misassigned as HELLS/DDM1 orthologs. We believe that the figure serves this purpose.

      3) What is the meaning of the coloring patterns of ICF residues in znf?

      ICF residues are highlighted in light blue in the schematics to indicate their conservation. In the alignment, the coloring reflects the level of conservation within the shown set of proteins, and the choice of coloring was set by Jalview.

      4) To improve clarity: the introduction could be more focused on evolutionary considerations and functional link between CDCA7-HELLS and DNMTs.

      We revised the first paragraph of the introduction to illustrate this point.

      5) Could indicate the CDCA7 loss / DNA methylation hypothesis in the abstract.

      We have now included this hypothesis in the Abstract.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This study provides insights into the early detection of malignancies with noninvasive methods. The study contained a large sample size with external validation cohort, which raises the credibility and universality of this model. The new model achieved high levels of AUC in discriminating malignancies from healthy controls, as well as the ability to distinguish tumor of origin. Based on these findings, prospective studies are needed to further confirm its predictive capacity.

      However, there are several concerns about the manuscript, which needs to be clarified or modified.

      1) The use of "multimodal model" will definitely increase workload of the testing. From the results of this manuscript, the integration of multimodal data did not significantly outperform the EM-based model. Is this kind of integration necessary? Is that tool really cost-effective? The authors did not convince me of its necessity, advantages, and clinical application.

      To provide further evidence supporting the advantages of the multimodal model (stack model) over the EM-based model, we performed the DeLong test and provided the data in Table S7 and Figure S6. Our data show that the stack model outperformed the EM-based model, with significantly higher AUC (AUC difference = 0.0286, p<0.0001). Moreover, the stack model exhibited significantly higher sensitivity for detecting cancer patients of five cancer types in both the discovery (73.8% versus 59.5%, p<0.0001, Figure S6A) and validation cohorts (72.4% versus 61.5%, p=0.0002, Figure S6B) at a comparable specificity of > 95%. The number of misclassified cases was lower when using the stack model than when using the EM-based model (Figure S6C and S6D). Strikingly, we observed that the stack model significantly improved the sensitivity for detecting lung cancer patients compared to the EM-based model in both the discovery (78.5% versus 44.1%, Figure S6A) and validation cohorts (83.7% versus 55.8%, Figure S6B), indicating that other ctDNA signatures are also important biomarkers for detecting lung cancer. Therefore, we conclude that the combination of multiple signatures of ctDNA, i.e., the multimodal approach, could improve the sensitivity of multi-cancer detection.

      Given the same wet lab protocol, the difference in computational time between a single EM-based model and the stack model is about 10-11 minutes per sample, but the real difference in analysis time can be reduced to ~1 min/sample by parallelization. With regard to the wet lab protocol, an important novelty of SPOT-MAS technology is its all-in-one approach that enables simultaneous analysis of different ctDNA signatures using a single blood draw and a single library reaction, greatly reducing the experimental cost. Thus, we strongly argue that our approach improves the detection sensitivity by increasing the breadth of ctDNA analysis while achieving cost effectiveness for sample preparation and sequencing with a negligible trade-off in analysis time.

      We have also added the following sentences in the discussion to clarify this point. (Line 618-625)

      “Moreover, this study showed that the feature of EM achieved the highest performance among the five examined ctDNA signatures in discriminating cancer from healthy controls (Figure S6). Importantly, we found that combining EM with other ctDNA signatures in a stack model could further improve the sensitivity for detecting cancer samples, with significant improvement for lung cancer patients (Figure S6A and S6B). These findings highlighted that the multimodal analysis of multiple ctDNA signatures by SPOT-MAS could increase the breadth of ctDNA feature analysis, thus enhancing the detection sensitivity while maintaining the low cost of sample preparation and sequencing.”
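
      To make the phrase "sensitivity at a comparable specificity of > 95%" concrete, the sketch below shows one common way to fix a score cutoff from healthy-control scores and then read off sensitivity in cancer cases. It is a minimal illustration, not the SPOT-MAS pipeline: the function names are hypothetical, and it assumes that higher model scores indicate cancer and that a sample is called positive only when its score exceeds the cutoff.

```python
import math


def cutoff_for_specificity(control_scores, target_specificity=0.95):
    """Smallest observed control score that leaves at least the target
    fraction of controls at or below it (i.e. called negative)."""
    if not 0 < target_specificity <= 1:
        raise ValueError("target_specificity must be in (0, 1]")
    ordered = sorted(control_scores)
    # Number of controls that must be called negative.
    k = math.ceil(target_specificity * len(ordered))
    return ordered[k - 1]


def specificity_at_cutoff(control_scores, cutoff):
    """Fraction of healthy controls called negative (score <= cutoff)."""
    return sum(s <= cutoff for s in control_scores) / len(control_scores)


def sensitivity_at_cutoff(case_scores, cutoff):
    """Fraction of cancer cases called positive (score > cutoff)."""
    return sum(s > cutoff for s in case_scores) / len(case_scores)
```

      Comparing two models at the same specificity in this way isolates the difference in sensitivity, which is the comparison reported for the stack and EM-based models.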

      2) The baseline characteristics of part of the enrolled patients are not clear. It seems that some of the cancer patients were diagnosed only by imaging examinations. The manuscript described "staging information was not available for 25.7% of cancer patients, who were confirmed by specialized clinicians to have non-metastatic tumors". How was this confirmation made? Was it based on clinicians' experience only?

      Our study only recruited cancer patients with non-systemic-metastatic stages (Stage I-IIIA) in which cancer is localized to the primary sites and has not spread to other organs. We excluded patients who were diagnosed with metastatic stage IIIB and IV cancer. All healthy subjects were confirmed to have no history of cancer at the time of enrollment. They were followed up at six months and one year after enrollment. The majority of cancer patients (74.3%) were confirmed to have cancer by abnormal imaging examination and subsequent tissue biopsy confirmation of tumor staging and metastasis status. Patients with unavailable staging information (25.7%) initially went to the study hospitals for imaging examination. Upon receiving positive imaging results (MRI scan or CT scan), they moved to another hospital for surgery, leading to missing tumor staging information at the original study hospitals. The metastasis status of these patients was later obtained via communications between the clinicians at the study hospitals and the clinicians at the surgery hospitals, subject to the existing data-sharing agreement between the two hospitals. Patients with metastatic cancer or unclear metastatic status were excluded from our study.

      We have added the following sentences in the method (Line 127-135) and discussion section (Line 679-688).

      “Cancer patients were confirmed to have cancer by abnormal imaging examination and subsequent tissue biopsy confirmation of malignancy. Cancer stages were determined by the TNM (Tumor, Node, Metastasis) system classification according to the American Joint Committee on Cancer and the International Union for Cancer Control. Our study only recruited cancer patients with non-systemic-metastatic stages (Stage I-IIIA) in which cancer is localized to the primary sites and has not spread to other organs. We excluded patients who were diagnosed with metastatic stage IIIB and IV cancer. All healthy subjects were confirmed to have no history of cancer at the time of enrollment. They were followed up at six months and one year after enrollment to ensure that they did not develop cancer.”

      “For patients with unavailable staging information, their initial imaging examinations were conducted at the study hospitals. However, subsequent tests and surgical procedures were performed at a different hospital, as per the patients' preferences. Consequently, the original study hospitals lacked access to comprehensive tumor staging data. To address this limitation, the metastasis status of these patients was obtained via communication channels between the clinicians at the study hospitals and those at the surgery hospitals. This enabled the retrieval of limited information, adhering to an established data-sharing agreement between the two institutions. To maintain the robustness of our analysis, patients diagnosed with metastatic cancer or those with indeterminate metastatic status were subsequently excluded from the study.”

      3) It seems that one of the important advantages of this new model is the low depth coverage in comparing to previous screening models for cancer. The authors should discuss more on the reason why the new model could achieve comparable predictive accuracy with an obviously lower sequencing depth.

      We thank the reviewer for the suggestion. We have added the following sentences in the discussion to explain why our assay could achieve good performance at low depth sequencing. (Line 571-584)

      “However, the low amount of ctDNA fragments in plasma samples of patients with early-stage cancer as well as the molecular heterogeneity of different cancer types are known as the major challenges for liquid biopsy based multi-cancer detection assays. Thus, sequencing at high depth coverages is required to capture enough informative cancer DNA fragments in the finite plasma sample to achieve early cancer detection. In support of this notion, many groups (1-4) have developed assays that exploited high depth coverage of sequencing to detect ctDNA fragments in plasma of early stage cancer patients. However, this strategy might not be cost effective and feasible for population wide screening in developing countries. Alternatively, we argued that increasing breadth of ctDNA analysis could maximize the ability to detect ctDNA fragments with heterogeneous genetic and epigenetic changes at shallow sequencing depth, thus improving the sensitivity for multi-cancer detection. To demonstrate the feasibility of this approach, we built a stacking ensemble model to combine nine different ctDNA signatures and demonstrated its superior performance on cancer detection in comparison to single-feature models (Figure 7B and 7C).”

      4) The readability of this manuscript needs to be improved. The focus of the background section is not clear, with too much detail of other studies and few purposeful summaries. You need to explain the goals and clinical significance of your study. In addition, the results section is too long, and needs to be shortened and simplified. Move some of the inessential results and sentences to supplementary materials or methods.

      We thank the reviewer for these constructive suggestions. Accordingly, we have reduced the details of other studies (Line 85-91) as follows:

      “In recent years, there has been considerable interest in exploring the potential of ctDNA alterations for early detection of cancer (5, 6). One such approach is the PanSeer test, which uses 477 differentially methylated regions (DMRs) in ctDNA to detect five different types of cancer up to four years prior to conventional diagnosis (7). The DELFI assay employs a genome-wide analysis of ctDNA fragment profiles to increase sensitivity in early detection (1). Recently, the Galleri test has emerged as a multi-cancer detection assay that analyses more than 100,000 methylation regions in the genome to detect over 50 cancer types and localize the tumor site (8).”

      We have modified the text in the introduction to explain the goals and clinical significance of our study (Line 111-123)

      “In this study, we aimed to expand our multimodal approach, SPOT-MAS, to comprehensively analyze methylomics, fragmentomics, DNA copy number and end motifs of cfDNA and evaluate its utility for simultaneously detecting and locating cancer from a single screening test.” “Our findings demonstrate that the multimodal approach of SPOT-MAS enables profiling of multiple ctDNA signatures across the entire genome at low sequencing depth to detect five different cancer types in their early stages. Beyond detecting the presence of cancer signals, our assay was able to predict the tumor location, which is important for clinicians to fast-track the follow-up diagnostic and guide necessary treatment. Thus, SPOT-MAS has the potential to become a universal, simple, and cost-effective approach for early multi-cancer detection in a large population.”

      Reviewer #2 (Public Review):

      The authors tried to diagnose cancers and pinpoint tissues of origin using cfDNA. To achieve the goal, they developed a framework to assess methylation, CNA, and other genomic features. They established discovery and validation cohorts for systematic assessment and successfully achieved robust prediction power.

      1) Still, there are places for improvement. The diagnostic effect can be maximized if their framework works well in early-stage cancer patients. According to Table 1, about 10% of the participants are stage I. Do these cancers also perform well as compared to late stage cancers?

      We have compared SPOT-MAS performance across stages and provided the data in Supplementary Table S8 and Supplementary Figure S4J and S4L. Our data showed that SPOT-MAS achieved lower sensitivity for detecting stage I and II cancers as compared to stage IIIA cancers in both the discovery (61.54% and 69.82% for stage I and II, respectively, versus 78.67% for stage IIIA, Supplementary Table S8) and validation cohorts (73.91% and 62.32% for stage I and II, respectively, versus 88.31% for stage IIIA, Supplementary Table S8). This suggests that cancer stage can influence the performance of our models.

      2) Can authors show a systematic comparison of their method to other previous methods to summarize what their algorithm can achieve compared to others.

      We have conducted a systematic comparison of our method with others in the Supplementary Table S11.

      Reviewer #1 (Recommendations For The Authors):

      There are still points for the authors to clarify and consider for incorporation into revision.

      • Please first clarify the issues mentioned in "public review". Several complements are needed.

      We have addressed all of the reviewer’s comments in “public review”.

      1) Line 72-73: Different approaches of early cancer screening assays have different features, application scenarios, and of course, limitations. It's too vague to describe in this way. More importantly, diagnosis of malignancies relies on pathological diagnosis, so I don't think the results of unsuccessful screening would be overdiagnosis and overtreatment. That is an overstatement.

      We have rewritten the statement as follows (Line 72-75)

      “Although currently guided screening tests have each been shown to provide better treatment outcomes and reduce cancer mortality, some of them are invasive, thus having low accessibility. Importantly, most of them are single cancer screening tests, which may result in high false positive rates when used sequentially.”

      2) Line 115-130: The findings in this study shouldn't be introduced here.

      We have removed this section.

      3) Line 496-498: It surprised me that the model performed even better in independent validation cohort, which is quite different from the usual situations. Please explain it.

      We agree with the reviewer that model performance in an independent validation cohort is often lower than in the discovery cohort. In our case, we have carefully confirmed our data by utilizing cross-validation (CV). Cross-validation is a widely used process in which the data used for training the model is separated into folds or partitions and the model is trained and validated for each fold; the performance estimates are then used to obtain a mean and confidence interval (GraphPad Prism, Wilson/Brown method). To further confirm our findings, we increased the number of cross-validation folds to 50, and consistently detected no significant difference in performance between the Discovery and Validation cohorts (p=0.1277, DeLong’s test).

      We have added the following sentence in the discussion to explain this (Line 633-635)

      “Despite a slightly higher AUC value in the validation cohort compared to the discovery cohort, no significant differences in AUC values were observed between the two cohorts at CV of 10 or 50 (p=0.1277, DeLong’s test).”
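
      The fold-wise procedure described in this response can be sketched in a few lines of Python. This is a generic illustration of k-fold cross-validation and not the code used in the study: the toy "model" is a single threshold placed midway between the class means, accuracy stands in for AUC, and the spread is summarized by a standard deviation rather than the Wilson/Brown interval. All function names are hypothetical.

```python
import random
import statistics


def kfold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_threshold(values, labels):
    """Toy model: midpoint between the mean value of each class."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return (statistics.mean(pos) + statistics.mean(neg)) / 2


def cross_validate(values, labels, k=10, seed=0):
    """Train on k-1 folds, score the held-out fold, and repeat for every fold.

    Returns the mean and standard deviation of held-out accuracy.
    Assumes every training split contains both classes.
    """
    accuracies = []
    for held_out in kfold_indices(len(values), k, seed):
        held = set(held_out)
        train = [i for i in range(len(values)) if i not in held]
        threshold = fit_threshold(
            [values[i] for i in train], [labels[i] for i in train]
        )
        correct = sum(
            (values[i] > threshold) == (labels[i] == 1) for i in held_out
        )
        accuracies.append(correct / len(held_out))
    return statistics.mean(accuracies), statistics.stdev(accuracies)
```

      Raising k, for example from 10 to 50, leaves more samples for training in each round but makes each held-out fold smaller, so per-fold estimates become noisier even if the mean changes little.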

      4) Line 499-501: For the cut-off value selection, the authors thought that for cancer screening, specificity is more important than sensitivity? It's controversial. The sensitivity is only approximately 70%, I think that a missed diagnosis is even worse.

      We agree with the reviewer that both specificity and sensitivity are important metrics of a cancer detection test. However, there is a trade-off between sensitivity and specificity and the preference for either one of them remains a controversial topic. For a screening test, the preference should be determined by considering the prevalence of the disease, in this case cancer. The low prevalence of cancers indicates that even a small percentage of false-positive test results due to low specificity of the assay, spread across a national population, would hugely increase the demand for confirmatory imaging as well as biopsy sampling of imaging-detected benign abnormalities (9). Thus, false positives have obvious implications for health-care resources as well as patient well-being. Conversely, higher sensitivities will make sure that more cancer cases are detected and avoid delays in diagnosis. To mitigate the impact of insufficient sensitivity of a cancer screening test, it is important to counsel test-takers that current liquid biopsy tests should only be used as a complementary approach to the available diagnostic tests to increase rates of cancer detection. To be used as a stand-alone test, further work is required to improve its performance, with more focus on increasing sensitivity while maintaining high specificity.

      We have added the following sentences in the discussion to explain why we set a high threshold of specificity (Line 660-671)

      “For an effective screening test, careful consideration of disease prevalence, cancer in this context, is imperative. Given the low prevalence of cancers, even a small proportion of false-positive test results arising from reduced assay specificity, if extrapolated to a national population, could significantly escalate the need for confirmatory imaging and biopsy procedures for benign abnormalities detected during screening. Thus, false-positives can have substantial implications for both healthcare resources and patient well-being. Conversely, a screening test with high sensitivity ensures that most cancer cases are detected and minimizes delays in diagnosis. To address potential limitations posed by low sensitivity in cancer screening tests, we suggest that current liquid biopsy tests should be employed as a complementary approach to existing diagnostic methods to enhance cancer detection rates. To be used as a stand-alone test, further work is required to improve its performance, with a particular emphasis on improving sensitivity while preserving high specificity.”
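
      The prevalence argument can be made concrete with a short calculation. The sketch below computes the expected numbers of true and false positives in a screened population and the resulting positive predictive value. The population size and the 1% prevalence in the usage example are purely hypothetical inputs chosen for illustration; the sensitivity (72.4%) and specificity (95%) follow the validation-cohort figures quoted elsewhere in this response. The class and function names are our own.

```python
from dataclasses import dataclass


@dataclass
class ScreeningOutcome:
    """Expected counts when a test is applied to a whole population."""

    true_positives: float
    false_negatives: float
    false_positives: float
    true_negatives: float

    @property
    def ppv(self):
        """Positive predictive value: share of positive calls that are real."""
        flagged = self.true_positives + self.false_positives
        return self.true_positives / flagged if flagged else float("nan")


def expected_outcome(population, prevalence, sensitivity, specificity):
    """Split a population into the four expected outcome counts."""
    diseased = population * prevalence
    healthy = population - diseased
    return ScreeningOutcome(
        true_positives=diseased * sensitivity,
        false_negatives=diseased * (1 - sensitivity),
        false_positives=healthy * (1 - specificity),
        true_negatives=healthy * specificity,
    )


if __name__ == "__main__":
    # Hypothetical scenario: 100,000 people screened, 1% prevalence.
    outcome = expected_outcome(100_000, 0.01, 0.724, 0.95)
    print(f"false positives: {outcome.false_positives:.0f}")
    print(f"PPV: {outcome.ppv:.3f}")
```

      Under these assumed inputs, false positives outnumber true positives several-fold, which is why a small loss of specificity translates into a large confirmatory workload when the disease is rare.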

      5) The methylation profiles have been used broadly in ctDNA, while you also integrated the fragmentomics, copy number aberration and end motif into the new model. In the discussion section, it would be better to further compare your new model with several previous models based on conventional ctDNA methylation markers (10, 11) for early detection of malignancies. What are the advantages of adding the other two types of data? Why could the new model achieve comparable predictive accuracy with an obviously lower sequencing depth?

      We thank the reviewer for the suggestion. We have added the following sentences in the discussion to highlight the novelty of our multimodal approach. (Line 587-610)

      “Previous studies have reported that methylation changes at target regions could be exploited for detecting ctDNA in plasma of patients with early-stage cancer (10, 11).”

      “In addition to methylation alterations, recent studies have shown that the DNA copy number, fragmentomics profile (1) and end motif profile (12) at genome-wide scale are useful features for healthy-cancer classification. Therefore, we propose that the combination of these markers might provide added value to increase the performance of liquid biopsy assays. We demonstrated that the same bisulfite sequencing data could be used to identify somatic CNA (Figure 4), cancer-associated fragment length (Figure 5) and end motifs (Figure 6), highlighting the advantage of SPOT-MAS in capturing the broad landscape of ctDNA signatures without high-cost deep sequencing. For cancer-associated fragment length, we pre-processed this data into five different feature tables to better reflect the information embedded within the data. Overall, we integrated multiple features of ctDNA including methylation, fragment length, end motif and copy number changes into a multi-cancer detection model and demonstrated that this approach could distinguish healthy individuals from patients with five common cancer types. This strategy enables increased breadth of ctDNA analysis at shallow sequencing depth to overcome the limitation of low amount of ctDNA fragments in plasma samples as well as molecular heterogeneity of cancers.”

      Moreover, we have conducted a systematic comparison of our method with others in Supplementary Table S11.

      6) Line 667-668: The wording should be modest. "Successfully detect and localize" is not appropriate.

      We have rewritten the sentence. (Line 713-716)

      “Our large-scale case-control study demonstrated that SPOT-MAS, with its unique combination of multimodal analysis of cfDNA signatures and innovative machine-learning algorithms, can detect and localize multiple types of cancer with high accuracy using low-cost sequencing.”

      Reviewer #2 (Recommendations For The Authors):

      1) Are the patients and controls all from Vietnam? If I am not mistaken, it is hard to find demographic information for controls. Also it is not clear if samples from controls were processed simultaneously or at a same institution or using the same protocol etc.

      We thank the reviewer for asking this question. All cancer patients and controls are from Vietnam and were recruited from five hospitals: Medic Medical Center, University Medical Center Ho Chi Minh City, Thu Duc City Hospital, National Cancer Hospital and Hanoi Medical University. At each research site, blood samples from both cancer patients and healthy subjects were collected in Streck Cell-Free DNA BCT tubes and subsequently transported to a central laboratory located at the Medical Genetics Institute for cfDNA isolation, library preparation and sequencing. In a recent publication (10), we have investigated the impact of logistic time and hemolysis rates of blood samples collected from different clinical sites on cfDNA concentration and sequencing quality. We did not observe any noticeable impact of such variations on cfDNA concentrations or sequencing library yields. However, future analytical validation studies are required to evaluate the impact of variation in sampling technique across different clinical sites on the robustness or accuracy of assay results.

      We have added the following sentences in the discussion to highlight this important point (Line 696-704)

      “At each research site, blood samples from both cancer patients and healthy subjects were collected in Streck Cell-Free DNA BCT tubes and subsequently transported to a central laboratory located at the Medical Genetics Institute for cfDNA isolation, library preparation and sequencing. In a recent publication (10), we have investigated the impact of logistic time and hemolysis rates of blood samples collected from different clinical sites on cfDNA concentration and sequencing quality. We did not observe any noticeable impact of such variations on cfDNA concentrations or sequencing library yields. However, future analytical validation studies using a larger sample size are required to evaluate the impact of variation in sampling technique across different clinical sites on the robustness or accuracy of assay results.”

      References

      1. Cristiano S, Leal A, Phallen J, Fiksel J, Adleff V, Bruhm DC, et al. Genome-wide cell-free DNA fragmentation in patients with cancer. Nature. 2019;570(7761):385-9.

      2. Cohen JD, Li L, Wang Y, Thoburn C, Afsari B, Danilova L, et al. Detection and localization of surgically resectable cancers with a multi-analyte blood test. Science. 2018;359(6378):926-30.

      3. Liu MC, Oxnard GR, Klein EA, Swanton C, Seiden MV. Sensitive and specific multi-cancer detection and localization using methylation signatures in cell-free DNA. Ann Oncol. 2020;31(6):745-59.

      4. Stackpole ML, Zeng W, Li S, Liu C-C, Zhou Y, He S, et al. Cost-effective methylome sequencing of cell-free DNA for accurately detecting and locating cancer. Nature Communications. 2022;13(1):5566.

      5. Constantin N, Sina AA, Korbie D, Trau M. Opportunities for Early Cancer Detection: The Rise of ctDNA Methylation-Based Pan-Cancer Screening Technologies. Epigenomes. 2022;6(1).

      6. Phan TH, Chi Nguyen VT, Thi Pham TT, Nguyen VC, Ho TD, Quynh Pham TM, et al. Circulating DNA methylation profile improves the accuracy of serum biomarkers for the detection of nonmetastatic hepatocellular carcinoma. Future Oncol. 2022;18(39):4399-413.

      7. Chen X, Gole J, Gore A, He Q, Lu M, Min J, et al. Non-invasive early detection of cancer four years before conventional diagnosis using a blood test. Nature Communications. 2020;11(1):3475.

      8. Jamshidi A, Liu MC, Klein EA, Venn O, Hubbell E, Beausang JF, et al. Evaluation of cell-free DNA approaches for multi-cancer early detection. Cancer Cell. 2022;40(12):1537-49.e12.

      9. Ignatiadis M, Sledge GW, Jeffrey SS. Liquid biopsy enters the clinic - implementation issues and future challenges. Nat Rev Clin Oncol. 2021;18(5):297-312.

      10. Xu RH, Wei W, Krawczyk M, Wang W, Luo H, Flagg K, et al. Circulating tumour DNA methylation markers for diagnosis and prognosis of hepatocellular carcinoma. Nat Mater. 2017;16(11):1155-61.

      11. Luo H, Zhao Q, Wei W, Zheng L, Yi S, Li G, et al. Circulating tumor DNA methylation profiles enable early diagnosis, prognosis prediction, and screening for colorectal cancer. Sci Transl Med. 2020;12(524).

      12. Jiang P, Sun K, Peng W, Cheng SH, Ni M, Yeung PC, et al. Plasma DNA End-Motif Profiling as a Fragmentomic Marker in Cancer, Pregnancy, and Transplantation. Cancer Discovery. 2020;10(5):664-73.

    1. Author Response

      Pancreatic phenotype reported by Wang et al., 2019 (PMID 30324491)

      The reported human knockout of SLC39A5 (homozygous for R311* allele) suggests that SLC39A5 is dispensable for embryonic development with no adverse effect on postnatal pancreatic development or function (Saleheen D, 2017). Indicative of conserved expression and function, Slc39a5 is non-essential in mice, with homozygous or heterozygous deletion of Slc39a5 resulting in elevated serum zinc (Fig. 2) and no resulting impairment in pancreatic development or function (Fig. S3, S4E-F, S5E-F, S6E-F, S7E-F, S8A-H).

      The observed antihyperglycemic effects in the Slc39a5 LOF animals were not driven by changes in insulin production and/or clearance (Fig. S3, S4E-F, S5E-F, S6E-F, S7E-F, S8A-H). Our observations related to pancreatic function (both exocrine and endocrine; Fig. 3 and Suppl. Table 3-5) in the Slc39a5 LOF mice are in agreement with reported metabolic phenotyping by the International Mouse Phenotyping Consortium (https://www.mousephenotype.org/data/genes/MGI:1919336). Intriguingly, Wang et al. reported impaired insulin secretion in mice with Ins2-cre mediated deletion of Slc39a5 in β-cells (Wang X, 2019). These findings are difficult to interpret in light of single cell RNA-seq analyses of mouse pancreas demonstrating absence of Slc39a5 expression in Ins2+ pancreatic β-cells (The Tabula Muris Consortium, 2018 and 2020). Consistently, SLC39A5 expression in human pancreas is largely restricted to pancreatic acinar and ductal cells (Baron M, 2016; Muraro MJ, 2016; Xin Y, 2016).

      Taken together, these observations suggest that the protective metabolic changes are presumably extra-pancreatic in both mouse and human.

      Sex Differences:

      Although Slc39a5 LOF activates hepatic AMPK signaling in both sexes, hepatic AKT signaling is elevated in females, suggesting that the observed glucose-lowering effects in the Slc39a5 LOF male mice are possibly driven by improvements in extrahepatic glucose metabolism, or that the magnitude of zinc-mediated protein phosphatase inhibition is insufficient to influence hepatic PI3K/AKT signaling in males. Whether the promotion of hepatic AMPK and AKT signaling occurs solely as a result of zinc-mediated inhibition of protein phosphatases or as a result of concurrent convergent mechanisms potentially influenced by sex hormones remains to be resolved in future investigations.

      Overall, integrated analyses of the metabolic phenotyping in our models (both diet-induced and congenital obesity) are consistent with the well documented sex-dependent susceptibility to obesity-related metabolic alterations such as insulin resistance, hepatic steatosis and dyslipidemia (Goodpaster BH, 2003; Priego T, 2008; Medrikova D, 2012; Bertolotti M, 2014; Frias JO, 2001; Krotkiewski M, 1983; Lebeck J, 2016).

    1. Author Response

      Reviewer #1 (Public Review):

      Medwig-Kinney et al. perform the latest in a series of studies unraveling the genetic and physical mechanisms involved in the formation of the C. elegans gonad. They have paid particular attention to how two different cell fates are specified, the ventral uterine (VU) or anchor cell (AC), and the behaviors of these two cell types. This cell fate choice is interesting because the anchor cell performs an invasive migration through a basement membrane. A process that is required for correct C. elegans gonad formation and that can act as a model for other invasive processes, such as malignant cancer progression. The authors have identified a range of genes that are involved in the AC/VU fate choice, and that impart the AC with its ability to arrest the cell cycle and perform an invasive migration. Taking advantage of a range of genetic tools, the authors show that the transcription factor NHR-67 is strongly expressed in the AC. The authors also present evidence that NHR-67 could function as a transcriptional repressor through interactions with a Groucho and also a TCF homolog, and they also suggest that these proteins are forming repressive condensates through phase separation.

      The authors have produced an extensive dataset to support their two primary claims: that NHR-67 expression levels determine whether a cell is invasive or proliferative, and also that NHR-67 forms a repressive complex through interactions with other proteins. The authors should be commended for clearly and honestly conveying what is already known in this area of study with exhaustive references. But absent data unambiguously linking the formation and dissolution of NHR-67 condensates with the activation of downstream genes that NHR-67 is actively repressing, the novelty of these findings is limited.

      Response 1.1: We thank the reviewer for recognizing the extensive dataset we provide in this manuscript in support of our claims that, (1) NHR-67 expression levels are important for distinguishing between AC and VU cell fates, and (2) NHR-67 interacts with transcriptional repressors in VU cells. We acknowledge that a complete mechanistic understanding of the functional significance of NHR-67 puncta is not possible without knowing direct targets of NHR-67 in the AC. Unfortunately, tools to identify transcriptional targets in individual cells or lineages in C. elegans do not exist, and generation of such tools would be beyond the scope of this work. This is evidenced by the fact that the first successful attempt to transcriptionally profile the AC was only posted as a preprint one month ago (Costa et al., doi: 10.1101/2022.12.28.522136). It is our hope that the findings we present here can be integrated with future AC- and VU-specific profiling efforts to provide a more complete picture of the functional significance of NHR-67 subnuclear organization.

      Reviewer #2 (Public Review):

      Medwig-Kinney et al. explore the role of the transcription factor NHR-67 in distinguishing between AC and VU cell identity in the C. elegans gonad. NHR-67 is expressed at high levels in AC cells where it induces G1 arrest, a requirement for the AC fate invasion program (Matus et al., 2015). NHR-67 is also present at low levels in the non-invasive VU cells and, in this new study, the authors suggest a role for this residual NHR-67 in maintaining VU cell fate. What this new role entails, however, is not clear. The model in Figure 7E shows NHR-67 switching from a transcriptional activator in ACs to a transcriptional repressor in VUs by virtue of recruiting translational repressors. In this model, NHR-67 actively suppresses AC differentiation in VU cells by binding to its normal targets and acting as a repressor rather than an activator. Elsewhere in the text, however, the authors suggest that NHR-67 is "post-translationally sequestered" (line 450) in nuclear condensates in VU cells. In that model, the low levels of NHR-67 in VU cells are not functional because they are inactivated by sequestration in condensates away from DNA. Neither model is fully supported by the data, which may explain why the authors seem to imply both possibilities. This uncertainty is confusing and prevents the paper from arriving at a compelling conclusion. What is the function, if any, of NHR-67 and so-called "repressive condensates" in VU cells?

      Response 2.1: As the reviewer correctly notes, we present two possible models in this manuscript. The interaction between NHR-67 and the Groucho/TCF complex in the VU cells could (1) switch the role of NHR-67 from a transcriptional activator to a transcriptional repressor, or (2) sequester NHR-67 away from its transcriptional targets. Indeed, we cannot definitively exclude the possibility of either model. In our resubmission, we will attempt to make this more clear in the text and by presenting both possible models in the summary figure (Fig. 7E).

      Below we list problems with data interpretation and key missing experiments:

      1) The authors report that NHR-67 forms "repressive condensates" (aka. puncta) in the nuclei of VU cells and imply that these condensates prevent VU cells from becoming ACs. Fig. 3A, however, shows an example of an AC that also assembles NHR-67 puncta (these are less obvious simply due to the higher levels of NHR-67 in ACs). The presence of NHR-67 puncta in the AC seems to directly contradict the authors' assumption that the puncta repress the AC fate program. Similarly, Figure 5-figure supplement 1A shows that UNC-37 and LSY-22 also form puncta in ACs. The authors need to analyze both AC and VU cells to demonstrate that NHR-67 puncta only form in VUs, as implied by their model.

      Response 2.2: The puncta formed by NHR-67 in the AC differ in appearance from those observed in the VU cells and, furthermore, do not exhibit strong colocalization with UNC-37 or LSY-22. The Manders’ overlap coefficient between NHR-67 and UNC-37 is 0.181 in the AC, whereas it is 0.686 in the VU cells. Likewise, the Manders’ overlap coefficient between NHR-67 and LSY-22 is 0.189 in the AC compared to 0.741 in the VU cells. We speculate that the areas of NHR-67 subnuclear enrichment in the AC may represent concentration around transcriptional targets, but testing this would require knowledge of direct targets of NHR-67.
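
      For readers unfamiliar with the metric, the following is a minimal sketch of how a Manders' overlap coefficient can be computed from two fluorescence channels. It assumes the standard definition, the sum of pixel-wise products divided by the square root of the product of the summed squared intensities, and flattened per-nucleus pixel lists; the function name and the example intensities are hypothetical and are not taken from the study, whose exact analysis pipeline is not described here.

```python
import math


def manders_overlap(ch1, ch2):
    """Manders' overlap coefficient for two equal-length lists of pixel
    intensities (e.g. two channels measured over the same nuclear region).

    Assumed definition: sum(a*b) / sqrt(sum(a^2) * sum(b^2)).
    Returns a value between 0 (no overlap) and 1 (perfect overlap)
    for non-negative intensities.
    """
    if len(ch1) != len(ch2):
        raise ValueError("channels must have the same number of pixels")
    numerator = sum(a * b for a, b in zip(ch1, ch2))
    denominator = math.sqrt(sum(a * a for a in ch1) * sum(b * b for b in ch2))
    # Guard against an all-zero channel, for which the ratio is undefined.
    return numerator / denominator if denominator > 0 else 0.0


# Hypothetical, made-up intensities purely for illustration.
channel_a = [0, 0, 5, 9, 1, 0]        # e.g. a tagged transcription factor
channel_b_coloc = [0, 0, 4, 8, 1, 0]  # signal enriched in the same pixels
channel_b_apart = [7, 6, 0, 0, 0, 3]  # signal enriched in different pixels

high = manders_overlap(channel_a, channel_b_coloc)
low = manders_overlap(channel_a, channel_b_apart)
```

      In this toy case the first pair yields a coefficient close to 1 and the second yields 0, which mirrors the contrast reported above between VU cells (0.686, 0.741) and the AC (0.181, 0.189).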

      2) While a pool of NHR-67 localizes to "repressive condensates", it appears that a substantial portion of NHR-67 also exists diffusely in the nucleoplasm. This would appear to contradict a "sequestration model" since, for such a model to work, a majority of NHR-67 should be in puncta. What proportion of NHR-67 is in puncta? Is the concentration of NHR-67 in the nucleoplasm lower in VUs compared to ACs and does this depend on the puncta?

      Response 2.3: The proportion of NHR-67 localizing to puncta versus the nucleoplasm is dynamic, as these puncta form and dissolve over the course of the cell cycle. However, we estimate that approximately 25-40% of NHR-67 protein resides in puncta based on segmentation and quantification of fluorescent intensity of sum Z-projections. We also measured NHR-67 concentration in the nucleoplasm of VU cells and found that it is only 28% of what is observed in ACs (n = 10). We disagree with the notion that the majority of NHR-67 protein should be located in puncta to support the sequestration model. As one example, previously published work examining phase separation of endogenous YAP shows that it is present in the nucleoplasm in addition to puncta (Cai et al., 2019, doi: 10.1038/s41556-019-0433-z). In our system, it is possible that the combination of transcriptional downregulation and partial sequestration away from DNA is sufficient to disrupt the normal activity of NHR-67.
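
      The kind of estimate described above can be sketched as follows. This is only an illustration under stated assumptions: the pixel values are taken to be background-subtracted intensities from a sum Z-projection restricted to one nucleus, puncta are segmented with a single intensity threshold (the actual segmentation method is not specified in the response), and the function name and numbers are hypothetical.

```python
def puncta_fraction(pixels, threshold):
    """Fraction of total nuclear fluorescence found in pixels classified
    as puncta (intensity >= threshold).

    `pixels` is a flat list of background-subtracted intensities from a
    sum Z-projection of one nucleus (assumed input format).
    Note: all signal in a puncta pixel is counted, including any diffuse
    nucleoplasmic signal underneath it, so this is an upper-bound style
    estimate.
    """
    total = sum(pixels)
    if total <= 0:
        return 0.0
    in_puncta = sum(p for p in pixels if p >= threshold)
    return in_puncta / total


# Hypothetical nucleus: six nucleoplasmic pixels and two brighter puncta pixels.
nucleus = [10, 10, 10, 10, 10, 10, 20, 20]
fraction = puncta_fraction(nucleus, threshold=15)
```

      The result depends directly on where the threshold is placed, which is one reason a range (approximately 25-40%) is a more natural way to report such a measurement than a single value.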

      3) The authors do not report whether NHR-67, UNC-37, LSY-22, or POP-1 localization to puncta is interdependent, as implied in the model shown in Fig. 7.

      Response 2.4: It is difficult to test whether localization of these proteins to puncta is interdependent, as perturbation of UNC-37, LSY-22, or POP-1 results in ectopic ACs. Trying to determine if loss of puncta results in VU-to-AC transdifferentiation or vice versa becomes a chicken-egg argument. It is also possible that UNC-37 and LSY-22 are at least partially redundant in this context. We based our model, shown in Fig. 7E, on known or predicted protein-protein interactions, which we confirmed through yeast two-hybrid analyses (Fig. 7D; Fig. 7-figure supplement 1).

      4) The evidence that the "repressor condensates" suppress AC fate in VUs is presented in Fig. 4D where the authors deplete the presumed repressor LSY-22. First, the authors do not examine whether NHR-67 forms puncta under these conditions. Second, the authors rely on a single marker (cdh-3p::mCherry::moeABD) to score AC fate: this marker shows weak expression in cells flanking one bright cell (presumably the AC) which the authors interpret as a VU-to-AC transformation. The authors, however, do not identify the cells that express the marker by lineage analyses and dismiss the possibility that the marker-positive cells could arise from the division of an AC-committed cell. Finally, the authors did not test whether marker expression was dependent on NHR-67, as predicted by the model shown in Fig. 7.

      Response 2.5: For the auxin-inducible degron experiments, strains contained labeled AID-tagged proteins, a labeled TIR1 transgene, and a labeled AC marker. Thus, we were limited by the number of fluorescent channels we could covisualize and therefore could not also visualize NHR-67 (to assess for puncta formation) or another AC marker (such as LAG-2). We could have generated an AID-tagged LSY-22 strain without a fluorescent protein, but then we would not be able to quantify its depletion, which this reviewer points out is important to measure. We did visualize NHR-67::GFP expression following RNAi-induced knockdown of POP-1 and observed consistent loss of puncta in ectopic ACs. However, this again becomes a chicken-egg argument as far as whether cell fate change or loss of puncta causes the other.

      5) Interaction between NHR-67 and UNC-37 is shown using Y2H, but not verified in vivo. Furthermore, the functional significance of the NHR-67/UNC-37 interaction is not tested.

      Response 2.6: We attempted to remove the intrinsically disordered region found at the C-terminus of the endogenous nhr-67 locus, using CRISPR/Cas9, as this would both confirm the NHR-67/UNC-37 interaction in vivo and allow us to determine the functional significance of this interaction. However, we were unable to recover a viable line after several attempts, suggesting that this region of the protein is vital.

      6) Throughout the manuscript, the authors do not use lineage analysis to confirm fate transformation as is the standard in the field.

      Response 2.7: The timing between AC/VU cell fate specification and AC invasion (the point at which we look for differentiated ACs) is approximately 10-12 hours at 25 °C. With our imaging setup, we are limited to approximately 3-4 hours of live-cell imaging. Therefore, lineage tracing was not feasible for our experiments. Instead, we relied on visualization of established markers of AC and VU cell fate to determine how ectopic ACs arose. In Fig. 6B,C we show that the expression of two AC markers (cdh-3 and lag-2) turns on while a VU marker (lag-1) gets downregulated within the same cell. In our opinion, live-imaging experiments that show changes in cell fate in real time via reporters were the most definitive way to observe the phenotype.

      There are 4 multipotential gonadal cells with the potential to differentiate into VUs or ACs. Which ones contribute to the extra ACs in the different genetic backgrounds examined was not determined, which complicates interpretation. The authors should consider and test the following possibilities: disruption of NHR-67 regulation 1) causes extra pluripotent cells to directly become ACs early in development, 2) causes VU cells to gradually trans-fate to an AC-like fate after VU fate specification (as implied by the authors), or 3) causes an AC to undergo extra cell division(s). In Fig. 1F, 5 cells are designated as ACs, which is one more than the 4 precursors depicted in Fig. 1A, implying that some of the "ACs" were derived from progenitors that divided.

      Response 2.8: When trying to determine the source of the ectopic ACs, we considered the three possibilities noted by the reviewer: (1) misspecification of AC/VU precursors, (2) VU-to-AC transdifferentiation, or (3) proliferation of the AC. We eliminated option 3 as a possibility, as the ectopic ACs we observed here were invasive and all of our previous work has shown that proliferating ACs cannot invade and that cell cycle exit is necessary for invasion (Matus et al., 2015; Medwig-Kinney & Smith et al., 2020; Smith et al., 2022). Specifically, NHR-67 is upstream of the cyclin-dependent kinase inhibitor CKI-1 and we found that induced expression of NHR-67 resulted in slow growth and developmental arrest, likely because of inducing cell cycle exit. For our experiment using hsp::NHR-67, we induced heat shock after AC/VU specification. For POP-1 perturbation, we explicitly acknowledged that misspecification of the AC/VU precursors could also contribute to ectopic ACs (Fig. 6A; lines 368-385). We could not achieve robust protein depletion through delayed RNAi treatment, so instead we utilized timelapse microscopy and quantification of AC and VU cell markers (Fig. 6B,C; see response 2.7 above).

      In conclusion, while the authors report on interesting observations, in particular the co-localization of NHR-67 with UNC-37/Groucho and POP-1 in nuclear puncta, the functional significance of these observations remains unclear. The authors have not demonstrated that the "repressive condensates" are functional and play a role in the suppression of AC fate in VU cells as claimed. The colocalization data suggest that NHR-67 interacts with repressors, but additional experiments are needed to demonstrate that these interactions are specific to VUs, impact VU fate, and sequester NHR-67 from its targets or transform NHR-67 into a transcriptional repressor.

      Response 2.9: We agree that, at this time, we cannot pinpoint the precise mechanism through which NHR-67 puncta function (i.e., by sequestering NHR-67 from DNA or switching the role of NHR-67 from activating to repressing). However, identification of NHR-67 puncta and their colocalization with UNC-37, LSY-22, and POP-1 in VU cells allowed us to discover an undescribed role for the Groucho/TCF complex in maintaining VU cell fate. This, combined with our evidence demonstrating that NHR-67 transcriptional regulation is important for distinguishing between AC and VU cell fate, are the main contributions of our study.

      Reviewer #1 (Recommendations For The Authors):

      I am not a C. elegans researcher and I find this paper fairly hard to follow. One major recommendation I would like to see is to improve the consistency of the labeling of the figures. There are many figures showing many things and I struggled to keep track of everything. For example, the thing that we are looking at in the microscope images (typically GFP tagged to a protein of interest) is sometimes labeled above the image, sometimes to the side, and sometimes within the panel. Experimental conditions are also formatted arbitrarily. As much as they can do so, could the authors try and make their labeling consistent? This would help me follow the data.

      Response 1.2: We thank the reviewer for this suggestion and have reorganized the figures (namely Figure 3, Figure 4, Figure 4–figure supplement 1, Figure 5, and Figure 6) such that the tagged allele or marker is labeled at the top, and the time, stage, and/or perturbation is labeled on the side.

      Is the yeast one-hybrid assay enough to confirm a direct interaction between HLH-2 and NHR-67? Obviously, it supports it, but since this is not a definitive test in C. elegans, I feel the description of this result should be modified to account for this.

      Response 1.3: We agree that the yeast one-hybrid assay identifies sequences that are capable of being bound to a protein and does not prove that a DNA-protein interaction occurs in vivo. We have modified our language describing this result in our resubmission (lines 222-224).

      NHR-67 and POP-1 eventually form two large spots. This observation supports the claims that these are condensates, but it is clearly different from the observations in Ciona where the condensates remain more or less stable until they quickly dissolve at the onset of mitosis. Do the authors have any idea why these condensates are behaving this way? Is it always two spots? This implies it is forming around some sort of diploid nuclear structure.

      Response 1.4: Hes.a puncta observed in Ciona were indeed shown to be dynamic, as puncta were captured fusing together (see Figure 6B of Treen et al., 2021). However, these puncta did not appear to coalesce into two puncta specifically, as is consistently observed with NHR-67 in C. elegans. We agree with the reviewer that this observation is very interesting and likely corresponds to a diploid nuclear structure; however, we have yet to identify it.

      In Ciona, for the two examples of repressive condensates, it was shown that the removal of the C-terminal Groucho recruiting repressor domains of HesA and ERF disrupts condensate formation. Have the authors attempted a similar experiment for NHR-67 or Pop1?

      Response 1.5: We agree that this would have been an ideal experiment to perform. We attempted to remove the intrinsically disordered region found at the C-terminus of NHR-67 with CRISPR, but were unable to generate a stable line, suggesting that this region may be critical for NHR-67 function in other developmental stages or tissues.

      Other minor points:

      Fig 4D - I found the labeling of this figure the most confusing.

      Response 1.6: We thank the reviewer for bringing this to our attention. For this panel, in addition to the changes referenced above (Response 1.2), we simplified the labeling of the TIR1 transgene and now describe it in the figure legend instead.

      Line 354 - I think this is mislabeled. Is it supposed to be Figure 5H, not 5F, and 5B, not 5C?

      Response 1.7: We thank the reviewer for spotting this error. This reference to Figure 5F has been updated and now correctly references Figure 5H (line 338).

      Reviewer #2 (Recommendations For The Authors):

      The authors use several methods to overexpress NHR-67 including 1) an NHR-67 transgene (Fig. 1), 2) overexpression of the transcriptional activator HLH-2 or 3) removal of a factor that normally degrades HLH-2 in VU cells (Fig. 2). In all cases, the rate of VU-to-AC transformation is either very low (5%) or not reported but presumed to be zero, since other groups have done similar experiments and reported no such conversion (e.g., Benavidez et al., 2022). What is the significance of this finding? Does this mean that high levels of NHR-67 are not sufficient to promote AC fate because NHR-67 is sequestered in puncta when expressed in VU cells? Fig. 2A suggests that NHR-67 is in puncta in all VUs where overexpressed. Would the inactivation of GROUCHO in that background result in extra ACs?

      Response 2.10: Indeed, we would expect that overexpression of NHR-67 may not normally be sufficient to induce cell fate transformation if the Groucho/TCF complex is still functional. Unfortunately we were unable to achieve strong depletion of UNC-37 and LSY-22 through RNAi, and thus relied on the auxin-inducible protein degradation system. Since we are limited by the number of fluorescent channels we can co-visualize, it would not be feasible to combine a heat-shock inducible transgene, a TIR1 transgene, an AID-tagged protein, and multiple cell fate markers.

      The data are often presented as numbers of animals with increased or decreased expression of a particular marker, but no quantification of expression is provided. For example, in Figure 1E, 32/35 animals are reported to exhibit ectopic expression of LIN-12 in the AC and reduced expression of LAG-2. What is the range of the increase/decrease in LIN-12/LAG-2 expression and how does this compare to natural variation in wild-type? The same concerns apply to Fig. 4D.

      Response 2.11: For resubmission, we have quantified the data shown in Figure 1E and now report expression levels of LIN-12::mNeonGreen and LAG-2::P2A::H2B::mTurquoise2 in Figure 1–figure supplement 2. We have also quantified the data in Figure 4D and now report expression levels of cdh-3p::mCherry::moeABD in Figure 4E. Quantification methods have been added to the Materials and Methods section (lines 612-617).

      The authors explain that it is difficult to study a repressive role for POP-1 as this protein functions in multiple developmental pathways and POP-1 depletion needs to be carefully timed for the data to be interpretable. The authors then go on to use RNAi to deplete POP-1 but do not describe in the methods how they achieve the needed precise temporal control.

      Response 2.12: We did indeed describe methods for the GFP-targeting nanobody, which we expressed under a uterine-specific promoter expressed after AC/VU specification. However, since the penetrance of phenotypes associated with this perturbation was low, we utilized RNA interference. We separated the cell fate specification and cell fate maintenance phenotypes by visualizing AC markers (Fig. 6A), which we would expect to be expressed at equal levels if ACs adopted their fate at the same time (via misspecification). We also paired these with a marker for VU cell fate and co-visualized them over time (Fig. 6B,C).

      The authors also do not report the efficiency of protein depletion by RNAi or Auxin treatment.

      Response 2.13: Auxin-induced depletion of mNeonGreen::AID::LSY-22 resulted in more than 90% decrease in expression (n > 75 uterine cells). The AID-tagged allele for UNC-37 was labeled with BFP, which was barely detectable by our imaging system and photobleached very quickly, so we did not quantify its depletion. However, considering that UNC-37 and LSY-22 are both expressed fairly uniformly and ubiquitously, and that LSY-22 is expressed at higher levels than UNC-37 at the L3 stage according to WormBase (31.9 FPKM vs. 23.5 FPKM), we would predict that auxin-induced depletion of UNC-37 would be just as potent, if not more so.

      Some of the work presented repeats previously published observations, and it is difficult at times to keep track of what is confirmatory and what is new. For example, this group already published on the enrichment of HLH-2 and NHR-67 in the AC, as well as the positive regulation of NHR-67 by HLH-2 (Medwig-Kinney et al 2020). Additionally, prior papers have already reported the interaction between HLH-2 and the nhr-67 locus.

      Response 2.14: The work presented in this manuscript does not repeat any previously published experiments. When we introduced the endogenously tagged NHR-67 and HLH-2 strains in previous work (Medwig-Kinney & Smith et al., 2020), we quantified expression of these proteins in the AC over time but did not compare expression between the AC and VU cells. Additionally, we previously showed that HLH-2 positively regulates NHR-67 in the AC (Medwig-Kinney & Smith et al., 2020), but never showed this is the case in the VU cells. Considering that this regulatory interaction was not observed in the AC/VU cell precursors, we believe that determining whether these proteins interact in the context of the VU cells was a valid question to address.

      Treen et al. 2021 are cited as prior evidence for the existence of "repressive condensates", however, that study does NOT experimentally demonstrate a function for these structures.

      Response 2.15: By “repressive condensates” we are referring to condensation of proteins known to be transcriptional repressors. While we agree that we were not able to demonstrate transcriptional repression of specific loci, our data showing that perturbation of the Groucho repressors UNC-37 and LSY-22 results in ectopic ACs is consistent with the hypothesis that these proteins repress the default AC fate. We have modified our title and text to more clearly distinguish our interpretations versus speculations.

    1. Author Response

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public Review):

      This manuscript by Leibinger et al describes their results from testing an interesting hypothesis that microtubule detyrosination inhibits axon regeneration and its inhibitor parthenolide could facilitate axon regeneration and perhaps functional recovery. Overall, the results from in vitro studies are largely well performed. However, the in vivo data are less convincing.

      Interpretation of the findings in this study are limited by several gaps:

      1) It is unclear whether microtubule detyrosination is a primary effect of hIL-6 and PTEN deletion or secondary to the increased axon growth.

      This point is based on a misunderstanding. As shown by Western blot in Fig. 2, detyrosination was increased in optic nerves after intravitreal injection of AAV2-hIL-6. These optic nerves were uninjured! This indicates that the increased detyrosination is an effect of the treatment itself and does not occur due to axonal regeneration.

      hIL-6 and PTEN ko nevertheless increase axonal regeneration because their positive effect on other signaling pathways, such as JAK/STAT3 and mTOR, ultimately predominates. Consequently, we show, for both PTEN ko and hIL-6, that we can further enhance these positive effects by neutralizing the negative aspect of increased detyrosination using DMAPT.

      2) Is there any direct evidence for Akt and/or JAK/Stat3 to promote microtubule detyrosination?

      Regarding the AKT/GSK3 signaling pathway, it has been well described that GSK3 activity leads to phosphorylation of microtubule-associated protein 1B, which results in enhanced tubulin detyrosination (Lucas et al., 1998, Goold et al 1999, Owen and Gordon-Weeks 2003). As shown in our previous and cited work, hIL-6 promotes the activation of AKT, which in turn inhibits GSK3 (Leibinger et al. 2016). In Fig. 2, we have also shown that intravitreal hIL-6 treatment in the optic nerve leads to increased inhibitory phosphorylation of GSK3 at the target site of AKT, and that tubulin detyrosination is increased. The same was also shown for PTEN ko: In a previous publication, we showed that PTEN ko increases AKT activity, inhibiting GSK3 phosphorylation (Leibinger et al. 2019). In Fig. 3 of the current study, we show that PTEN ko results in enhanced tubulin detyrosination. In conclusion, treatments activating the AKT/GSK3 signaling enhance tubulin detyrosination.

      On the other hand, JAK/STAT3 has no direct effect on detyrosination. This was demonstrated in experiments using the CNTF application, which reportedly activates the JAK/STAT3 pathway without affecting AKT/GSK3 (Leibinger et al, 2009, 2016, 2017).

      In cell culture, we have shown that activation of the JAK/STAT3 pathway by CNTF does not change tubulin detyrosination in neurites (Fig. 1 H, I, M, N). Moreover, DMAPT does not affect the phosphorylation of STAT3 and S6 in RGC cell bodies, and thus has no measurable effect on the JAK/STAT3 or mTOR pathways.

      3) What is the impact of parthenolide on cell soma of neurons and other cell types?

      Parthenolide and DMAPT show a regenerative effect in the nanomolar range (cell culture) and a bell-shaped concentration-response curve. We show a close correlation between detyrosinated microtubules and regeneration (with and without hIL-6 or PTEN-KO), which is, in our opinion, convincing. Moreover, we would like to address a likely misunderstanding in this comment and provide further clarification. The detyrosination of alpha-tubulin occurs after its incorporation into microtubules through the action of the tubulin carboxypeptidases vasohibin 1 and 2 (Vash 1, 2). Consequently, tubulin is already present in the detyrosinated form within existing microtubules, and the administration of DMAPT does not affect these pre-existing microtubules. However, DMAPT does play a crucial role in preventing the detyrosination of newly attached tubulin dimers in the growth cones of developing axons. This explains why we can detect detyrosinated tubulin specifically in those regions and why our immunohistochemical analyses in the cell culture experiments focused solely on axon tips.

      It is important to note that when used at low concentrations, which promote axon growth, DMAPT does not measurably affect detyrosination in other neuronal compartments, such as the RGCs' somata. We might observe a decrease in detyrosination only at much higher concentrations. However, this outcome would be inconsequential to our findings.

      Whether additional effects of DMAPT contribute to improved regeneration is not excluded, although unlikely. If so, their investigation would be beyond the scope of the current paper.

      4) Direct evidence that parthenolide augments PTEN deletion in optic nerve or spinal cord is not provided.

      Our research paper primarily investigates the combination of DMAPT with hIL-6. We chose to combine DMAPT with hIL-6 because, unlike PTEN-KO, only hIL-6 has been demonstrated to facilitate functional recovery following a complete spinal cord crush injury (Leibinger et al., 2021). Therefore, it is unclear why in vivo experiments with PTEN-KO, which cannot be used therapeutically, would be necessary. Since we have shown the beneficial effects of DMAPT on hIL-6 in two different in vivo models (optic nerve and spinal cord) anatomically and functionally, we feel that the repetition of these experiments with PTEN ko, which has no therapeutic implication, would not justify the sacrifice of additional animals. This would contradict the principles of reduction, refinement, and replacement, aiming to minimize the use of animals in our research.

      In contrast, the PTEN experiments primarily serve to support the underlying mechanism and demonstrate that DMAPT generally counteracts the negative effect on MT detyrosination, even in conjunction with other procedures that activate the PI3K/AKT pathway. These findings were mechanistically elucidated through cell culture experiments utilizing immunohistochemical analysis, which the editors highlighted as strengths of our paper.

      5) Serotonergic neurotoxin DHT ablates both regenerating and non-regenerating serotonergic axons, which makes the spinal cord findings difficult to interpret.

      The impact of unregenerated serotonergic axons on stereotypic hind leg movements, as assessed through BMS analysis, appears to be minimal, as demonstrated in our previous study (Leibinger et al., 2021). Specifically, our findings revealed that depleting serotonergic neurons using DHT did not significantly affect the BMS score in uninjured animals (Leibinger et al., 2021). Furthermore, even in the control group comprising animals with spinal cord lesions where anatomical regeneration of the RpST did not occur, the administration of DHT had no discernible effect (Fig. 7 K, L).

      To address this concern, we included the following information in the revised manuscript: "It might be considered plausible that the depletion of non-regenerated serotonergic axons could have contributed to these results. However, we can largely dismiss this possibility, as DHT did not influence the non-regenerated vehicle control group. Additionally, in a previous publication, we have demonstrated that the general depletion of serotonergic neurons in uninjured animals also does not significantly impact open field locomotion, as measured by the BMS score and subscore (Leibinger et al., 2021)."

      6) DMAPT was given by i.p. injection. What happens to microtubule detyrosination in other cells within and outside of CNS?

      This question is the same as the one raised under point 3; please see our response there.

      Reviewer #2 (Public Review):

      In the current study, Fischer and colleagues extensively examined the role of parthenolide in inhibiting microtubule detyrosination and making the mechanistic link for the compound to facilitate the role of IL-6 and PTEN/KO in promoting neurite outgrowth and axon regeneration. The in vitro and mechanistic work laid the foundation for the authors to reach several key predictions that such detyrosination can be applied for in vivo applications. Thus the authors extended the work to optic nerve regeneration and spinal cord recovery. The in vivo compound that the authors utilized is DMAPT, which plays a synergistic role with existing pro-regeneration therapies, such as IL-6 treatment.

      The major strength of the work is the first half of the mechanistic inquiries, where the authors combined cell biology and biochemistry approaches to dissect the mechanistic link from parthenolide to microtubule dynamics. The shortcoming is that the in vivo data is limited, and the effects might be considered mild, especially by benchmarking with other established and effective strategies.

      The work is solid and prepares a basis for others to test the role of DMAPT in other settings, especially in the setting of other effective pro-regenerative approaches. With the goal of comprehensive and functional recovery in vivo, the impact of the work and the utilities of the methods remain to be tested broadly in other models in vivo.

      Reviewer #3 (Public Review):

      The primary goal of this paper is to examine microtubule detyrosination as a potential therapeutic target for axon regeneration. Using dimethylamino-parthenolide (DMAPT), this study extensively examines mechanistic links between microtubule detyrosination, interleukin-6 (IL-6), and PTEN in neurite outgrowth in retinal ganglion cells in vitro. These findings provide convincing evidence that parthenolide has a synergistic effect on IL-6- and PTEN-related mechanisms of neurite outgrowth in vitro. The potential efficacy of systemic DMAPT treatment to promote axon regeneration in mouse models of optic nerve crush and spinal cord injury was also examined.

      Strengths

      1) The examination of synergistic activities between parthenolide, hyperIL-6, and PTEN knockout is leveraged not only for potential therapeutic value, but also to validate and delineate mechanism of action.

      2) The in vitro studies, including primary human retinal ganglion cells, utilize a multi-level approach to dissect the mechanistic link from parthenolide to microtubule dynamics.

      3) The studies provide a basis for others to test the role of DMAPT in other settings, particularly in the context of other effective pro-regenerative approaches.

      Weaknesses

      1) In vivo studies are limited to select outcomes of recovery and do not validate or address mechanism of action in vivo.

      Reviewer #1 (Recommendations For The Authors):

      Overall, it doesn't seem like the authors bought into or addressed any issues raised during the previous review. In testing their central hypothesis, a critical experiment was to assess the outcome of PTEN knockout in combination with their novel treatment (parthenolide or DMAPT). Unfortunately, this and other issues have not been addressed in this revision.

      PTEN is not part of our central hypothesis. Our research paper primarily investigates the combination of DMAPT with hIL-6. We chose to combine DMAPT with hIL-6 because, unlike PTEN-KO, only hIL-6 has been demonstrated to facilitate functional recovery following a complete spinal cord crush injury (Leibinger et al., 2021). Therefore, it is unclear why in vivo experiments with PTEN-KO, which cannot be used therapeutically, would be necessary. Since we have shown the beneficial effects of DMAPT on hIL-6 in two different in vivo models (optic nerve and spinal cord) anatomically and functionally, we feel that the repetition of these experiments with PTEN ko, which has no therapeutic implication, would not justify the sacrifice of additional animals. This would contradict the principles of reduction, refinement, and replacement, aiming to minimize the use of animals in our research.

      In contrast, the PTEN experiments primarily serve to support the underlying mechanism and demonstrate that DMAPT generally counteracts the negative effect on MT detyrosination, even in conjunction with other procedures that activate the PI3K/AKT pathway. These findings were mechanistically elucidated through cell culture experiments utilizing immunohistochemical analysis, which the editors highlighted as strengths of our paper.

      Reviewer #2 (Recommendations For The Authors):

      The response and revision provided here did not improve the manuscript - the authors chose to focus on re-organizing the methods but did not provide any new experimental data. Thus my recommendations remain the same as the previous round. In brief, the in vivo evidence was rather weak, especially if no further evidence was offered to respond to these points below.

      To possibly improve the manuscript, the authors could consider enhancing the in vivo parts in the following manner:

      1) possibly detyrosination staining in the optic nerve vertical section - it would be interesting to see how the detyrosination assays may work for regenerating conditions, or as an alternate, the authors may consider retina tissue biochemistry (with & without IL6, with & without DMAPT) repeating the biochemical assays as established in Fig 2B.

      The detyrosination of alpha-tubulin occurs after its attachment to microtubules through the action of the tubulin carboxy peptidase vasohibin 1 and 2 (Vash 1, 2). Consequently, tubulin is already present in the detyrosinated form within existing microtubules, and the administration of DMAPT does not affect these pre-existing microtubules. However, DMAPT does play a crucial role in preventing the detyrosination of newly attached tubulin dimers in the growth cones of developing axons. This explains why we can detect detyrosinated tubulin specifically in those regions and why our immunohistochemical analyses in the cell culture experiments focused solely on axon tips.

      It is important to note that when used at low concentrations, which promote axon growth, DMAPT does not measurably affect detyrosination in other neuronal compartments, such as the RGCs' somata. We might observe a decrease in detyrosination only at much higher concentrations. Because of these reasons, we could not clearly identify and stain axon tips in 14 µm thick optic nerve sections.

      2) How do the authors benchmark the DMAPT retreatment in the setting of PTEN (aav2-cre injection for cKO) and /or PTEN/SOCS3/CNTF dKO? Which are the best approaches to promote optic nerve regeneration? Would the authors expect DMAPT retreatment to be synergetic with PTENcKO?

      Based on our previous findings, we anticipate that DMAPT would exhibit a synergistic effect when combined with PTEN ko, as demonstrated in our in vitro studies with cultured neurons. Additionally, synergistic effects between DMAPT and PTEN/SOCS3 dKO +CNTF are possible. While these hypotheses hold promise, our current paper primarily focuses on combining DMAPT with hIL-6, which has consistently shown remarkable efficacy as a standalone treatment in optic nerve regeneration.

      3) Regarding the DMAPT treatment, one notable issue was that the RGC survival subject to ONC was very poor, which may limit the effects of DMAPT daily injection. The authors may consider further combining DMAPT with the DLK/LZK inhibitors to examine the synergistic effects.

      As DMAPT itself is not neuroprotective and does not affect retinal ganglion cells' (RGCs) regenerative state by inducing the expression of regeneration-associated genes, a combination with a neuroprotective and regenerative treatment would show stronger effects. This is exactly what we found when combining DMAPT with neuroprotective hIL-6 (Leibinger et al. 2016) in the current paper.

      Moreover, in the raphespinal tract, where respective neurons do not undergo apoptotic cell death after axotomy, the DMAPT effect on anatomic axon regeneration was stronger than in the optic nerve, even without combination with hIL-6, with some axons reaching distances of up to 7 mm distal to the lesion. So, DMAPT can induce long-distance regeneration in neuronal populations unaffected by cell death. Therefore, additional experiments with DLK/LZK inhibitors, as suggested by this reviewer, would not provide an additional benefit to our paper and would not justify the additional sacrifice of animal lives.

      4) Overall, the phenotypes in Figs 5-8 were rather weak after DMAPT treatment, which are universal challenges to spinal cord regeneration. The authors may present this section of the data with further clarification on the selection standards in the methods, such as how the animals and treatment were selected and how a double-blinded experimental design may help further evaluate the effects of DMAPT treatment. I found little relevant information in the current manuscript.

      In the anatomic and functional regeneration analysis presented in Fig. 5-8, we only included animals with a BMS score of 0 one day after the spinal cord crush, indicating a complete absence of hind leg movement. Furthermore, we employed immunohistochemical staining to ensure that no serotonergic axons were detected at 8-10 mm from the lesion site in any of the animals, thus confirming the thoroughness of the lesion (Supplementary Fig. 4). Both the evaluation of the BMS score and the assessment of anatomical regeneration were conducted in a double-blinded manner, ensuring unbiased and objective observations. To address this concern, we will add the following paragraph in the M&M part:

      “Blinding procedure for in vivo experiments. Before the start of the experiment, individual vials containing DMAPT or vehicle (DMSO) stock solution were prepared for each particular experimental animal. The vials were randomized by a person who was neither involved in the implementation nor in the evaluation of the experiments. These numbers were randomly distributed to mice of the same age and sex in different cages. This was carried out independently by another person who was neither involved in the data evaluation nor the randomization of the samples. This was followed by the execution of the experiments and the evaluation by scientists who were not involved in any of the randomization processes and did not know the identity of the injected samples. After completion of the data collection, values from mice with signs of spared axons were first removed from the data set for reasons of quality assurance. The criteria for this were a BMS score of a maximum of 0-1 on the first day after the lesion and the absence of uninjured serotonergic axons in spinal cord cross-sections >8-10 mm distal to the lesion site. Finally, the data points were assigned to the respective experimental groups by the person who initially blinded the vials.”
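
      The blinding procedure described above can be sketched in code. This is only a minimal illustration under stated assumptions, not the authors' actual protocol: the function names (blind_vials, unblind), the balanced two-group design, and the numeric vial codes are hypothetical choices made for the example.

```python
import random


def blind_vials(animal_ids, treatments=("DMAPT", "vehicle"), seed=None):
    """Hide treatments behind numeric vial codes (hypothetical helper).

    Returns (key, allocation):
      key        maps vial code -> treatment; kept only by the person who blinds.
      allocation maps animal id -> vial code; all that experimenters see.
    """
    rng = random.Random(seed)
    n = len(animal_ids)
    # One vial per animal, treatments balanced across vials (an assumption).
    contents = [treatments[i % len(treatments)] for i in range(n)]
    rng.shuffle(contents)
    codes = list(range(1, n + 1))
    key = dict(zip(codes, contents))
    # Second shuffle, standing in for the independent person who
    # distributes the coded vials to the animals.
    shuffled_codes = codes[:]
    rng.shuffle(shuffled_codes)
    allocation = dict(zip(animal_ids, shuffled_codes))
    return key, allocation


def unblind(allocation, key, included):
    """Assign groups only for animals that passed the blinded exclusion step."""
    return {animal: key[allocation[animal]] for animal in included}


if __name__ == "__main__":
    animals = [f"mouse_{i}" for i in range(1, 9)]
    key, allocation = blind_vials(animals, seed=42)
    # Exclusions (e.g. signs of spared axons) are decided before unblinding.
    included = [a for a in animals if a != "mouse_3"]
    print(unblind(allocation, key, included))
```

      The point of keeping the key separate from the allocation is that exclusion criteria are applied while group identity is still unknown, as the quoted paragraph requires.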

      Reviewer #3 (Recommendations For The Authors):

      Addition of supporting data, revision of discussion, and inclusion of references for parthenolide activities improved the manuscript and adequately addressed concerns.


      The following is the authors’ response to the original reviews.

      We feel that the use of human RGCs should be considered a highlight and strength of our paper because, as far as we know, our study is the first to utilize human primary cultures of RGCs to confirm the effectiveness of drugs on human cells. Therefore, this might be of interest to colleagues in our field. Moreover, we have added additional data as suppl. Fig. proving that these cells are living RGCs so this concern has been addressed. In addition, we provide further explanations why other activities of DMAPT beyond microtubule detyrosination, such as oxidative stress and NFkB inhibition, are not considered in experimental examinations or in the interpretation of findings. Therefore, we strongly recommend that this point should not be considered a weakness.

      Strengths:

      1) The examination of synergistic activities between parthenolide, hyper-IL-6, and PTEN knockout is leveraged not only for potential therapeutic value, but also to validate and delineate mechanism of action.

      2) The in vitro studies utilize a multi-level approach that combines cell biology and biochemistry approaches to dissect the mechanistic link from parthenolide to microtubule dynamics.

      3) The studies provide a basis for others to test the role of DMAPT in other settings, particularly in the context of other effective pro-regenerative approaches.

      Weaknesses:

      1) In vivo studies are limited to select outcomes of recovery and do not validate or address mechanism of action in vivo.

      2) Known activities of DMAPT beyond microtubule detyrosination, such as oxidative stress, mitochondrial function and NFkB inhibition, are not considered in experimental examinations or in the interpretation of findings.

      Our research indicates that parthenolide exhibits a regenerative effect within a nanomolar range and with a bell-shaped concentration-response curve in culture. Moreover, we demonstrate a close correlation between the inhibition of detyrosinated microtubules and regeneration and consider the effects of hIL-6 or PTEN-KO on detyrosination in mouse and human RGCs. Therefore, we offer a coherent and satisfactory mechanistic explanation for the effects of parthenolide. We, therefore, feel the request to experimentally explore additional, somewhat speculative possibilities is not reasonable or helpful, and this issue should not be considered as a weakness. Moreover, to the best of our knowledge, no evidence suggests profound antioxidative effects of DMAPT or parthenolide within these low-concentration ranges and that these would affect axon regeneration. Antioxidative effects may also not explain the observed bell-shaped curve. Furthermore, we have already considered the effect of NFkappaB in our previous work (Gobrecht et al., 2016) and shown that NFkappaB remains unaffected by low concentrations of parthenolide. Hence, conducting additional experiments addressing oxidative stress or other speculative causes will not strengthen our findings and does not justify the additional sacrifice of animal lives.

      Nevertheless, we added the following sentence in our manuscript to address this issue: “Although we cannot exclude the possibility that other known activities of parthenolide/DMAPT, such as oxidative stress or NF-kB inhibition, could have contributed to the observed effects, this is rather unlikely because such effects have only been reported at much higher micromolar concentrations (Bork et al., 1997; Saadane et al., 2007; Carlisi et al., 2016; Gobrecht et al., 2016).”

      Editorial Comments:

      The reviewers' consensus is that this manuscript, although containing an impressive amount of data, lacks cohesion.

      The mechanistic studies in vitro are of a distinctly different caliber than the in vivo studies. Additional data is needed to demonstrate that the mechanisms delineated in vitro are related to the outcomes in vivo. As is, this reads as a comprehensive in vitro study with premature in vivo data tacked on the end.

      The manuscript should contain the necessary background and contextual information needed to fully understand the work. Clarity of rationale and context for experimental method/design (why one reagent or insult is selected over another), result interpretation (what does this data tell you and not tell you), and implications for results (what does this mean in the context of current knowledge) should be improved throughout.

      Technical:

      1) There is no validation of human RGC cultures. If this data is to remain in the manuscript, proper verification data should be provided to demonstrate that these are indeed RGCs and that they are viable.

      The retinal ganglion cells (RGCs) were identified by applying the same criteria as murine and rat RGCs, encompassing morphological and immunohistochemical criteria. The staining of a piece of human retina (see Author response image 1) shows βIII-tubulin-positive cells in the ganglion cell layer and forming axonal bundles in the fiber layer. These are RGCs, and it is confirmed that the βIII-tubulin antibody stains human RGCs (Author response image 1A). In addition, the somata of these human RGCs in the retina have a similar diameter (somewhat larger than murine RGCs; Author response image 1A, B) to the cultured βIII-tubulin-positive cells (RGCs) and a similar morphology. Finally, these regenerating neurons are GAP43-positive, a regeneration-associated protein shown in Author response image 1C. Thus, these data prove that the cultured cells were human RGCs. These data were included as suppl. Fig. 1.

      The viability of the neurons was confirmed, as evidenced by their ability to grow neurites - a clear indication of their vitality. We also verified the viability by calcein staining.

      As far as we know, our study is the first to utilize human primary cultures of RGCs to confirm the effectiveness of CNTF and parthenolide on human cells. Therefore, we would have expected this accomplishment to be emphasized as a strength of our paper.

      Author response image 1.

      A) Retinal flat mounts from human (left) and mouse (right) stained for βIII-tubulin. Scale bar: 50 μm. B) Human (left) and mouse (right) RGCs cultured for 4 days and stained for βIII-tubulin. Scale bar: 25 μm. C) Human βIII-tubulin-positive RGCs with regenerating neurites are also GAP43-positive. Scale bar: 50 μm.

      2) For graphs depicting means and errors, it is advised that the authors evaluate their use of SEM. Standard deviation should be used when illustrating the distribution of measurements/individuals within a population. Standard error should be used for determining accuracy of the calculated mean, i.e. how close are individuals to the calculated mean? Since standard error is a measure of accuracy rather than distribution, it moves towards zero as the population size increases, regardless of the distribution. Thus, error bars intended to show the range of an effect (i.e. how much functional recovery with treatment?), should be depicted as standard deviation, which illustrates the actual range of data.

      To provide the best possible transparency, we incorporated each individual data point within our graphs, thus offering a detailed depiction of the complete range of effects. We firmly believe that this approach provides enhanced clarity compared to a standard deviation and grants a more comprehensive understanding of the data. It is worth noting that also presenting the standard error adds supplementary information regarding the accuracy of the calculated mean.

      Thus, we firmly stand by our chosen method of data presentation, as we believe it furnishes readers with more valuable insights. However, if there are additional compelling arguments to display the standard deviation instead of the standard error, we are more than willing to consider them.
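
      The distinction at issue in this point can be illustrated numerically. The following is a minimal sketch, not part of the manuscript's analysis: the simulated scores, the population parameters (mean 4, SD 1.5), and the function names are hypothetical. It shows that the standard deviation estimates the spread of individual measurements and stays roughly constant as n grows, whereas the standard error of the mean (SD divided by the square root of n) shrinks towards zero.

```python
import math
import random
import statistics


def sd_and_sem(values):
    """Return (sample standard deviation, standard error of the mean)."""
    sd = statistics.stdev(values)      # spread of the individual measurements
    sem = sd / math.sqrt(len(values))  # uncertainty of the estimated mean
    return sd, sem


def simulate(n, seed=0):
    """Draw n hypothetical scores from one population (mean 4, SD 1.5)."""
    rng = random.Random(seed)
    return [rng.gauss(4.0, 1.5) for _ in range(n)]


if __name__ == "__main__":
    # Same underlying population, increasing sample size.
    for n in (4, 10, 100, 10000):
        sd, sem = sd_and_sem(simulate(n))
        print(f"n={n:>5}  SD={sd:.2f}  SEM={sem:.3f}")
```

      Plotting every individual data point, as the authors do, conveys the spread directly; the choice of error bar then mainly determines whether the bar describes the data (SD) or the precision of the mean (SEM).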

      3) One notable issue was that the RGC survival subject to ONC was very poor, which may limit the effects of DMAPT daily injection. The authors may consider further combining DMAPT with the DLK/LZK inhibitors to examine the synergistic effects.

      As DMAPT itself is not neuroprotective and does not affect retinal ganglion cells' (RGCs) regenerative state by inducing the expression of regeneration-associated genes, a combination with a neuroprotective and regenerative treatment would show stronger effects. This is exactly what we found when combining DMAPT with neuroprotective hIL-6 (Leibinger et al. 2016) in the current paper.

      Moreover, in the raphespinal tract, where respective neurons do not undergo apoptotic cell death after axotomy, the DMAPT effect on anatomic axon regeneration was stronger than in the optic nerve, even without combination with hIL-6, with some axons reaching distances of up to 7 mm distal to the lesion. So, DMAPT can induce long-distance regeneration in neuronal populations unaffected by cell death. Therefore, we feel that additional experiments with DLK/LZK inhibitors, as suggested by this reviewer, would not provide an additional benefit to our paper and would not justify the additional sacrifice of animal lives.

      To address this issue, we added the following paragraph: “Expectedly, DMAPT was not able to protect RGCs from axotomy-induced cell death (Fig. 4 F, G) since it solely accelerates microtubule polymerization in axonal growth cones without affecting neuroprotective signaling pathways in the cell body (Fig. 1 F, G; supplementary Fig. 2). We then repeated these experiments in combination with intravitreally applied AAV2-hIL-6 which reportedly has a significant neuroprotective effect (Leibinger et al., 2016) (Fig. 4 H).”

      4) Serotonergic neurotoxin DHT, which in the spinal cord injury model ablates both regenerating and nonregenerating serotonergic axons, which makes interpretation of the results difficult. This should be addressed directly in interpretation and discussion.

      The impact of unregenerated serotonergic axons on stereotypic hind leg movements, as assessed through BMS analysis, appears to be minimal, as demonstrated in our previous study (Leibinger et al., 2021). Specifically, our findings revealed that depleting serotonergic neurons using DHT did not significantly affect the BMS score in uninjured animals (Leibinger et al., 2021). Furthermore, even in the control group comprising animals with spinal cord lesions where anatomical regeneration of the RpST did not occur, the administration of DHT had no discernible effect (Fig. 7 K, L).

      To address this concern, we propose including the following information in the revised manuscript: “It might appear conceivable that the depletion of non-regenerated serotonergic axons may have contributed to these results. However, we can rule this out since DHT did not influence the non-regenerated vehicle control group. Furthermore, we have shown in a previous publication that the general depletion of serotonergic neurons in uninjured animals also has no significant influence on open-field locomotion as measured in the BMS score and subscore (Leibinger et al., 2021).”

      5) Overall, the phenotypes in Figs 5-8 were rather weak after DMAPT treatment, which are universal challenges to spinal cord regeneration. The authors may present this section of the data with further clarification on the selection standards in the methods, such as how the animals and treatment were selected and how a double-blinded experimental design may help further evaluate the effects of DMAPT treatment. I found little relevant information in the current manuscript.

      In the anatomic and functional regeneration analysis presented in Figures 5-8, we only included animals with a BMS score of 0 one day after the spinal cord crush, indicating a complete absence of hind leg movement. Furthermore, we employed immunohistochemical staining to ensure that no serotonergic axons were detected at 8-10 mm from the lesion site in any of the animals, thus confirming the thoroughness of the lesion (Supplementary Fig. 4). Both the evaluation of the BMS score and the assessment of anatomical regeneration were conducted in a double-blinded manner, ensuring unbiased and objective observations. To address this concern, we will add the following paragraph in the M&M part:

      “Blinding procedure for in vivo experiments. Before the start of the experiment, individual vials containing DMAPT or vehicle (DMSO) stock solution were prepared for each experimental animal. The vials were randomized by a person who was neither involved in the implementation nor in the evaluation of the experiments. These numbers were randomly distributed to mice of the same age and sex in different cages. This was carried out independently by another person who was neither involved in the data evaluation nor the randomization of the samples. This was followed by the execution of the experiments and the evaluation by scientists who were not involved in any randomization processes and did not know the identity of the injected samples. After completion of the data collection, values from mice with signs of spared axons were first removed from the data set for quality assurance. The criteria for this were a BMS score of a maximum of 0-1 on the first day after the lesion and the absence of uninjured serotonergic axons in spinal cord cross-sections >9-10 mm distal to the lesion site. Finally, the data points were assigned to the respective experimental groups by the person who initially blinded the vials.”

      6) Several supplemental figures are discussed as critical elements of the studies performed. The authors are encouraged to include figures discussed as primary data as primary figures in the manuscript and provide the necessary information regarding experimental design and methods, including "n".

      Thank you for the suggestion.

      7) While the "n" is clear for some subsets of figures (as noted in the rebuttal), it is not clear for all outcomes/figure subsets. For example, it appears that some outcomes were performed in only a subset of the total experimental population and not in the context of statistically significant result. A good example of this is the figure for in vivo suboptimal dosing. The experimental design suggests n=7-10, but the group considered suboptimal due to statistical insignificance is listed as n=4. Is this an entirely separate cohort? If so, is n=4 sufficient and was it considered statistically in the context of the higher-powered cohorts? The lack of clarity regarding experimental design should be addressed.

      To ensure transparency we have provided all n-numbers for each outcome and figure subset. Additionally, the precise n-numbers can be inferred by observing the number of individual points depicted in the graphs. All statistical data are appropriately indicated in the figure legends for reference.

      The data presented in suppl. Fig. 3 represent a preliminary experiment to find effective doses of DMAPT in vivo. In this initial phase, we tested three different doses of DMAPT (0.2, 2, 20 µg/kg) in a reduced group size of only four animals per group. This reduction in animal numbers aligns with the principles of reduction, refinement, and replacement, aiming to minimize the use of animals in our research. Subsequently, the group demonstrating the most robust effect (2 µg/kg) was expanded by including additional animals to meet the a priori calculated sample size and validate the results. These additional animal data are presented in Figure 4 A-C. In the case of suppl. Fig. 3 A, B, the statistical analysis indicated a significant effect in A using an n=4. As a result, there was no need to utilize additional animals for this particular experiment.

      Gaps:

      1) By in vitro studies, the authors showed that hIL-6 treatment or PTEN knockout elevated microtubule detyrosination. But when does this occur? In another words, is this a primary effect of these treatments or secondary to the increased axon growth? How does this fit with the observations that these interventions promote axon regeneration both in vitro and in vivo?

      This point also seems to be based on a misunderstanding: as shown in Figure 2 by Western blot, detyrosination was increased in optic nerves after intravitreal injection of AAV2-hIL-6. These optic nerves were uninjured! This indicates that the increased detyrosination is an effect of the treatment itself and does not occur due to axonal regeneration.

      hIL-6 and PTEN ko nevertheless increase axonal regeneration because the positive effect on other signaling pathways, such as JAK/STAT3 and mTOR, ultimately predominates. Consequently, we show, for both PTEN ko and hIL-6, that we can further enhance these positive effects by neutralizing the negative aspect of increased detyrosination using DMAPT.

      2) Is there any direct evidence for Akt and/or JAK/Stat3 to promote microtubule detyrosination?

      As described in our previous and cited work, hIL-6, in contrast to CNTF, promotes the activation of AKT (Leibinger et al. 2016). In Fig. 2, we have also shown that intravitreal hIL-6 treatment in the optic nerve leads to increased phosphorylation of GSK3, a substrate of AKT, and that tubulin detyrosination is increased.

      As far as we know, JAK/STAT3 has no direct effect on detyrosination.

      In cell culture, we have shown that activation of the JAK/STAT3 pathway by CNTF application does not change tubulin detyrosination in neurites (Fig. 1 H, I, M, N).

      DMAPT does not affect the phosphorylation of STAT3 and S6 in RGC cell bodies, and thus has no measurable effect on JAK/STAT3 or the mTOR pathway. Moreover, tubulin detyrosination in neuronal cell bodies is not affected by DMAPT.

      3) Empirical data linking in vivo regeneration with mechanisms delineated in in vitro studies is limited. The addition of such data (i.e. biochemical assays, relevant histology) would better enable interpretation of in vivo studies and improve cohesiveness of the work as a whole.

      The mechanistic links between hIL-6/PTEN signaling and tubulin detyrosination and the abrogation of the adverse effects by DMAPT have been extensively addressed in vitro, which has been positively highlighted here in several places. Indeed, the in vivo data were intended to mainly confirm that the mechanisms elaborated in vitro are relevant to axonal regeneration and functional restoration in vivo. Most importantly, our data demonstrate that systemic DMAPT application promotes axon regeneration in the CNS and improves functional recovery after a complete spinal cord injury. From a clinical point of view, this is important.

      4) DMAPT activities are not limited to microtubule detyrosination. These alternate activities should be considered, particularly in in vivo studies. Empirical evidence of the potential impact for these mechanisms in the retina, optic nerve, and systemically is strongly encouraged. In vitro studies or studies of a specific neuronal population are insufficient to extrapolate activities in an intact system.

      Parthenolide and DMAPT show a regenerative effect in the nanomolar range (cell culture) and a bell-shaped concentration-response curve. We show a close correlation between detyrosinated microtubules and regeneration (with and without hIL6 or PTEN-KO), which is, in our opinion, convincing. Whether additional effects of DMAPT contribute to improved regeneration is not excluded, although unlikely. If so, their investigation would be beyond the scope of the current paper.

      5) How do the authors benchmark the DMAPT retreatment in the setting of PTEN (aav2-cre injection for cKO) and /or PTEN/SOCS3/CNTF dKO? Which are the best approaches to promote optic nerve regeneration? Would the authors expect DMAPT retreatment to be synergetic with PTENcKO?

      Based on our previous findings, we anticipate that DMAPT would exhibit a synergistic effect when combined with PTEN ko, as demonstrated in our in vitro studies with cultured neurons. Additionally, synergistic effects between DMAPT and PTEN/SOCS3 dKO +CNTF are possible. While these hypotheses hold promise, our current paper primarily focuses on combining DMAPT with hIL-6, which has consistently shown remarkable efficacy as a standalone treatment in optic nerve regeneration.

      Furthermore, our rationale for combining DMAPT with hIL-6 rather than PTEN-KO stems from the fact that, unlike PTEN-KO, hIL-6 has been proven to enable functional recovery following complete spinal cord crush injuries (Leibinger et al., 2021).

      6) A cohesive discussion of findings would be beneficial. What can and cannot be elucidated from in vitro and in vivo studies? How does the in vivo effect compare to existing strategies? What are the limitations of the studies performed? Are there alternative explanations for the findings in vitro or in vivo?

      We appreciate these suggestions.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank you for considering the above manuscript for publication in eLife and for sending it for review. We would like to thank the editors and reviewers for taking the time to read our manuscript and for their expert comments. These comments have been helpful and have improved our manuscript. We would like to address the following comments:

      eLife assessment

      This valuable study advances our knowledge of the effects of anxiety/depression treatment on metacognition, demonstrating that treatment increases metacognitive confidence alongside improving symptoms. The authors provide convincing evidence for the state-dependency of metacognitive confidence, based on a large longitudinal treatment dataset. However, it is unclear to what extent this effect is truly specific to treatment, as there was some improvement in metacognitive confidence in the control group.

      Thank you for this assessment of the paper. As the change in confidence was not significant among the control group, the last sentence is not factually correct – could we suggest that it be amended to the following: “However, it is unclear to what extent this effect is truly specific to treatment, as changes in metacognitive bias in the iCBT group were not statistically different from those in the control group.”

      Reviewer #1 (Public Review)

      1) It has been shown previously that there are relationships between a transdiagnostic construct of anxious-depression (AD), and average confidence rating in a perceptual decision task. This study sought to investigate these results, which have been replicated several times but only in cross-sectional studies. This work applies a perceptual decision-making task with confidence ratings and a transdiagnostic psychometric questionnaire battery to participants before and after an iCBT course. The iCBT course reduced AD scores in participants, and their mean confidence ratings increased without a change in performance. Participants with larger AD changes had larger confidence changes. These results were also shown in a separate smaller group receiving antidepressant medication. A similar sized control group with no intervention did not show changes.

      The major strength of the study is the elegant and well-powered data set. Longitudinal data on this scale is very difficult to collect, especially with patient cohorts, so this approach represents an exciting breakthrough. Analysis is straightforward and clearly presented. However, no multiple comparison correction is applied despite many different tests. While in general I am not convinced of the argument in the citation provided to justify this, I think in this case the key results are not borderline (p<0.001) and many of the key effects are replications, so there are not so many novel/exploratory hypothesis and in my opinion the results are convincing and robust as they are. The supplemental material is a comprehensive description of the data set, which is a useful resource.

      The authors achieved their aims, and the results clearly support the conclusion that AD and mean confidence in a perceptual task covary longitudinally. I think this study makes an important contribution to the project of computational psychiatry. Specifically, it shows that the relationship between transdiagnostic symptom dimensions and behaviour is meaningful within as well as across individuals.

      We thank the reviewer for their appraisal of our paper and positive feedback on the main manuscript and supplementary information. We agree with the reviewer that the lack of multiple comparison corrections can also be justified by the key findings being replications and not of borderline significance. We have added this additional justification to the manuscript (Methods, Statistical Analyses, page 15, line 568: “Adjustments for multiple comparisons were not conducted for analyses of replicated effects”).

      Reviewer #2 (Public Review)

      The authors of this study investigated the relationship between (under)confidence and the anxious-depressive symptom dimension in a longitudinal intervention design. The aim was to determine whether confidence bias improves in a state-like manner when symptoms improve. The primary focus was on patients receiving internet-based CBT (iCBT; n=649), while secondary aims compared these changes to patients receiving antidepressants (n=82) and a control group (n=88).

      The results support the authors' conclusions, and the authors convincingly demonstrated a weak link between changes in confidence bias and anxious-depressive symptoms (not specific to the intervention arm).

      The major strength and contribution of this study is the use of a longitudinal intervention design, allowing the investigation of how the well-established link between underconfidence and anxious-depressive symptoms changes after treatment. Furthermore, the large sample size of the iCBT group is commendable. The authors employed well-established measures of metacognition and clinical symptoms, used appropriate analyses, and thoroughly examined the specificity of the observed effects.

      However, due to the small effect sizes, the antidepressant and control groups were underpowered, reducing comparability between interventions and the generalizability of the results. The lack of interaction effect with treatment makes it harder to interpret the observed differences in confidence, and practice effects could conceivably account for part of the difference. Finally, it was not completely clear to me why, in the exploratory analyses, the authors looked at the interaction of time and symptom change (and group), since time is already included in the symptom change index.

      We thank the reviewer for their succinct summary of the main results and strengths of our study. We apologise for the confusion in how we described that analysis. We examine state-dependence, i.e. the relationship between symptom change and metacognition change, in two ways in the paper, perhaps somewhat redundantly: (1) by correlating change indices for both measures (e.g. as plotted in Figure 3D) and (2) by doing a very similar regression-based repeated-measures analysis, i.e. mean confidence ~ time * anxious-depression score change, where mean confidence is entered with two datapoints, one for pre- and one for post-treatment (i.e. within-person), and anxious-depression change is a single value per person (between-person change score). This allowed us to test if those with the biggest change in depression had a larger effect of time on confidence. This has been added to the paper for clarification (Methods, Statistical Analysis, page 14, line 553-559: “To determine the association between change in confidence and change in anxious-depression, we used (1) Pearson correlation analysis to correlate change indices for both measures and, (2) regression-based repeated-measures analysis: mean confidence ~ time * anxious-depression score change, where mean confidence is entered with two datapoints (one for pre- and one for post-treatment i.e., within-person) and anxious-depression change is a single value per person (between-person change score)”).
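
      As an illustration of why these two analyses are closely related, the following Python sketch uses made-up toy values (not study data) and hypothetical helper names. It assumes a plain ordinary-least-squares fit with time coded 0/1 and no random effects, which may differ from the model actually fitted in the paper. Under that assumption, the time-by-change interaction coefficient is the difference between the post- and pre-treatment slopes, and this equals the slope of per-person confidence change on anxious-depression change.

```python
from math import sqrt
from statistics import mean


def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical toy data: one anxious-depression change score per person
# (between-person) and two confidence values per person (within-person).
ad_change = [-1.2, -0.8, -0.5, 0.0, 0.3]
conf_pre = [3.0, 3.5, 4.0, 3.8, 4.2]
conf_post = [3.6, 3.9, 4.2, 3.8, 4.1]
conf_change = [post - pre for pre, post in zip(conf_pre, conf_post)]

# (1) Correlate the two change indices.
r_change = pearson_r(ad_change, conf_change)

# (2) Interaction term of confidence ~ time * ad_change under plain OLS:
# the slope on ad_change at follow-up minus the slope at baseline.
interaction = ols_slope(ad_change, conf_post) - ols_slope(ad_change, conf_pre)

# The same quantity, obtained by regressing confidence change on ad_change.
slope_of_change = ols_slope(ad_change, conf_change)
```

      In this toy example both approaches point in the same (negative) direction, so larger symptom reductions go with larger confidence increases.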

      The analyses have also been reported as regression in the Results for consistency (Treatment Findings: iCBT, page 5, line 197-204: “To test if changes in confidence from baseline to follow-up scaled with changes in anxious-depression, we ran a repeated measure regression analyses with per-person changes in anxious-depression as an additional independent variable. We found this was the case, evidenced by a significant interaction effect of time and change in anxious-depression on confidence (β=-0.12, SE=0.04, p=0.002)… This was similarly evident in a simple correlation between change in confidence and change in anxious-depression (r(647)=-0.12, p=0.002)”).

      2) This longitudinal study informs the field of metacognition in mental health about the changeability of biases in confidence. It advances our understanding of the link between anxiety-depression and underconfidence consistently found in cross-sectional studies. The small effects, however, call the clinical relevance of the findings into question. I would have found it useful to read more in the discussion about the implications of the findings (e.g., why is it important to know that the confidence bias is state-dependent; given the effect size of the association between changes in confidence and symptoms, is the state-trait dichotomy the right framework for interpreting these results; suggestions for follow-up studies to better understand the association).

      Thank you for this comment. We have elaborated on the implications of our findings in the Discussion, including the relevance of the state-trait dichotomy to future research and how more intensive, repeated testing may inform our understanding of the state-like nature of metacognition (Discussion, Limitations and Future Directions, page 10, line 378-380: “More intensive, repeating testing in future studies may also reveal the temporal window at which metacognition has the propensity to change, which could be more momentary in nature.”).

      Reviewer #3 (Public Review):

      1) This study reports data collected across time and treatment modalities (internet CBT (iCBT), pharmacological intervention, and control), with a particularly large sample in the iCBT group. This study addresses the question of whether metacognitive confidence is related to mental health symptoms in a trait-like manner, or whether it shows state-dependency. The authors report an increase in metacognitive confidence as anxious-depression symptoms improve with iCBT (and the extent to which confidence increases is related to the magnitude of symptom improvement), a finding that is largely mirrored in those who receive antidepressants (without the correlation between symptom change and confidence change). I think these findings are exciting because they directly relate to one of the big assumptions when relating cognition to mental health - are we measuring something that changes with treatment (is malleable), so might be mechanistically relevant, or even useful as a biomarker?

      This work is also useful in that it replicates a finding of heightened confidence in those with compulsivity, and lowered confidence in those with elevated anxious-depression.

      One caveat to the interest of this work is that it doesn't allow any causal conclusions to be drawn, and only measures two timepoints, so it's hard to tell if changes in confidence might drive treatment effects (but this would be another study). The authors do mention this in the limitations section of the paper.

      Another caveat is the small sample in the antidepressant group.

      Some thoughts I had whilst reading this paper: to what extent should we be confident that the changes are not purely due to practice? I appreciate there is a relationship between improvement in symptoms and confidence in the iCBT group, but this doesn't completely rule out a practice effect (for instance, you can imagine a scenario in which those whose symptoms have improved are more likely to benefit from previously having practiced the task).

      We thank the reviewer for commenting on the implications of our findings and we agree with the caveats listed. We also thank the reviewer for raising this point about practice effects. A key thing to note is that this task does not have a learning element with respect to the core perceptual judgement (i.e., accuracy), which is the target of the confidence judgment itself. While there is a possibility of increased familiarity with the task instructions and procedures with repeated testing, the task is designed to adjust the difficulty to account for any improvements, so accuracy is stable. We see that we may not have made this clear in some of our language around accuracy vs. perceptual difficulty and have edited the Results to make this distinction clearer (Treatment Findings: iCBT, pages 4-5, lines 184-189: “Although overall accuracy remained stable due to the staircasing procedure, participants’ ability to detect differences between the visual stimuli improved. This was reflected as the overall increase in task difficulty to maintain the accuracy rates from baseline (dot difference: M=41.82, SD=11.61) to follow-up (dot difference: M=39.80, SD=12.62), (β=-2.02, SE=0.44, p<0.001, r2=0.01)”.)

      However, it is true that there can be a ‘practice’ effect in the sense that one may feel more confident (despite the same accuracy level) due to familiarity with a task. One reason we do not subscribe to the proposed explanation for the link between anxious-depression change and confidence change is that the other major aspect of behaviour that improved with practice did so in a manner unrelated to clinical change. As noted above in the quoted text, participants’ discrimination improved from baseline to follow-up, reflected in the need for a higher difficulty level to maintain accuracy around 70%. Crucially, this was not associated with symptom change. This speaks against a general mechanism where symptom improvement leads to increased practice effects in general. Only changes in confidence specifically are associated with improved symptoms. We have provided more detail on this in the Discussion (page 9, lines 324-326: “This association with clinical improvements was specific to metacognitive changes, and not changes in task performance, suggesting that changes in confidence do not merely reflect greater task familiarity at follow-up.”).

      2) Relatedly, to what extent is there a role for general task engagement in these findings? The paper might be strengthened by some kind of control analysis, perhaps using (as a proxy for engagement) the data collected about those who missed catch questions in the questionnaires.

      Thank you for your comment. We included the details of data quality checks in the Supplement. Given the small number of participants that failed more than one attention check (1% of the iCBT arm) and that all those participants passed the task exclusion criteria, we made the decision to retain these individuals for analyses. We have since examined if excluding this small number of individuals impacts our findings. Excluding those that failed more than one catch item did not affect the significance of results, which has now been added to the Supplementary Information (Data Quality Checks: Task and Clinical Scales, page 5, lines 181-185: “Additionally, excluding those that failed more than one catch item in the iCBT arm did not affect the significance of results, including the change in confidence (β=0.16, SE=0.02, p<0.001), change in anxious-depression (β=-0.32, SE=0.03, p<0.001), and the association between change in confidence and change in anxious-depression (r(638)=-0.10, p=0.011)”).

      3) I was also unclear what the findings about task difficulty might mean. Are confidence changes purely secondary to improvements in task performance generally - so confidence might not actually be 'interesting' as a construct in itself? The authors could have commented more on this issue in the discussion.

      Thank you for this comment and sorry it was not clear in the original paper. As we discussed in a prior reply, accuracy, i.e. the proportion of correct selections (the target of confidence judgements), is different from the difficulty of the dot discrimination task that each person receives on a given trial. We had provided more details on task difficulty in the Supplement. Accuracy was tightly controlled in this task using a ‘two-down one-up’ staircase procedure, in which equally sized changes in dot difference occurred after each incorrect response and after two consecutive correct responses. The task is more difficult when the dot difference between stimuli is lower, and less difficult when the dot difference between stimuli is greater. Therefore, task difficulty refers to the average dot difference between stimuli across trials. Crucially, task accuracy did not change from baseline to follow-up, only task difficulty. Moreover, changes in task difficulty were not associated with changes in anxious-depression, while changes in confidence were, indicating confidence is the clinically relevant construct for change in symptoms.
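
      The staircase rule described above can be sketched as follows. This is a minimal Python illustration, not the task code: the function names, starting dot difference, step size, floor and response sequence are all hypothetical, and it assumes the standard two-down one-up convention in which the dot difference grows (easier) after an error and shrinks (harder) after two consecutive correct responses.

```python
from statistics import mean


def staircase_step(dot_diff, correct, streak, step=1, floor=1):
    """Apply one two-down one-up update; return (new_dot_diff, new_streak)."""
    if not correct:
        # One-up: a single error makes the next trial easier.
        return dot_diff + step, 0
    streak += 1
    if streak == 2:
        # Two-down: two consecutive correct responses make it harder.
        return max(floor, dot_diff - step), 0
    return dot_diff, streak


def run_staircase(responses, start=40, step=1):
    """Return the dot difference shown on each trial and the next value."""
    dot_diff, streak = start, 0
    shown = []
    for correct in responses:
        shown.append(dot_diff)
        dot_diff, streak = staircase_step(dot_diff, correct, streak, step)
    return shown, dot_diff


# Hypothetical response sequence (True = correct, False = incorrect).
responses = [True, True, False, True, True, True, True]
shown, next_dot_diff = run_staircase(responses)

# Task difficulty as defined in the text: mean dot difference across trials.
task_difficulty = mean(shown)
```

      Because the rule keeps adjusting the stimulus to the participant, accuracy is held roughly constant while the mean dot difference carries the information about discrimination ability. A two-down one-up rule of this kind is generally understood to converge on about 71% correct, in line with the "around 70%" accuracy mentioned above.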

      We appreciate that this may not have been clear from the description in the main manuscript, and have added more detail on task difficulty to the Methods (Metacognition Task, page 14, lines 540-542: “Task difficulty was measured as the mean dot difference across trials, where more difficult trials had a lower dot difference between stimuli.”) and Results (Treatment Findings: iCBT, pages 4-5, lines 184-186: “Although overall accuracy remained stable due to the staircasing procedure, participants’ ability to detect differences between the visual stimuli improved.”). We have also elaborated more on how improvements in symptoms are associated with change in confidence, not task performance in the Discussion (page 9, lines 324-326: “This association with clinical improvements was specific to metacognitive changes, and not changes in task performance, suggesting that changes in confidence do not merely reflect greater task familiarity at follow-up”).

      4) To make code more reproducible, the authors could have produced an R notebook that could be opened in the browser without someone downloading the data, so they could get a sense of the analyses without fully reproducing them.

      Thank you for your comment. We appreciate that an R notebook would be even better than how we currently share the data and code. While we will consider using Notebooks in future, we checked and converting our existing R script library into R Notebooks would require a considerable amount of reconfiguration that we cannot devote the time to right now. We hope that nonetheless the commitment to open science is clear in the extensive code base, commenting and data access we are making available to readers.

      5) Rather than reporting full study details in another publication I would have found it useful if all relevant information was included in a supplement (though it seems much of it is). This avoids situations where the other publication is inaccessible (due to different access regimes) and minimises barriers for people to fully understand the reported data.

      We agree this is good practice – the Precision in Psychiatry study is very large, with many irrelevant components with respect to the present study (Lee et al., BMC Psychiatry, 2023). For this reason, we tried to provide all that was necessary and only refer to the Precision in Psychiatry study methods for fine-grained detail. Upon review, the only thing we think we omitted that is relevant is information on ethical approval in the manuscript, which we have now added (Methods, Participants, page 11, lines 412-417: “Further details of the PIP study procedures that are not specific to this study can be found in a prior publication (21). Ethical approval for the PIP study was obtained from the Research Ethics Committee of School of Psychology, Trinity College Dublin and the Northwest-Greater Manchester West Research Ethics Committee of the National Health Service, Health Research Authority and Health and Care Research Wales”). If any further information is lacking, we are happy to include it here also.

      Reviewer #1 (Recommendations For The Authors):

      Minor comments

      The first line of the abstract refers to "metacognitive impairments", but the key result is a difference in the mean confidence rating - i.e. could be how participants are using the scale. It's not clear to me that lower mean confidence is necessarily an "impairment" (what's the "right" level of confidence 1-6 for a performance of 70% accuracy). The first line of discussion uses "metacognitive biases" which seems a more accurate description.

      We agree that the term bias is more appropriate to use in the Abstract, given that there is no set level to indicate any level of ‘impairment’ associated with under- or over-confidence. This has been changed to ‘biases’ as per the reviewer’s request (Abstract, page 2, line 49). Thank you for this suggestion.

      Reviewer #2 (Recommendations For The Authors):

      I would suggest being more cautious in the wording relating to the simple effect tests on changes across different treatment arms in the abstract - since no interaction was found it may suggest a difference between arms that is not found significantly. Also since comparison between arms was the secondary aim, first describe interaction effects before simple effects in results.

      Thank you for this suggestion. We agree that the lack of a significant interaction effect of time and group on confidence is a key finding, which has now been included in the Abstract (page 2, lines 67-71). Additionally, we have rearranged the order of results so the interaction effects precede the simple effects (Results, Comparing iCBT, Antidepressant and Control Groups, page 7, lines 246 – 292:

      "When comparing the three groups directly, ANOVA analysis predicting anxious-depression scores with group and time as independent variables revealed a main effect of time (F(1, 1632)=62.99, p<0.001), a main effect of group (F(2, 1632)=249.74, p<0.001), and an interaction effect of group and time (F(2, 1632)=9.23, p<0.001). Examining simple effects in the antidepressant arm, there was a significant reduction in anxious-depression from baseline to follow-up (β=-0.61, SE=0.09, p<0.001). Among controls, levels of anxious-depression did not significantly change (β=0.10, SE=0.06, p=0.096). Further details of transdiagnostic clinical changes for the antidepressant and controls groups are presented in Figure 4A and Table S4.

      Predicting confidence scores using ANOVA analysis with group and time as independent variables revealed a main effect of time (F(1, 1632)=16.26, p<0.001), and no significant main effect of group (F(2, 1632)=2.35, p=0.096). The interaction effect of group and time on mean confidence was not significant (F(2, 1632)=0.60, p=0.550), suggesting that change in confidence did not differ across the three groups. Tests of simple effects revealed that mean confidence significantly increased from baseline (M=3.77, SD=0.88) to follow-up (M=4.07, SD=0.79) in the antidepressant arm (β=0.31, SE=0.08, p<0.001) (Figure 4B). Among controls, there was no significant change in confidence from baseline (M=3.68, SD=0.86) to follow-up (M=3.79, SD=0.92) (β=0.11, SE=0.07, p=0.103) (Figure 4B).

      With respect to task performance, there was a significant main effect of time (F(1, 1632)=15.17, p=0.001) and group (F(2, 1632)=4.56, p=0.011) on mean dot difference when the three groups were included in the model. The interaction effect of time and group on mean dot difference was not significant (F(2, 1632)=1.91, p=0.148), suggesting no differences across the groups in task difficulty changes. In the antidepressant arm, mean dot difference decreased from baseline (M=41.2, SD=13.3) to follow-up (M=35.3, SD=13.1) (β=-5.91, SE=1.25, p<0.001), indicating increased task difficulty. There was no significant change in task difficulty among controls from baseline (M=43.0, SD=11.8) to follow-up (M=41.4, SD=13.6) (β=-1.64, SE=1.30, p=0.210) (Figure 4C).

      While our sample was underpowered to examine individual differences, we conducted an exploratory analysis examining the connection between changes in anxious-depression symptoms and changes in confidence in the antidepressant and controls groups. When examining the effects of time, group and anxious-depression change on mean confidence, there was a significant interaction effect of time and anxious-depression change on mean confidence (F(1, 1626)=4.04, p=0.045), suggesting change in confidence is associated with change in anxious-depression. There was no significant three-way interaction of anxious-depression change, time and group on mean confidence when comparing the three groups (F(2, 1626)=0.08, p=0.928), indicating that the significant association between confidence change and anxious-depression change was not specific to any group. Although not significant, the association between change in confidence and change in anxious-depression was in the expected negative direction in the antidepressant arm (r(80)=-0.10, p=0.381), and among controls (r(86)=-0.17, p=0.111) (Figure 4D)."

      Reviewer #3 (Recommendations For The Authors):

      Some minor points:

      Intro

      1) Awkward wording on page 3: 'but little research on how it might impact on metacognition'

      We have amended this sentence to make it more clear that relatively less research has been conducted on metacognitive changes following iCBT. We have also provided more detail on a prior study that examined changes in metacognitive beliefs with iCBT, and how this differs from the current study (Introduction, page 3, lines 137-141: “Additionally, iCBT has demonstrated clinical effectiveness in terms of symptom improvement (22–24). While one study found that iCBT modified self-reported metacognitive beliefs (25), it remains unknown if metacognitive confidence in decision-making improves following successful iCBT”).

      2) On page 3 the authors note 'but studies typically lacked power to detect effects of antidepressants on cognitive abilities (30-33)' - however, surely this is a problem with this study too, and its relatively small sample of those taking antidepressants?

      Thank you for highlighting this. The power comment was in the reference to the larger iCBT arm in this study, but we can appreciate that its placement means that it could be interpreted as being in relation to our smaller antidepressant arm (which we acknowledge is also potentially underpowered). We have reworded this sentence to make it clearer that prior antidepressant studies have not examined the impact of changes in metacognition specifically (Introduction, page 4, lines 147-149: “However, studies examining the impact of antidepressants on cognition have typically focused on cognitive capacities other than metacognition (30–33)”).

      Results

      3) Fig 2 - please clarify what the error bars indicate.

      The error bars represent the standard error around the standardised beta coefficients, which I have added to the description of Figure 2 (page 4, lines 171-172: “The error bars represent the standard error around the standardised beta coefficient”).

      4) Awkward wording: 'though it went in the same direction (Figure 4B)'.

      This part of the sentence was removed to reduce confusion.

      5) This description of the results is somewhat overstated: 'suggesting change in confidence was dependent on change in anxious-depression' (page 7) - this could also be the other way around, or related to a third factor.

      We have changed this from ‘dependent’ to ‘is associated with’, which accounts for the unknown directionality and true dependency of confidence changes on changes in anxious-depression (Results, page 7, line 285: “…suggesting change in confidence is associated with change in anxious-depression”).

      Methods

      6) Please also show how the WSAS in a supplement.

      Although this comment is unclear, we have provided additional information on how each item of the WSAS was scored and the overall score range (Supplemental methods, page 2, lines 53-55: “Each WSAS item was scored from 0 ‘not at all’ to 8 ‘very severely’, with overall scores ranging from 0 to 40. Higher WSAS scores indicating higher levels of functional impairment (11)”).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Please find below our detailed point-by-point response to the eLife reviewer comments. As suggested by the reviewers, we have 1) replaced most of the bar charts with box plots, 2) highlighted the subcellular regions that are analyzed in the measurement experiments, and 3) rewritten and toned down several subsections of the discussion.

      Reviewer #1 (Recommendations For The Authors):

      I suggest that the authors consider the following points in future versions of this manuscript:

      1). The link between the striking plant phenotype and GXM misregulation is unclear since GXM overexpression doesn't alter plant phenotypes or lignin content (Yuan et al 2014 Plant Science), so misregulation of GXMs in msil2msil4 mutants clearly is not the whole story. The authors should discuss alternative interpretations of their results and other possible targets of MSIL2/4 that might be contributing to the plant phenotype.

      We completely agree with the reviewer that the misregulation of GXMs in msil2/4 is not the whole story and we are currently developing specific strategies in order to characterize in an unbiased manner the full repertoire of MSIL mRNA targets in the stem, hoping we can identify other targets relevant to the formation of SCW. We have also toned-down our discussion concerning the possible impact of glucuronoxylan methylation level on lignin deposition (L546-552).

      2) Similarly, it remains unclear why one particular secondary cell wall enzyme is regulated post-transcriptionally, while so much of the pathway is regulated at the transcriptional level. Please discuss.

      We do not exclude that other genes encoding SCW enzymes are impacted and it will be the subject of further investigations. We have extended the discussion concerning these points (L486-498).

      3) Thirdly, it seems that MSIL2 and MSIL4 are expressed in tissues that are not synthesizing secondary cell walls. The authors should discuss other possible targets of MSIL2/4 from their work.

      We have extended the discussion concerning the pleiotropic effects of MSIL mutation in Arabidopsis (L416-425). The variability of the msil2/4 phenotype is so large that we expect these proteins to regulate various cellular functions through the binding of specific sets of mRNAs. The mRNA targets specifically involved in these regulations will need to be determined on a case-by-case basis.

      4) The discussion is extremely speculative and introduces new abbreviations (LTAc, XTRe) that are only used in their model (Figure 7). I suggest replacing these with dashed lines and/or question marks in the model, since as currently depicted, it looks as if these could be known gene products, which could be very misleading.

      We have removed the LTAc and XTRe abbreviations from Figure 7, and the corresponding text from the discussion section.

      5) Similarly, the speculation that cellulose content somehow regulates glucuronoxylan levels via xylan-cellulose interactions, leading to degradation of excess glucuronoxylan after synthesis is, to my knowledge, completely unsupported by any evidence except the correlation between cellulose and xylan levels. Please either support this claim with references or remove it from the discussion.

      We have removed the claim and have rewritten and toned down the text according to Reviewer 1's comments (L499-512).

      6) Bar charts are rarely the most appropriate method for displaying biological data (Streit & Gehlenborg 2014 Nature Methods). Authors should replace bar charts with one of the following options: A) plot all individual datapoints and overlay summary statistics, B) box plots with all individual datapoints shown, C) violin plots (when n is large, i.e. n > 50). R and R studio are free software that can generate such plots. Several excellent tools exist online to generate such plots via a free, graphical user interface, such as boxplotr (Spitzer et al 2014 Nature Methods): http://shiny.chemgrid.org/boxplotr/ and PlotsOfData (Postma & Goedhart 2019 PLoS Biology): https://huygens.science.uva.nl/PlotsOfData/

      We have replaced the bar charts in Figures 4E, 4G and 5E with box plots and acknowledged the software used in the corresponding Materials and methods section.
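
      For readers unfamiliar with what a box plot adds over a bar chart, the sketch below computes the summary statistics that a box plot overlays on the individual datapoints. It is an illustrative Python example only: the measurement values and the helper name are hypothetical, and the "inclusive" quartile method is one convention among several, so plotting tools may report slightly different quartiles.

```python
from statistics import quantiles


def box_summary(values):
    """Return the five-number summary (plus n) that a box plot displays."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return {
        "n": len(data),
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
    }


# Hypothetical measurements for one genotype (arbitrary units).
measurements = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary = box_summary(measurements)
```

      Unlike a bar showing only a mean, this summary exposes the spread and skew of the data, which is the point of the reviewer's recommendation.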

      Reviewer #2 (Recommendations For The Authors):

      Minor points:

      Which cells from Fig. 4b were measured for 4c? Some highlighted annotations to delineate the regions that were measured would help.

      We have highlighted in Figure 4B the subcellular regions analyzed in the measurement experiments.

      In line 254, the phrase "not merely affected" in the mutant should be rephrased for clarity

      We have replaced “not merely affected” by “not significantly” (L274).

      Line 317: "we first performed glycome profiling", the data shows a monosaccharide profile, not glycome profiling, which usually involves antibody microarrays

      We have corrected the text according to the reviewer comment (L339-340).

      Reviewer #3 (Recommendations For The Authors):

      Altogether, the study shows clear biological relevance of the MSL family of RNA-binding proteins, and provides good arguments that the underlying mechanism is control of mRNAs encoding enzymes involved in secondary cell wall metabolism (although concluding on translational control in the abstract is perhaps saying too much - post-transcriptional control will do given the evidence presented). One observation reported in the study makes it vulnerable to alternative interpretation, however, and I think this should be explicitly treated in the discussion:

      The fact that immune responses are switched on in msl2/4 mutants could also mean that MSL2/4 have biological functions unrelated to cell wall metabolism in wild type plants, and that cell wall defects arise solely as an indirect effect of immune activation (that is known to involve changes in expression of many cell wall-modifying enzymes and components such as pectin methylesterases, xyloglucan endotransglycosylases, arabinogalactan proteins etc.). Indeed, the literature is rich in examples of gene functions that have been misinterpreted on the basis of knockout studies because constitutive defense activation mediated by immune receptors was not taken into account (see for example Lolle et al., 2017, Cell Host & Microbe 21, 518-529).

      With the evidence presented here, I am actually close to being convinced that the primary defect of msl2/msl4 mutants is directly related to altered cell wall metabolism, and that defense responses arise as a consequence of that, not the other way round. But I do not think that the reverse scenario can be formally excluded with the evidence at hand, and a discussion listing arguments in favor of the direct effect proposed here would be appropriate. Elements that the authors could consider to include would be the isolation of a cellulose synthase mutant as a constitutive expressor of jasmonic acid responses (cev1) as a clear example that a primary defect in cell wall metabolism can produce defense activation as secondary effect. The interaction of MSL4 with GXM1/3 mRNAs is also helpful to argue for a direct effect, and it would strengthen the argument if more examples of this kind could be included.

      In accordance with Reviewer 3's comments, we have extended the discussion, listing the arguments that, we believe, do not favor a primary effect of the MSIL2/4 proteins on the activation of plant defense pathways (L468-485).

      SUGGESTIONS FOR IMPROVED ANALYSES & MINOR TEXT AND FIGURE CORRECTIONS.

      (1) Unless there is a very good reason to use homology modelling such as SWISS-MODEL (for example ligand-bound proteins), Alphafold2 is now the tool to use for structure prediction. I would at least verify that Alphafold agrees with SWISS-MODEL on the predicted structures shown in Fig 2a.

      We have analyzed the MSIL4 sequence using the AlphaFold2 prediction software, and the output of this analysis completely agrees with the SWISS-MODEL prediction. We have added an additional panel showing the AlphaFold2 prediction (see Figure 2-figure supplement 1B).

      (2) The plant pictures shown in Figure 2d are not publication quality in terms of resolution, mounting, size. They really should be redone before final publication.

      We thank the reviewer for this important observation, and have improved the resolution of Figure 2D.

      (3) The colocalization in Figure 3d/e would benefit from some statistical analysis of the data: How many foci were examined? How many showed colocalization? Is that fraction statistically significant? It can be done from the images at hand; I do not think that additional data acquisition is necessary.

      We have used an ImageJ plugin to perform colocalization analysis on the microscopy images corresponding to the bottom panel of Figure 3D (heat stress). This analysis confirmed that most of the foci are actually colocalizing (see Author response image 1). However, our initial image data acquisition does not allow us to perform statistical analysis on it. We have added a sentence indicating that colocalization is supported by an analysis using an ImageJ plugin.

      Author response image 1.
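
      As an illustration of what a pixel-based colocalization measure computes, the following is a minimal sketch in Python. It is not the ImageJ plugin used by the authors (which is not named in the response); the synthetic images, the helper names `pearson` and `synthetic_channel`, and all parameter values are assumptions made purely for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def synthetic_channel(foci, size=64, half_width=2, noise=0.05, seed=0):
    """Flattened size x size image: bright square foci on weak uniform noise.

    Hypothetical stand-in for one fluorescence channel; not real data.
    """
    rng = random.Random(seed)
    pixels = []
    for row in range(size):
        for col in range(size):
            in_focus = any(
                abs(row - r) <= half_width and abs(col - c) <= half_width
                for r, c in foci
            )
            pixels.append((1.0 if in_focus else 0.0) + noise * rng.random())
    return pixels


# Two channels with foci at the same positions, and one with foci elsewhere.
foci_a = [(12, 12), (42, 32)]
foci_elsewhere = [(12, 40), (42, 55)]
ch_a = synthetic_channel(foci_a, seed=1)
ch_b_same = synthetic_channel(foci_a, seed=2)
ch_b_apart = synthetic_channel(foci_elsewhere, seed=3)

r_same = pearson(ch_a, ch_b_same)    # high when foci overlap
r_apart = pearson(ch_a, ch_b_apart)  # near zero when foci do not overlap
```

      A coefficient of this kind summarizes intensity overlap across the whole image. It does not by itself answer the reviewer's per-focus questions (how many foci were examined and how many colocalized), which would require segmenting individual foci.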

      (4) Typographical and other writing errors:

      Line 72 "prior to"

      Line 77 "in the Arabidopsis model"

      Line 97 "RBP-mediated..."

      Line 110 "aspects of development"

      Line 128 "little is known" (no yet)

      Line 253 "Col-0"

      Line 346 "previous"

      All the writing errors have been corrected in the revised version.

    1. Author Response

      We thank the reviewers for their helpful comments and suggestions.

      eLife assessment

      This is an important contribution that extends earlier single-unit work on orientation-specific center-surround interactions to the domain of population responses measured with Voltage Sensitive Dye (VSD) imaging and the first to relate these interactions to orientation-specific perceptual effects of masking. The authors provide convincing evidence of a pattern of results in which the initial effect of the mask seems to run counter to the behavioral effects of the mask, a pattern that reversed in the latter phase of the response. It seems likely that the physiological effects of masking reported here can be attributed to previously described signals from the receptive field surround.

      We thank the reviewers for bringing up the relation of our results to findings from previous orientation-specific center-surround interactions studies. In our revision, we will add a paragraph discussing this important issue. Briefly, for multiple reasons, we believe that the majority of the behavioral and neural masking effects that we observe may be from target-mask interactions at the target location rather than from the effect of the mask in the surround. First, in human subjects, perceptual similarity masking effects are almost entirely accounted for by target-mask interactions at the target location and are recapitulated when the mask has the same size and location as the target (Sebastian et al 2017). Second, in our computational model (Fig. 8), the effect of mask orientation on the dynamics of the response is qualitatively the same if the mask is restricted to the size and location of the target. Third, in our model, our results are qualitatively the same when the spatial pooling region for the normalization signal is the same as that for the excitation signal. These points will be elaborated in the revised manuscript and points 2 and 3 will be demonstrated in a supplementary figure.

      We would also like to point out some key differences between the stimuli that we use and the ones used in most previous center-surround studies. First, in our experiments, the target and the mask were additive, while in most previous center-surround studies the target occludes the background. Such studies therefore restrict the mask effect to the surround, while in our study we allow target-mask interactions at the center. Second, most center-surround studies have a sharp-edged target/surround, while in our experiments no sharp edges were present. Unpublished results from our lab suggest that such sharp edges have a large impact on V1 population responses. We will expand on these issues in the revised manuscript. A third key difference is that our stimuli were flashed for a short interval of 250 ms corresponding to a typical duration of a fixation in natural vision, while most previous center-surround studies used either longer-duration drifting stimuli or very short-duration random-order stimuli for reverse-correlation analysis.

      In addition, we would like to emphasize that our results go beyond previous studies in two important ways. First, we study the effect of similarity masking in behaving animals and quantitatively compare the effect of similarity masking on behavior and physiology in the same subjects and at the same time. Second, VSD imaging allows us to capture the dynamics of superficial V1 population responses over the entire population of millions of neurons activated by the target at two important spatial scales. Such results therefore complement electrophysiological studies that examine the activity of a very small subset of the active neurons.

      Reviewer #1 (Public Review):

      This is a clear account of some interesting work. The experiments and analyses seem well done and the data are useful. It is nice to see that VSDI results square well with those from prior extracellular recordings. But the work may be less original than the authors propose, and their overall framing strikes me as odd. Some additional clarifications could make the contribution more clear.

      Please see our reply above regarding the agreement with previous studies and framing.

      My reading is that this is primarily a study of surround suppression with results that follow pretty directly from what we already know from that literature, and although they engage with some of the literature they do not directly mention surround suppression in the text. Their major effect - what they repeatedly describe as a "paradoxical" result in which the responses initially show a stronger response to matched targets and backgrounds and then reverse - seems to pretty clearly match the expected outcome of a stimulus that initially evokes additional excitation due to increased center contrast followed by slightly delayed surround suppression tuned to the same peak orientation. Their dynamics result seems entirely consistent with previous work, e.g. Henry et al 2020, particularly their Fig. 3 https://elifesciences.org/articles/54264, so it seems like a major oversight to not engage with that work at all, and to explain what exactly is new here.

      We thank the reviewer for pointing out this previous work, which we will cite in the revised version of the manuscript. For the reasons discussed above, while this study is interesting and related to our work, we believe that our results are quite distinct.

      • In the discussion (lines 315-316), they state "in order to account for the reduced neural sensitivity with target-background similarity in the second phase of the response, the divisive normalization signal has to be orientation selective." I wonder whether they observed this in their modeling. That is, how robust were the normalization model results to the values of sigma_e and sigma_n? It would be useful to know how critical their various model parameters were for replicating the experimental effects, rather than just showing that a good account is possible.

      Thank you for this suggestion. In the revised manuscript we will include a supplementary figure that will show how the model’s predictions are affected by the orientation tuning and spatial extent of the normalization signal, and by the size of the mask.
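
      The kind of parameter sweep the reviewer asks about can be sketched with a toy, static divisive-normalization model. This is not the authors' model (theirs is dynamic and spatial, Fig. 8) and is not meant to reproduce their results; the Gaussian tuning form, the contrasts, the semi-saturation constant, and the function names below are all assumptions chosen for illustration. Only the symbols sigma_e and sigma_n are taken from the reviewer's comment.

```python
import math


def tuning(delta_deg, sigma_deg):
    """Gaussian orientation tuning, wrapped onto the 180-degree orientation circle."""
    d = (delta_deg + 90.0) % 180.0 - 90.0
    return math.exp(-d * d / (2.0 * sigma_deg ** 2))


def response(pref_deg, stimuli, sigma_e, sigma_n, semisat=0.1):
    """Response of a unit preferring pref_deg to a list of (contrast, orientation).

    Excitation is tuned with width sigma_e, the normalization pool with width
    sigma_n; a very large sigma_n approximates untuned normalization.
    """
    excitation = sum(c * tuning(pref_deg - ori, sigma_e) for c, ori in stimuli)
    normalization = sum(c * tuning(pref_deg - ori, sigma_n) for c, ori in stimuli)
    return excitation / (semisat + normalization)


def target_evoked(mask_ori_deg, sigma_e, sigma_n, c_target=0.1, c_mask=0.5):
    """Target-plus-mask response minus mask-alone response, read out from the
    unit tuned to the target orientation (0 degrees)."""
    target = (c_target, 0.0)
    mask = (c_mask, mask_ori_deg)
    with_target = response(0.0, [target, mask], sigma_e, sigma_n)
    mask_alone = response(0.0, [mask], sigma_e, sigma_n)
    return with_target - mask_alone


# Sweep the tuning width of the normalization pool (hypothetical values).
sweep = {}
for sigma_n in (20.0, 30.0, 60.0, 1e6):
    matched = target_evoked(0.0, sigma_e=20.0, sigma_n=sigma_n)
    orthogonal = target_evoked(90.0, sigma_e=20.0, sigma_n=sigma_n)
    sweep[sigma_n] = (matched, orthogonal)
```

      In this toy readout, the contrast between matched and orthogonal masks changes with sigma_n, which is the sort of dependence a robustness figure would document. Because the toy has no dynamics or spatial pooling, it cannot speak to whether tuned normalization is required in the full model.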

      • The majority of their target/background contrast conditions were collected only in one animal. This is a minor limitation for work of this kind, but it might be an issue for some.

      We agree that this is a limitation of the current study. These are challenging experiments and we were unable to collect all target/background contrast combinations from both monkeys. However, in the common conditions, the results appear similar in the two animals, and the key results seem to be robust to the contrast combination in the animal in which a wider range of contrast combinations was tested. We will add these points to the discussion in the revised manuscript.

      • The authors point out (line 193-195) that "Because the first phase of the response is shorter than the second phase, when V1 response is integrated over both phases, the overall response is positively correlated with the behavioral masking effect." I wonder if this could be explored a bit more at the behavioral level - i.e. does the "similarity masking" they are trying to explain show sensitivity to presentation time?

      We agree that testing the effect of stimulus duration on similarity masking is interesting, but unfortunately, it is beyond the scope of the current study. We would also like to point out that the duration of the presentation was selected to match the typical time of fixation during natural behaviors, so much shorter or much longer stimulus durations would be less relevant for natural vision.

      • From Fig. 3 it looks like the imaging ROI may include some opercular V2. If so, it's plausible that something about the retinotopic or columnar windowing they used in analysis may remove V2 signals, but they don't comment. Maybe they could tell us how they ensured they only included V1?

      We thank the reviewer for this comment. As part of our experiments, we extract a detailed retinotopic map for each chamber, so we were able to ensure that the area used for the decoding analysis lies entirely within V1. We will incorporate this information in the revised manuscript.

      • In the discussion (lines 278-283) they say "The positive correlation between the neural and behavioral masking effects occurred earlier and was more robust at the columnar scale than at the retinotopic scale, suggesting that behavioral performance in our task is dominated by columnar scale signals in the second phase of the response. To the best of our knowledge, this is the first demonstration of such decoupling between V1 responses at the retinotopic and columnar scales, and the first demonstration that columnar scale signals are a better predictor of behavioral performance in a detection task." I am having trouble finding where exactly they demonstrate this in the results. Is this just by comparison of Figs. 4E,K and 5E,K? I may just be missing something here, but the argument needs to be made more clearly since much of their claim to originality rests on it.

      We thank the reviewer for this comment. In the revised manuscript we will be more explicit and refer to the relevant figure panels (Fig 4D, E, J, & K vs. Fig 5D, E, J, & K) and report important values to substantiate this key claim.

      Reviewer #2 (Public Review):

      Summary

      In this experiment, Voltage Sensitive Dye Imaging (VSDI) was used to measure neural activity in macaque primary visual cortex in monkeys trained to detect an oriented grating target that was presented either alone or against an oriented mask. Monkeys' ability to detect the target (indicated by a saccade to its location) was impaired by the mask, with the greatest impairment observed when the mask was matched in orientation to the target, as is also the case in human observers. VSDI signals were examined to test the hypothesis that the target-evoked response would be maximally suppressed by the mask when it matched the orientation of the target. In each recording session, fixation trials were used to map out the spatial response profile and orientation domains that would then be used to decode the responses on detection trials. VSDI signals were analyzed at two different scales: a coarse scale of the retinotopic response to the target and a finer scale of orientation domains within the stimulus-evoked response. Responses were recorded in three conditions: target alone, mask alone, and target presented with mask. Analyses were focused on the target evoked response in the presence of the mask, defined to be the difference in response evoked by the mask with target (target present) versus the mask alone (target absent). These were computed across five 50 msec bins (total, 250 msec, which was the duration of the mask (target absent trials, 50% of trials) / mask + target (target present trials, 50% of trials)). Analyses revealed that in an initial (transient) phase the target evoked response increased with similarity between target and mask orientation. As the authors note, this is surprising given that this was the condition where the mask maximally impaired detection of the target in behavior. Target evoked responses in a later ('sustained') phase fell off with orientation similarity, consistent with the behavioral effect.
      When analyzed at the coarser scale, the target evoked response, integrated over the full 250 msec period, showed a very modest dependence on mask orientation. The same pattern held when the data were analyzed on the finer orientation domain scale, with the effect of the mask in the transient phase running counter to the perceptual effect of the mask and the sustained response correlating with the perceptual effect. The effect of the mask was more pronounced when analyzed at the finer scale.

      Strengths

      The work is on the whole very strong. The experiments are thoughtfully designed, the data collection methods are good, and the results are interesting. The separate analyses of data at a coarse scale that aggregates across orientation domains and a more local scale of orientation domains is a strength and it is reassuring that the effects at the more localized scale are more clearly related to behavior, as one would hope and expect. The results are strengthened by modeling work shown in Figure 8, which provides a sensible account of the population dynamics. The analyses of the relationship between VSDI data and behavior are well thought out and the apparent paradox of the anti-correlation between VSDI and behavior in the initial period of response, followed by a positive correlation in the sustained response period is intriguing.

      Points to Consider / Possible Improvements

      The biphasic nature of the relationship between neural and behavioral modulation by the mask and the surprising finding that the two are anticorrelated in the initial phase are left as a mystery. The paper would be more impactful if this mystery could be resolved.

      We thank the reviewer for the positive comments. In our view, while our results are surprising, there may not be a remaining mystery that needs to be resolved. As our model shows, the biphasic nature of V1’s response can be explained by a delayed orientation-tuned gain control. Our results are consistent with the hypothesis that perception is based on columnar-scale V1 signals that are integrated over an approximately 200 ms long period that incorporates both the early and the late phase of the response, since such decoded V1 signals are positively correlated with the behavioral similarity masking effect (Fig. 5D, J). We will explain this more clearly in the discussion of our revised manuscript.

      The finding is based on analyses of the correlation between behavior and neural responses. This appears in the main body of the manuscript and is detailed in Figures S1 and S2, which show the correlation over time between behavior and target response for the retinotopic and columnar scale.

      One possible way of thinking of this transition from anti- to positive correlation with behavior is that it might reflect the dynamics of a competitive interaction between mask and target, with the initial phase reflecting predominantly the mask response, with the target emerging, on some trials, in the latter phase. On trials when the mask response is stronger, the probability of the target emerging in the latter phase, and triggering a hit, might be lower, potentially explaining the anticorrelation in the initial phase. The sustained response may be a mixture of trials on which the target response is or is not strong enough to overcome the effect of the mask sufficiently to trigger target detection.

      It would, I think, be worth examining this by testing whether target dynamics may vary, depending on whether the monkey detected the target (hit trials) or failed to detect the target (miss trials). Unless I missed it I do not think this analysis was done. Consistent with this possibility, the authors do note (lines 226-229) that "The trajectories in the target plus mask conditions are more complex. For example, when mask orientation is at +/- 45 deg to the target, the population response is initially dominated by the mask, but then in mid-flight, the population response changes direction and turns toward the direction of the target orientation." This suggests (to this reviewer, at least) that the emergence of a positive correlation between behavioral and neural effects in the latter phase of the response could reflect either a perceptual decision that the target is present or perhaps deployment of attention to the location of the target.

      It may be that this transition reflected detection, in which case it might be more likely on hit trials than miss trials. Given the SNR it would presumably be difficult to do this analysis on a trial-by-trial basis, but the hit and miss trials (which each make up about 1/2 of all trials) could be averaged separately to see if the mid-flight transition is more prominent on hit trials. If this is so for the +/- 45 degree case it would be good to see the same analysis for other combinations of target and mask. It would also be interesting to separate correct reject trials from false alarms, to determine whether the mid-flight transition tends to occur on false alarm trials.

      If these analyses do not reveal the predicted pattern, they might still merit a supplemental figure, for the sake of completeness.

      We thank the reviewer for suggesting this interesting possibility. The analysis in the manuscript was based on both correct and incorrect trials, raising the possibility that our results reflect some contribution from decision- and/or attention-related signals rather than from low-level nonlinear encoding mechanisms in V1 that we postulate in our model (Fig. 8). To explore this possibility, we re-examined our results while excluding error trials. We found that our key results from Figs 4 and 5 – namely that there is an early transient phase in which the neural and behavioral similarity effects are anti-correlated, and a later sustained phase in which they are positively correlated – hold even for the subset of correct trials, reducing the possibility that decision/attention-related signals play a major role in explaining our results. We will include the results of this analysis as a supplementary figure in the revised manuscript. This analysis, however, does seem to reveal interesting differences between correct and incorrect trials, which we will discuss in the revised manuscript.

      References

      Sebastian S, Abrams J, Geisler WS. 2017. Constrained sampling experiments reveal principles of detection in natural scenes. Proc Natl Acad Sci U S A 114: E5731-E5740

    1. Author Response

      The following is the authors’ response to the previous review

      In response to the additional concerns voiced by Reviewer #2, we have conducted control simulations. The outcomes are summarized in the new supplements to Figure 3. They show that the model is robust under changes of short-term plasticity parameters and running speed.

      Below, we give a point-by-point response to the remaining comments of the editors and reviewers.

      Editorial Assessment: This important work presents an interesting perspective for the generation and interpretation of phase precession in the hippocampal formation. Through numerical simulations and comparison to experiments, the study provides a convincing theoretical framework explaining the segregation of sequences reflecting navigation and sequences reflecting internal dynamics in the DG-CA3 loop. This study will be of interest for researchers in the spatial navigation and computational neuroscience fields.

      We would like to thank the Editors very much for this positive assessment of our work!

      Reviewer #1

      In the manuscript entitled “A theory of hippocampal theta correlations”, the authors propose a new mechanism for phase precession and theta-time scale generation, as well as their interpretation in terms of navigation and neural coding. The authors propose the existence of extrinsic and intrinsic sequences during exploration, which may have complementary functions. These two types of sequences depend on external input and network interactions, but differ on the extent to which they depend on movement direction. Moreover, the authors propose a novel interpretation for intrinsic sequences, namely to signal a landmark cue that is independent of direction of traversal. Finally, a readout neuron can be trained to distinguish extrinsic from intrinsic sequences.

      • The study puts forward novel computational ideas related to neural coding, partly based on previous work from the authors, including published (Leibold, 2020, Yiu et al., 2022) and unpublished (Ahmedi et al., 2022. bioRxiv) work. The manuscript will contribute to the understanding of the mechanisms behind phase precession, as well as to how we interpret hippocampal temporal coding for navigation and memory.

      I am very pleased to have seen major improvements in the manuscript regarding i) a clarification of the concepts of extrinsic and intrinsic mechanisms, and ii) overall arrangement of Figures but also iii) expanding on some important concepts such as the role of experience in determining the asymmetric connectivity that is necessary for intrinsic models of sequence generation.

      We are delighted to have been able to amend the Reviewer’s concerns voiced after the initial submission. We are very grateful for their many good suggestions that allowed us to make important additions to the revised manuscript.

      Reviewer #2

      • Place cells fire sequentially during hippocampal theta oscillations, forming a spatial representation of behavioral experiences in a temporally-compressed manner. The firing sequences during theta cycles are widely considered as essential assemblies for learning, memory, and planning. Many theoretical studies have investigated the mechanism of hippocampal theta firing sequences; however, they are either entirely extrinsic or intrinsic. In other words, they attribute the theta sequences to external sensorimotor drives or focus exclusively on the inherent firing patterns facilitated by the recurrent network architectures. Both types of theories are inadequate for explaining the complexity of the phenomena, particularly considering the observations in a previous paper by the authors: theta sequences independent of animal movement trajectories may occur simultaneously with sensorimotor inputs (Yiu et al., 2022).

      In this manuscript, the authors concentrate on the CA3 area of the hippocampus and develop a model that accounts for both mechanisms. Specifically, the model generates extrinsic sequences through the short-term facilitation of CA3 cell activities, and intrinsic sequences via recurrent projections from the dentate gyrus. The model demonstrates how the phase precession of place cells in theta sequences is modulated by running direction and the recurrent DG-CA3 network architecture. To evaluate the extent to which firing sequences are induced by sensorimotor inputs and recurrent network architecture, the authors use the Pearson correlation coefficient to measure the “intrinsicity” and “extrinsicity” of spike pairs in their simulations.

      I find this research topic to be both important and interesting, and I appreciate the clarity of the paper. The idea of combining intrinsic and extrinsic mechanisms for theta sequences is novel, and the model effectively incorporates two crucial phenomena: phase precession and directionality of theta sequences. I particularly commend the authors’ efforts to integrate previous theories into their model and conduct a systematic comparison. This is exactly what our community needs: not only the development of new models, but also understanding the critical relationships between different models.

      We also would like to express our gratitude to Reviewer 2 for their numerous constructive criticisms that led to a very much improved revised manuscript!

      Reviewer #2

      1) The choice of timescale parameters for input facilitation and synaptic depression is still not fully justified in my opinion. The authors themselves mention that previous experiments suggest wide ranges for both timescales. Given that the generation of intrinsic and extrinsic sequences in their model is primarily driven by these two mechanisms, their chosen timescales should significantly impact the simulation results. I urge the authors to discuss the potential effects of selecting different sets of timescales and the possible limitations of the current selection of 500ms for both.

      For instance, the authors state in the caption of Fig 1 that all simulated rat trajectories were set at a speed of 20cm/s, which is a rat’s walking speed. However, the running speed of rats can exceed 3m/s. In this case, none of the CA3 cells in the model would produce any extrinsic sequences since the animal would traverse the place fields much more rapidly, preventing the sensorimotor input from increasing as it does in the model.

      The reviewer raised the valid point that our simulations may be sensitive to the short-term plasticity time constants and running speeds. We therefore conducted new simulations illustrated in Figure 3—figure supplements 1 and 2.

      In agreement with the reviewer’s assertion, using the current model parameters, a higher running speed would not elicit extrinsic sequences due to the lack of depolarization from spatial input (Figure 3—figure supplement 2A). However, an increase of running speed also requires sensory inputs to be available on a larger spatial scale (width of the spatial input box in our case). For example, Parra-Barrero et al. 2021, eLife 10:e70296 and Parra-Barrero & Cheng 2023, PLOS Comput. Biol. 19:e1011101 showed that place field sizes become larger at higher running speeds, which consequently lengthens the theta sequences. With such a modification, along with a longer DG projection length (|r|), we were able to recover the theta sequence at a higher speed (100 cm/s), using the same STD and STF time constants (Figure 3—figure supplement 2B). Furthermore, it has been shown that theta frequency increases with running speed (e.g., Rivas et al., 1996, Exp Brain Res 108:113-8). In our analysis, a higher theta frequency (12Hz instead of 10Hz) is also able to counteract the effect of running speed and leads to phase precession similar to control levels (Figure 3—figure supplement 2C).

      Consistent with this finding, the original study of Romani & Tsodyks 2015, Hippocampus 25:94-105, found a fourfold increase of speed (from 0.05 to 0.2 fraction of the track per second) to not affect phase-position relations (with UD = 0.8 and 800ms STD time constant), likely due to the large place field sizes covering 1/3 of the track. Thus, phase precession may only be affected by high speeds in narrow place fields in which activity would only be present for few theta cycles thus naturally having limited capacity for phase coding.
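
      The trade-off between running speed, field size, and theta frequency described above can be made concrete with a back-of-the-envelope calculation: the number of theta cycles available while crossing a field is the traversal time multiplied by the theta frequency. The sketch below assumes constant speed and a straight pass through the field; the 40 cm and 200 cm field widths are hypothetical values chosen for illustration, while the speeds and theta frequencies are those quoted in the response.

```python
def theta_cycles_in_field(field_width_cm, speed_cm_s, theta_hz):
    """Number of theta cycles elapsed while crossing a place field at constant speed."""
    traversal_time_s = field_width_cm / speed_cm_s
    return traversal_time_s * theta_hz


# Hypothetical 40 cm field at walking speed (20 cm/s) with 10 Hz theta.
slow = theta_cycles_in_field(40.0, 20.0, 10.0)

# Same field at 100 cm/s: far fewer cycles are available for phase coding.
fast = theta_cycles_in_field(40.0, 100.0, 10.0)

# Raising theta frequency to 12 Hz recovers only part of the loss.
fast_12hz = theta_cycles_in_field(40.0, 100.0, 12.0)

# A proportionally wider field (hypothetical 200 cm) restores the cycle count.
fast_wide = theta_cycles_in_field(200.0, 100.0, 10.0)
```

      Under these assumptions the narrow field is crossed in roughly 20 cycles at walking speed but only about 4 at 100 cm/s, which is the sense in which narrow fields at high speed leave little capacity for phase coding.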

      We further refrain from increasing the running speed beyond 1m/s (e.g. 3m/s as suggested by the reviewer), as the typical running speed of a rat in an 80cm square environment is between 20-40cm/s (Mankin et al. 2012, PNAS, 109:19462-19467). Even on linear tracks, reported running speeds hardly exceed 120 cm/s (e.g. Ahmed and Mehta, 2012, J Neurosci 32:7373-7383; Schmidt et al., 2009 J Neurosci 29:13232-13241). To our knowledge, phase precession for speeds above 1.2 m/s has not been reported so far, certainly also owing to experimental challenges. We would, however, speculate that beyond 120 cm/s phase precession could be meaningful in large environmental enclosures with wide place fields. Thus a version of our input model with very large place field sizes should generally be able to also cover very high running speeds.

      To conclude, STD and STF time constants do not need to be in a precise range to accommodate the behavioural time scales if the sensory input changes on accordingly larger spatial scales.

      Following up on the reviewer’s additional concern, we also checked the effect of time constants on the theta sequences (while keeping the running speed unchanged). Decreasing the time constant of STF (τF) to 100ms would degrade the theta sequence due to a lack of depolarization, as sensory input reverts to its resting value ( =0) too fast, but at 250ms, the temporal correlation of theta sequences is largely maintained (Figure 3—figure supplement 1A). However, such effects can be compensated for by an increase in sensory input which promotes input facilitation (Figure 3—figure supplement 1B). Further increasing τF does not significantly affect theta sequences as the sensory input amplitudes have asymptotically reached their target values (Figure 3—figure supplement 1A bottom). The temporal correlation of theta sequences is not sensitive to the change in the time constant of STD (τD) (Figure 3—figure supplement 1C), possibly because the synaptic resource of the place cells behind the animal is reliably depleted by strong depolarization despite a fast recovery time (τD=100ms).

      Since the relation between running speed and theta sequences has been thoroughly studied in Parra-Barrero et al. 2021 and Parra-Barrero & Cheng 2023, and the precise range of STD and STF time scales does not play a critical role in the temporal structure of theta sequence, we refrain from substantially revising the manuscript and only briefly add these points after Figure 3.

      2) This is a point I overlooked in the initial review. The synaptic depression fraction UD is set at 0.9 or 0.7, implying that the synaptic coupling weight between CA3 excitatory cells (and CA3 to DG) is almost entirely depleted within a few hundred milliseconds. To my limited neuroscience knowledge, I am not aware of any experimental results that corroborate this potentially bold setup, and I urge the authors to provide relevant experimental and theoretical references if they exist.

      Most crucially, I find this setup biased towards supporting the authors’ theory for intrinsic sequences because it essentially eliminates the possibility of any CA3 cell producing an effective output to other neurons after it fires. Hence, I question whether the simulation results would be much less clean if a more moderate depression factor UD were utilized.

      We thank the reviewer for giving us the opportunity to further clarify. 1) Probabilities of synaptic release (here called UD for consistency with the original work by Romani and Tsodyks) can attain a very wide range and indeed achieve values up to 0.9 (for review see e.g. Dobrunz LE, Stevens CF, 1997, Neuron, 18: 995-1008). 2) Contrary to the reviewer’s impression, a higher UD (0.7-0.9 in our case) would bias the simulation towards even more extrinsicity. Larger UD produces steeper phase precession in extrinsic sequences, because it (temporarily) generates an even stronger asymmetrical connectivity. 3) The extreme value of 0.9 was only used in Figure 1 to best illustrate the original Romani and Tsodyks 2015 idea. 4) Our simulations without recurrent synaptic connections (Figure 6) do not even require short-term synaptic depression. In view of these arguments we refrained from making further additions to the paper and refer the critical reader to this comment.
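
      To make the role of the depression fraction in point 2 more concrete, the following is a minimal sketch that assumes a generic Tsodyks-Markram-style depression update, not the full model of the manuscript. The function name, the 100 Hz spike train and the recovery time constant of 500 ms are hypothetical values chosen only for illustration. Each presynaptic spike consumes a fraction UD of the available resource, which then recovers exponentially towards 1.

```python
import math

def depression_trace(spike_times_ms, U_D, tau_D_ms):
    """Return the available synaptic resource x just after each presynaptic spike.

    Between spikes, x recovers exponentially towards 1 with time constant tau_D_ms;
    each spike consumes a fraction U_D of whatever is currently available.
    """
    x = 1.0
    last_t = None
    after_spike = []
    for t in spike_times_ms:
        if last_t is not None:
            # exponential recovery towards the resting value of 1
            x = 1.0 - (1.0 - x) * math.exp(-(t - last_t) / tau_D_ms)
        # the spike uses up a fraction U_D of the available resource
        x -= U_D * x
        after_spike.append(x)
        last_t = t
    return after_spike

spikes = [0.0, 10.0, 20.0, 30.0]  # hypothetical 100 Hz burst (times in ms)
for U_D in (0.3, 0.7, 0.9):
    trace = depression_trace(spikes, U_D, tau_D_ms=500.0)
    print(U_D, [round(v, 3) for v in trace])
```

      In this sketch a larger UD leaves less resource after every spike of the burst, which is the sense in which recently active synapses are temporarily weakened more strongly than those that have not yet been active.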

      • I have a few final suggestions for the authors in the hopes of further improving the manuscript for the neuroscience community:

      • line 62: sensorimotor input is present or ABSENT?

      Intrinsic activity signatures are found ”EVEN when sensorimotor feedback is present”, as one may assume that this input may be able to completely override the intrinsic patterns.

      • line 76: played out. colloquial, consider rewriting/explaining

      We use ”evoked” now.

      • line 104: second part of motivation for Izhikevich-type model is wordy, and grammatically incorrect.

      We have shortened the sentence.

      • on potential limitations of the model lines 116-120: is the use of a box an important assumption, as opposed to a more graded function, exponential or gaussian?

      Using spike-based input, it is not straightforward to implement a graded input. One way would be to employ a stochastic point process with graded firing probability. We, however, chose to use a nonlinear facilitation function (see below).

      • line 124 (equation) and 129-130: How crucial is the non-linearity in the synaptic variable for the results? This is a strong assumption, as the nonlinearity is the dominant effect (as opposed to a correction/perturbation). Are there any other contributions for this ramp of activity due to sensory input?

      We found results to fit best with a non-linear facilitation function (see above), and, as argued in the manuscript, facilitation indeed acts non-linearly owing to the calcium-dependence of synaptic release. We have added a comment to the Methods section explaining that we use facilitation to generate a graded spatial input.

      • line 187: ’...neglecting gamma activity in the model.’ I suggest removing this part of the sentence, unless you motivate why gamma would be relevant and the conditions for its generation.

      We have followed the reviewer’s suggestion.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank you for the time you took to review our work and for your feedback!

      The major changes to the manuscript are:

      1) Prompted by multiple reviewers, we have replaced the statistical analysis in Figure 1L with a bootstrap analysis, added an ANOVA (in Table S1), and have also added the same analysis with mice as a statistical unit as Figure S4J to the manuscript.

      2) In response to reviewer 1, comment 3, we have replaced the response latency maps previously shown in Figures 3B, 3C, 3E and 3F with response amplitude maps.

      3) In response to reviewer 2, comment 1, we have added a variant of the response traces shown in Figures 3B, 3C, 3E and 3F with mice as the statistical unit as Figures S2C and S2D.

      4) In response to reviewer 2, public review, we have added data from additional experiments as Figures S6F-S6H, that control for the effect of a saline injection.

      A detailed point-by-point response to all reviewer concerns is provided in the following.  

      Reviewer #1 (Public Review):

      The authors present a study of visuo-motor coupling primarily using wide-field calcium imaging to measure activity across the dorsal visual cortex. They used different mouse lines or systemically injected viral vectors to allow imaging of calcium activity from specific cell-types with a particular focus on a mouse-line that expresses GCaMP in layer 5 IT (intratelencephalic) neurons. They examined the question of how the neural response to predictable visual input, as a consequence of self-motion, differed from responses to unpredictable input. They identify layer 5 IT cells as having a different response pattern to other cell-types/layers in that they show differences in their response to closed-loop (i.e. predictable) vs open-loop (i.e. unpredictable) stimulation whereas other cell-types showed similar activity patterns between these two conditions. They analyze the latencies of responses to visuomotor prediction errors obtained by briefly pausing the display while the mouse is running, causing a negative prediction error, or by presenting an unpredicted visual input causing a positive prediction error. They suggest that neural responses related to these prediction errors originate in V1, however, I would caution against overinterpretation of this finding as judging the latency of slow calcium responses in wide-field signals is very challenging and this result was not statistically compared between areas. Surprisingly, they find that presentation of a visual grating actually decreases the responses of L5 IT cells in V1. They interpret their results within a predictive coding framework that the last author has previously proposed. The response pattern of the L5 IT cells leads them to propose that these cells may act as 'internal representation' neurons that carry a representation of the brain's model of its environment, though this is rather speculative. They subsequently examine the responses of these cells to anti-psychotic drugs (e.g. clozapine) with the reasoning that a leading theory of schizophrenia is a disturbance of the brain's internal model and/or a failure to correctly predict the sensory consequences of self-movement. They find that anti-psychotic drugs strongly enhance responses of L5 IT cells to locomotion while having little effect on other cell-types. Finally, they suggest that anti-psychotics reduce long-range correlations between (predominantly) L5 cells and reduce the propagation of prediction errors to higher visual areas and suggest this may be a mechanism by which these drugs reduce hallucinations/psychosis.

      This is a large study containing a screening of many mouse-lines/expression profiles using wide-field calcium imaging. Wide-field imaging has its caveats, including a broad point-spread function of the signal and susceptibility to hemodynamic artifacts, which can make interpretation of results difficult. The authors acknowledge these problems and directly address the hemodynamic occlusion problem. It was reassuring to see supplementary 2-photon imaging of soma to complement this data-set, even though this is rather briefly described in the paper. Overall the paper's strengths are its identification of a very different response profile in the L5 IT cells compared other layers/cell-types which suggests an important role for these cells in handling integration of self-motion generated sensory predictions with sensory input. The interpretation of the responses to anti-psychotic drugs is more speculative but the result appears robust and provides an interesting basis for further studies of this effect with more specific recording techniques and possibly behavioral measures.

      We thank the reviewer for the feedback and the help with improving the manuscript. We agree, the findings presented in this study are merely a starting point. The two questions we are currently pursuing in follow-up work are:

      1) Do the findings generalize to all known antipsychotic drugs?

      2) What is the mechanism by which these drugs induce a decorrelation of activity, specifically in layer 5 neurons?

      But we suspect these questions will take at least a few more years of research to answer.

      Reviewer #2 (Public Review):

      Summary:

      This work investigates the effects of various antipsychotic drugs on cortical responses during visuomotor integration. Using wide-field calcium imaging in a virtual reality setup, the researchers compare neuronal responses to self-generated movement during locomotion-congruent (closed loop) or locomotion-incongruent (open loop) visual stimulation. Moreover, they probe responses to unexpected visual events (halt of visual flow, sudden-onset drifting grating). The researchers find that, in contrast to a variety of excitatory and inhibitory cell types, genetically defined layer 5 excitatory neurons distinguish between the closed and the open loop condition and exhibit activity patterns in visual cortex in response to unexpected events, consistent with unsigned prediction error coding. Motivated by the idea that prediction error coding is aberrant in psychosis, the authors then inject the antipsychotic drug clozapine, and observe that this intervention specifically affects closed loop responses of layer 5 excitatory neurons, blunting the distinction between the open and closed loop conditions. Clozapine also leads to a decrease in long-range correlations between L5 activity in different brain regions, and similar effects are observed for two other antipsychotics, aripiprazole and haloperidol, but not for the stimulant amphetamine. The authors suggest that altered prediction error coding in layer 5 excitatory neurons due to reduced long-range correlations in L5 neurons might be a major effect of antipsychotic drugs and speculate that this might serve as a new biomarker for drug development.

      Strengths:

      • Relevant and interesting research question:

      The distinction between expected and unexpected stimuli is blunted in psychosis but the neural mechanisms remain unclear. Therefore, it is critical to understand whether and how antipsychotic drugs used to treat psychosis affect cortical responses to expected and unexpected stimuli. This study provides important insights into this question by identifying a specific cortical cell type and long-range interactions as potential targets. The authors identify layer 5 excitatory neurons as a site where functional effects of antipsychotic drugs manifest. This is particularly interesting as these deep layer neurons have been proposed to play a crucial role in computing the integration of predictions, which is thought to be disrupted in psychosis. This work therefore has the potential to guide future investigations on psychosis and predictive coding towards these layer 5 neurons, and ultimately improve our understanding of the neural basis of psychotic symptoms.

      • Broad investigation of different cell types and cortical regions:

      One of the major strengths of this study is its quasi-systematic approach towards cell types and cortical regions. By analysing a wide range of genetically defined excitatory and inhibitory cell types, the authors were able to identify layer 5 excitatory neurons as exhibiting the strongest responses to unexpected vs. expected stimuli and being the most affected by antipsychotic drugs. Hence, this quasi-systematic approach provides valuable insights into the functional effects of antipsychotic drugs on the brain, and can guide future investigations towards the mechanisms by which these medications affect cortical neurons.

      • Bridging theory with experiments

      Another strength of this study is its theoretical framework, which is grounded in the predictive coding theory. The authors use this theory as a guiding principle to motivate their experimental approach connecting visual responses in different layers with psychosis and antipsychotic drugs. This integration of theory and experimentation is a powerful approach to tie together the various findings the authors present and to contribute to the development of a coherent model of how the brain processes visual information both in health and in disease.

      Weaknesses:

      • Unclear relevance for psychosis research

      From the study, it remains unclear whether the findings might indeed be able to normalise altered predictive coding in psychosis. Psychosis is characterised by a blunted distinction between predicted and unpredicted stimuli. The results of this study indicate that antipsychotic drugs further blunt the distinction between predicted and unpredicted stimuli, which would suggest that antipsychotic drugs would deteriorate rather than ameliorate the predictive coding deficit found in psychosis. However, these findings were based on observations in wild-type mice at baseline. Given that antipsychotics are thought to have little effects in health but potent antipsychotic effects in psychosis, it seems possible that the presented results might be different in a condition modelling a psychotic state, for example after a dopamine-agonistic or a NMDA-antagonistic challenge. Therefore, future work in models of psychotic states is needed to further investigate the translational relevance of these findings.

      • Incomplete testing of predictive coding interpretation

      While the investigation of neuronal responses to different visual flow stimuli is interesting, it remains open whether these responses indeed reflect internal representations in the framework of predictive coding. While the responses are consistent with internal representation as defined by the researchers, i.e., unsigned prediction error signals, an alternative interpretation might be that responses simply reflect sensory bottom-up signals that are more related to some low-level stimulus characteristics than to prediction errors. Moreover, this interpretational uncertainty is compounded by the fact that the used experimental paradigms were not suited to test whether behaviour is impacted as a function of the visual stimulation, which makes it difficult to assess what the internal representation of the animal actually was. For these reasons, the observed effects might reflect simple bottom-up sensory processing alterations and not necessarily have any functional consequences. While this potential alternative explanation does not detract from the value of the study, future work would be needed to explain the effect of antipsychotic drugs on responses to visual flow. For example, experimental designs that systematically vary the predictive strength of coupled events or that include a behavioural readout might be more suited to draw conclusions about whether antipsychotic drugs indeed alter internal representations.

      • Methodological constraints of experimental design

      While the study findings provide valuable insights into the potential effects of antipsychotic drugs, it is important to acknowledge that there may be some methodological constraints that could impact the interpretation of the results. More specifically, the experimental design does not include a negative control condition or different doses. These conditions would help to ensure that the observed effects are not due to unspecific effects related to injection-induced stress or time, and not confined to a narrow dose range that might or might not reflect therapeutic doses used in humans. Hence, future work is needed to confirm that the observed effects indeed represent specific drug effects that are relevant to antipsychotic action.

      Conclusion:

      Overall, the results support the idea that antipsychotic drugs affect neural responses to predicted and unpredicted stimuli in deep layers of cortex. Although some future work is required to establish whether this observation can indeed be explained by a drug-specific effect on predictive coding, the study provides important insights into the neural underpinnings of visual processing and antipsychotic drugs, which is expected to guide future investigations on the predictive coding hypothesis of psychosis. This will be of broad interest to neuroscientists working on predictive coding in health and in disease.

      We thank the reviewer for the feedback and the help with improving the manuscript.

      Regarding the concern of a lack of a negative control, we have repeated the correlation measurement experiments in a cohort of Tlx3-Cre x Ai148 mice that received injections of saline. This analysis is now shown in Figure S6F-S6H. Saline injections did not change correlations in L5 IT neurons. Combined with the absence of changes in the L5 IT correlation structure following amphetamine injections (Figures 7G – 7I), this suggests that unspecific effects related to stress of injection, or simply time, cannot explain the observed decorrelation effect of the antipsychotic drugs.

      And we fully agree, a lot more work is needed to confirm that the observed effects are specific and relevant to antipsychotic action.

      Reviewer #3 (Public Review):

      The study examines how different cell types in various regions of the mouse dorsal cortex respond to visuomotor integration and how antipsychotic drugs impact these responses. Specifically, in contrast to most cell types, the authors found that activity in Layer 5 intratelencephalic neurons (Tlx3+) and Layer 6 neurons (Ntsr1+) differentiated between open loop and closed loop visuomotor conditions. Focussing on Layer 5 neurons, they found that the activity of these neurons also differentiated between negative and positive prediction errors during visuomotor integration. The authors further demonstrated that the antipsychotic drugs reduced the correlation of Layer 5 neuronal activity across regions of the cortex, and impaired the propagation of visuomotor mismatch responses (specifically, negative prediction errors) across Layer 5 neurons of the cortex, suggesting a decoupling of long-range cortical interactions.

      The data when taken as a whole demonstrate that visuomotor integration in deeper cortical layers is different than in superficial layers and is more susceptible to disruption by antipsychotics. Whilst it is already known that deep layers integrate information differently from superficial layers, this study provides more specific insight into these differences. Moreover, this study provides a first step into understanding the potential mechanism by which antipsychotics may exert their effect.

      Whilst the paper has several strengths, the robustness of its conclusions is limited by its questionable statistical analyses. A summary of the paper's strengths and weaknesses follows.

      Strengths:

      The authors perform an extensive investigation of how different cortical cell types (including Layer 2/3, 4, 5, and 6 excitatory neurons, as well as PV, VIP, and SST inhibitory interneurons) in different cortical areas (including primary and secondary visual areas as well as motor and premotor areas) respond to visuomotor integration. This investigation provides strong support to the idea that deep layer neurons are indeed unique in their computational properties. This large data set will be of considerable interest to neuroscientists interested in cortical processing.

      The authors also provide several lines of evidence that visuomotor information is differentially integrated in deep vs. superficial layers. They show that this is true across experimental paradigms of visuomotor processing (open loop, closed loop, mismatch, drifting grating conditions) and experimental manipulations, with the demonstration that Layer 5 visuomotor integration is more sensitive to disruption by the antipsychotic drug clozapine, compared with cortex as a whole.

      The study further uses multiple drugs (clozapine, aripiprazole and haloperidol) to bolster its conclusion that antipsychotic drugs disrupt correlated cortical activity in Layer 5 neurons, and further demonstrates that this disruption is specific to antipsychotics, as the psychostimulant amphetamine shows no such effect.

      In widefield calcium imaging experiments, the authors effectively control for the impact of hemodynamic occlusions in their results, and try to minimize this impact using a crystal skull preparation, which performs better than traditional glass windows. Moreover, they examine key findings in widefield calcium imaging experiments with two-photon imaging.

      Weaknesses:

      A critical weakness of the paper is its statistical analysis. The study does not use mice as its independent unit for statistical comparisons but rather relies on other definitions, without appropriate justification, which results in an inflation of sample sizes. For example, in Figure 1, independent samples are defined as locomotion onsets, leading to sample sizes of approx. 400-2000 despite only using 6 mice for the experiment. This is only justified if the data from locomotion onsets within a mouse is actually statistically independent, which the authors do not test for, and which seems unlikely. With such inflated sample sizes, it becomes more likely to find spurious differences between groups as significant. It also remains unclear how many locomotion onsets come from each mouse; the results could be dominated by a small subset of mice with the most locomotion onsets. The more disciplined approach to statistical analysis of the dataset is to average the data associated with locomotion onsets within a mouse, and then use the mouse as an independent unit for statistical comparison. A second example, for instance, is in Figure 2L, where the independent statistical unit is defined as cortical regions instead of mice, with the left and right hemispheres counting as independent samples; again this is not justified. Is the activity of cortical regions within a mouse and across cortical hemispheres really statistically independent? The problem is apparent throughout the manuscript and for each data set collected. An additional statistical issue is that it is unclear if the authors are correcting for the use of multiple statistical tests (as in for example Figure 1L and Figure 2B,D). In general, the use of statistics by the authors is not justified in the text.

      Finally, it is important to note that whilst the study demonstrates that antipsychotics may selectively impact visuomotor integration in L5 neurons, it does not show that this effect is necessary or sufficient for the action of antipsychotics; though this is likely beyond the scope of the study it is something for readers to keep in mind.

      We thank the reviewer for the feedback and the help with improving the manuscript.

      Regarding the concerns of statistical analysis, this may partially be a misunderstanding. We apologize for the lack of clarity. For example, the data in Figures 1F-1K is indeed shown as averaged over locomotion onsets, but there is no statistical analysis performed in these panels. The unit for the statistical analysis shown in Figure 1L is brain area (not locomotion onset). A central tenet of the analysis shown in Figures 1L and 2 is that the effect of differential activation during closed and open loop locomotion onsets is not specific to visual areas of cortex. In visual areas of cortex, one would expect to find a difference. In essence, the surprising finding here is the lack of a difference in other cell types but L5 IT neurons. Thus, in the analyses of those figure panels we are testing whether the effect is present on average across all cortical areas. Hence, we chose the statistical unit of Figure 1L to be cortical areas, not mice. We have added the same analysis with mice as a statistical unit as Figure S4J.

      Reviewer #1 (Recommendations For The Authors):

      I have a few concerns and questions that I would like to see addressed:

      1) Figure 1L - the statistics are a little unusual here as the errors are across visual areas rather than across mice or hemispheres. This isn't ideal, as we want to generalize the results across animals, not areas, and the results seem to be driven mostly by V1/RSC. I would like to see comparisons using mice as the statistical unit either in an ANOVA with areas as factors or post-hoc comparisons per area.

      Based on the assumption that visual cortex should respond to visual stimuli, we would have expected to find a difference between closed and open loop locomotion onset responses in all cell types in visual areas of cortex (a closed loop locomotion onset being the combination of locomotion and visual flow onset, while an open loop locomotion onset lacks the visual flow component). Thus, the first surprise was that in most cell types we found very little difference between these two locomotion onset types. Conversely, in Tlx3-positive L5 IT neurons the difference was apparent well outside of the visual areas of cortex (even though the difference was indeed strongest in V1/RSC). To quantify the extent to which closed and open loop locomotion onsets result in different activity patterns across dorsal cortex we performed the analyses shown in Figures 1L and 2. To make the point that the effect was observable on average across cortical areas, we used cortical area as a unit in Figure 1L. We have added the analysis shown in Figure 1L with mice as the statistical unit as Figure S4J and have added the ANOVA information to Table S1, as suggested.

      2) The reduction of activity of L5 IT cells in V1 after the presentation of gratings is curious. The authors suggest it might have been due to one population of cells tuned for the orientation of the presented grating suppressing the remaining cells leading to an aggregate negative response. However, they also observed this negative response in the 2p signal for individual somata. Presumably in the 2p data they could check their hypothesis - is there a group of cells that were tuned for the grating? Is it possible that for some reason the L5 IT cells in the 2p were not being activated by the grating because of their RF locations? How large were the gratings - I didn't see this in the methods section?

      We can certainly identify neurons that selectively increase activity to one particular grating. See Author response image 1, for vertical and horizontal gratings. The gratings were presented full-field on a toroidal screen that surrounded the mouse (240 degrees horizontal and 100 degrees vertical coverage of the visual field). This covered a large fraction of the field of view of the mouse. While we did not map receptive fields of individual neurons in this study, it is unlikely that the receptive fields of the neurons recorded were outside the stimulated area. We have made this clearer in the manuscript.

      Author response image 1.

      The population L5 IT neuron response to full-field drifting grating stimuli was a decrease of activity, yet there were increasing responses in a subset of neurons. (A) Heatmap of responses of all L5 IT neuron somata recorded with two-photon imaging in 7 Tlx3-Cre x Ai148 mice to drifting gratings of vertical orientation, sorted by their response. Data were sorted on odd trials and plotted on even trials to avoid regression to the mean artifacts. Dashed black box marks the top 10% responsive neurons. The data are a subset of the data shown in Figure S3D. (B) As in A, but for responses to drifting gratings of horizontal orientation. (C) Responses of top 10% vertical grating responsive neurons (dashed black box in A) to vertical (orange) or horizontal gratings (green). Neurons were selected on odd trials, and the average response of even trials is shown. (D) As in A, but sorted to the response of horizontal drifting gratings. (E) As in D, but for the horizontal grating stimulus. (F) As in C, but for the top 10% horizontal grating responsive neurons.
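
      The split-half procedure described in this legend (selecting neurons on odd trials and plotting even trials) can be illustrated with a minimal sketch. It assumes a plain list-of-lists layout for the responses and uses pure-noise data; the function name, data layout and sample sizes are hypothetical and are not taken from the analysis code of the study.

```python
import random
from statistics import mean

def split_half_top_fraction(responses, fraction=0.1):
    """responses[n][k] is the response of neuron n on trial k.

    Neurons are ranked by their mean response on odd trials (1st, 3rd, ...),
    and the returned values are the even-trial means of the top `fraction`.
    """
    odd_mean = [mean(r[0::2]) for r in responses]   # 1st, 3rd, ... trial (indices 0, 2, ...)
    even_mean = [mean(r[1::2]) for r in responses]  # 2nd, 4th, ... trial
    order = sorted(range(len(responses)), key=lambda n: odd_mean[n], reverse=True)
    n_top = max(1, int(round(fraction * len(responses))))
    top = order[:n_top]
    return top, [even_mean[n] for n in top]

rng = random.Random(0)
# Hypothetical pure-noise data: 200 neurons x 20 trials, no true response
noise = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(200)]
top, held_out = split_half_top_fraction(noise)
selected = mean(mean(noise[n][0::2]) for n in top)
print("mean on selection trials:", round(selected, 3))
print("mean on held-out trials:", round(mean(held_out), 3))
```

      With pure noise, the selected neurons look responsive on the trials used for selection but not on the held-out trials, which is the regression-to-the-mean artifact that the odd/even split is meant to avoid.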

      3) I would caution against over-interpretation of latencies from wide-field GCaMP activity (Figure 3). A weaker response in a smaller population of neurons that has the same latency as a strong response in a large population of neurons will appear to have different latencies when convolved with the GCaMP kernel. Also there doesn't appear to be any statistical support for different latencies in different cortical areas. Either this should be correctly treated (ideally with linear mixed effects models to account for the increased correlation within animals) or the latency conclusions should be removed from the manuscript (my recommendation).

      We suspect that by “latency conclusions” the reviewer means “latency analysis”. The only time we mention latency differences is to state that: “In C57BL/6 mice that expressed GCaMP brain wide, both visuomotor mismatch and grating stimuli resulted in increases of activity that were strongest and appeared first in visual regions of dorsal cortex (Figures 3A-3C).”

      Nevertheless, we agree with the reviewer that response latency and response amplitude are not independent in our measurements and have replaced the latency plots in Figures 3B, 3C, 3E and 3F with average response maps.
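
      The dependence between response amplitude and apparent latency raised in this comment can be reproduced in a toy simulation. The sketch below assumes a single-exponential decay kernel as a crude stand-in for the calcium indicator and a fixed detection threshold; the kernel shape, time constant, amplitudes and threshold are all hypothetical choices for illustration.

```python
import math

def convolve_with_exponential(signal, tau_s, dt_s):
    """Causal convolution of `signal` with an exponential decay kernel exp(-t / tau_s)."""
    n_kernel = int(5 * tau_s / dt_s)  # truncate the kernel after 5 time constants
    kernel = [math.exp(-i * dt_s / tau_s) for i in range(n_kernel)]
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for i in range(min(t + 1, n_kernel)):
            acc += signal[t - i] * kernel[i]
        out.append(acc * dt_s)
    return out

def threshold_latency(trace, threshold, dt_s):
    """Time of the first sample that exceeds a fixed threshold, or None if there is none."""
    for i, v in enumerate(trace):
        if v > threshold:
            return i * dt_s
    return None

dt = 0.01                                # 10 ms samples (hypothetical)
onset = 50                               # both responses start at the same sample (0.5 s)
strong = [0.0] * onset + [1.0] * 150     # strong underlying response
weak = [0.0] * onset + [0.2] * 150       # weaker response with identical onset
thr = 0.05
lat_strong = threshold_latency(convolve_with_exponential(strong, 0.5, dt), thr, dt)
lat_weak = threshold_latency(convolve_with_exponential(weak, 0.5, dt), thr, dt)
print("apparent latency, strong response:", lat_strong)
print("apparent latency, weak response:", lat_weak)
```

      Although both underlying responses begin at the same time, the weaker one crosses the threshold later after filtering, so a threshold-based latency estimate partly reflects amplitude.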

      4) Given that the data is baseline corrected, is it possible that the effects of the anti-psychotic drugs on L5 IT cells were due to a change in the baseline activity of this population?

      While we do find a small increase in average activity as a result of antipsychotic drug injections (Author response image 2), these effects are much smaller than those on locomotion onset responses.

      Author response image 2.

      On average, activity was increased in dorsal cortex after administration of antipsychotic drugs. Average calcium activity over the entire recording session before (naïve) and after (antipsy.) the administration of antipsychotic drugs. Colored lines indicate paired data for individual mice (Blue: 5 mice that had received clozapine, green: 3 mice that had received aripiprazole, red: 3 mice that had received haloperidol).

      To illustrate that the clozapine induced change in locomotion related activity cannot be explained by baseline activity differences, we have replotted the responses shown in Figures 4D and 4E, S3B, S5F without baseline subtraction (Author response image 3).

      Author response image 3.

      Antipsychotic drug injection only modestly shifts the baseline before locomotion onsets. (A) Average response expressed as F/F0 (wherein F0 was defined as the median of a recording session) during closed (solid line, 1101 onsets) and open loop (dashed line, 348 onsets) locomotion onsets in 5 Tlx3-Cre x Ai148 mice that expressed GCaMP6 in L5 IT neurons. Shading indicates SEM over onsets. Dashed horizontal line marks a value of F/F0 of 1.005 for comparison with panel B. Underlying data were the same as in Figures 4D and 4E. (B) As in A, but after a single intraperitoneal injection of the drug clozapine and for 707 closed and 350 open loop locomotion onsets. (C) Average response expressed as F/F0 (wherein F0 was defined as the median of a recording session) of L5 soma in V1, recorded with two-photon imaging in 7 Tlx3-Cre x Ai148 mice that expressed GCaMP6 in L5 IT neurons, during either closed (solid) or open loop (dashed) locomotion onsets. Shading indicates SEM over 8434 neurons. Dashed horizontal line marks a value of F/F0 of 1.045 for comparison with panel D. Underlying data were the same as in Figure S3B. (D) As in C, but for the 3 Tlx3 x Ai148 mice that had received a single intraperitoneal injection of clozapine. Underlying data were from Figure S5F.

      5) Figure 5/Figure S6 - Do the results really reflect an effect of distance, or are they driven by areas from different hemispheres? Does the result hold if they factor out the effect of hemisphere or calculate the results within hemisphere?

      The effect appears qualitatively unchanged when we exclude interhemispheric connections from the analysis (Author response image 4).

      Author response image 4.

      As in Figures 6D-6F, but with the exclusion of interhemispheric connections. The decorrelation effect appears qualitatively unchanged.

      Reviewer #2 (Recommendations For The Authors):

      In addition to my public review, I only have one statistics-related and a few minor editing suggestions for the abstract. I hope that these might help the authors to improve their manuscript.

      1) It seems that the researchers are combining observations across different subjects, as seen in Figure 1F-L as well as in all of the other figures. While this has been a common practice in their field, it is now widely recognized that this approach can result in biased statistical inferences since it violates the assumptions of most statistical tests (see this recent discussion: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7906290/). As such, it may be beneficial for the authors to consider utilizing statistical tests that are designed to accurately deal with hierarchical data sets, like linear mixed models or hierarchical bootstrap, to confirm their key results. Additionally or alternatively, presenting data grouped by subject would help demonstrate the consistency of their findings across subjects.

      Please note, in Figures 1F-1K, there are no statistical tests – but the data are indeed averaged over locomotion onsets across all mice. We could use hierarchical sampling to calculate a bootstrap estimate of the mean response curves and show those instead, but that is also not standard practice in the field. We suspect this is also not what the reviewer is suggesting. In Figure 1L, the unit is indeed brain areas (see also our response to comment 1 of reviewer 1), but it is not areas x mice (i.e., the analysis is not hierarchical).

      We have now added a supplementary panel (Figure S4J) that shows the data of Figure 1L with mouse as the statistical unit (note, this is also not hierarchical). We have replaced the statistical test with bootstrapping, as the reviewer suggests. This information can be found in Table S1.

      In Figures 2B and 2D, we have replaced the statistical test with hierarchical bootstrap, and updated the corresponding information in Table S1.

      For Figure 3, in which we show mismatch and grating onset responses averaged using onsets as the base unit, we have added supplementary panels (Figure S2) that show the same analysis using mice as the statistical unit. This did not change any of the conclusions. Note, there was no statistical testing in Figure 3.

      For the decorrelation effect of the different antipsychotic drugs that we show in Figures 6 and 7 the statistical unit is mice x region pairs (that is, while the structure is hierarchical, all mice contribute the same number of pairs). Our data are underpowered to use hierarchical bootstrap for testing the drug effects individually. However, if we combine all antipsychotic drug data (clozapine, aripiprazole, and haloperidol) we reach the same conclusions with hierarchical bootstrap as with the statistical tests (ttest and ranksum) used in the paper (Author response image 5).

      Author response image 5.

      Hierarchical bootstrap of the combined distribution of correlation values shown in Figures 6F, 7C and 7F did not change the conclusion that administration of antipsychotic drugs reduces L5 IT neuron correlations. Statistical comparisons using hierarchical bootstrap: Short-range vs no change, p < 0.001; long-range vs no change, p < 0.001; short-range vs long-range, p < 0.05.
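
      For readers unfamiliar with the procedure, a hierarchical bootstrap of the kind referred to here can be sketched as below. This is a generic two-level version (resample mice with replacement, then resample observations within each sampled mouse) and is an assumption about the general technique, not the authors' implementation. The function names and toy values are hypothetical.

```python
import random


def hierarchical_bootstrap_means(data_by_mouse, n_boot=2000, seed=0):
    """Bootstrap distribution of the mean, respecting the mouse/observation hierarchy.

    data_by_mouse maps a mouse identifier to its list of observations
    (for example, changes in correlation for each region pair).
    """
    rng = random.Random(seed)
    mice = list(data_by_mouse)
    means = []
    for _ in range(n_boot):
        # Level 1: resample mice with replacement.
        sampled_mice = rng.choices(mice, k=len(mice))
        values = []
        for mouse in sampled_mice:
            observations = data_by_mouse[mouse]
            # Level 2: resample observations within the sampled mouse.
            values.extend(rng.choices(observations, k=len(observations)))
        means.append(sum(values) / len(values))
    return means


def fraction_at_or_above(means, reference=0.0):
    """One-sided bootstrap estimate: fraction of resampled means >= reference."""
    return sum(m >= reference for m in means) / len(means)


# Hypothetical toy data: change in correlation per region pair, grouped by mouse.
toy = {
    "mouse_1": [-0.10, -0.12, -0.08],
    "mouse_2": [-0.05, -0.07, -0.06],
    "mouse_3": [-0.11, -0.09, -0.10],
}
boot = hierarchical_bootstrap_means(toy, n_boot=1000)
p_value = fraction_at_or_above(boot, reference=0.0)
```

      When no resampled mean reaches the reference value, the estimate is 0 and would usually be reported as p < 1/n_boot rather than as exactly zero.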

      2) Given the impressive amount of data, I found it sometimes a little difficult to follow the manuscript. The authors might want to consider including a high-level overview of their results and rationales at the end of the introduction, and start each Results subsection with a sentence referring back to that high-level overview ("To test whether X, we did Y and present it in this section.")

      We have attempted to improve the writing along these lines.

      3) Some suggestions that might further improve the clarity of writing.

      Abstract: Does the brain really distinguish between different "activity patterns", or would externally generated and self-generated "stimuli" be a slightly more accurate term to describe the observed alterations in schizophrenia?

      We would argue that (outside of sensory organs) the brain only has access to activity patterns, not stimuli directly. We would prefer to keep the phrasing with activity patterns here.

      Line 12: It might be easier to follow if the authors explicitly related that sentence back to the previous sentence "their ability to identify self-generated activity patterns" -> "their ability to distinguish between externally and self/internally generated ..."

      Absolutely correct – we have improved the writing here.

      Line 14: It remains unclear how visuomotor integration relates to the problem of distinguishing between self- and externally generated stimuli.

      We have attempted to expand on this in the abstract.

      Line 26: it remains unclear how the results support the activation of "internal representations" as this term has not been defined previously

      We have removed “internal representation” from the abstract.

      Results, line 80ff: I was confused by the description of all the different investigated cell types, as the first figure panels then only talk about brain wide and L5. Maybe the authors might find that shortening this with a reference to the methods might improve the flow.

      We have moved the list of cell types and mouse lines to the methods, as suggested.  

      Reviewer #3 (Recommendations For The Authors):

      The authors should strongly consider reassessing their statistics as outlined in the Public Review.

      Specifically:

      1) They should justify their definition of independent statistical unit; if this is not the mouse, they should justify why another definition (i.e. locomotion onset) is used, and show that their defined statistical unit achieves the requirements of being statistically independent (i.e. variance of the unit within a mouse is statistically indistinguishable from variance found between mice; more formally they could calculate the intraclass correlation (ICC)).

      We assume the reviewer is referring mainly to Figure 1 and therein to panel 1L.

      Since we did not perform statistical tests on the calcium traces, we are not sure why we would need to justify the choice of the unit we were showing. Moreover, Figure S2 shows the data of the V1 ROI averaged over mice to address this concern. As also mentioned to reviewer 2, we have amended Figure S2 to include the mouse-averaged traces of the V1 ROI data shown in main Figure 3.
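
      The intraclass correlation (ICC) raised in the reviewer's comment can be estimated, for example, from a one-way random-effects ANOVA with mouse as the grouping factor. The sketch below is illustrative only: the function name is hypothetical, and the one-way ICC(1) estimator is one of several ICC variants and is an assumption about which form would be appropriate here.

```python
def icc_oneway(groups):
    """One-way random-effects ICC(1) for a list of groups (one list per mouse).

    ICC = (MSB - MSW) / (MSB + (k0 - 1) * MSW), where MSB and MSW are the
    between- and within-group mean squares and k0 is the effective group
    size (equal to the common group size when the design is balanced).
    """
    n_groups = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    if n_groups < 2 or n_total <= n_groups:
        raise ValueError("need at least two groups and some within-group replication")

    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)

    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)
    k0 = (n_total - sum(n * n for n in sizes) / n_total) / (n_groups - 1)

    denominator = ms_between + (k0 - 1) * ms_within
    if denominator == 0:
        raise ValueError("no variance in the data; ICC is undefined")
    return (ms_between - ms_within) / denominator


# Hypothetical toy data: all variance lies between mice, so the ICC is 1.
icc_clustered = icc_oneway([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
# All variance lies within mice, giving a negative estimate.
icc_unclustered = icc_oneway([[1, 2, 3], [1, 2, 3], [1, 2, 3]])
```

      An ICC near zero would suggest that observations within a mouse are no more alike than observations from different mice. A clearly positive ICC would argue for treating the mouse as the statistical unit or for using a hierarchical method.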

      3) They should justify the statistical tests they use and whether they corrected for multiple comparisons; why for example was an ANOVA not used for Figure 1L and Figure 2B,D?

      We did not rely on ANOVA statistics for Figure 1L because we were mainly interested in carving out that Tlx3- (and Ntsr1-) positive mice inhabit a unique space when comparing the similarity of activity during closed and open loop locomotion onsets. We appreciate the reviewer taking a slightly different point of view on the data and now additionally report the ANOVA test result in Table S1. We have also opted to replace the statistical test in Figure 1L with bootstrapping. Lastly, we added Figure S4J which now shows the data in Figure 1L but with mice as the statistical unit.

      With similar logic, in Figure 2, we were not interested in comparing how the correlation of activity in cortical regions with locomotion behavior evolves over regions within a visuomotor feedback condition (closed loop, open loop or dark) but rather how a given region compares across feedback conditions.

      Still, we have opted to replace the statistical test in Figures 2B and 2D with hierarchical bootstrap, as also suggested by reviewer #2, comment 1. This did not change the significance indicator bars. We have accordingly updated Table S1 in which we report the full statistics.

    1. Author Response

      Thank you for allowing us to submit our manuscript to eLife and for the valuable feedback you have provided. We appreciate your recognition of the importance of our research question and the strengths of this study, including the use of a large sample size and heterogeneous male and female rats, as well as the extensive behavioral data. We understand the concerns raised, and we believe that by addressing these concerns, we can further strengthen our manuscript and its contribution to the field of addiction research.

      Reviewer #1:

      Weaknesses: Language and statistical analysis can be improved.

      We acknowledge the concerns regarding language and statistical analysis. In the revised manuscript, we will thoroughly review and improve the language, ensuring clarity and coherence throughout the text. Additionally, we will reevaluate our statistical analysis, address any inconsistencies or shortcomings, and provide a clear explanation of our methods and results.

      Reviewer #2:

      Because the authors used so many rats (~600), it is not clear how strong the effects are. That is, a large n makes it easy to identify small effect sizes, but no effect sizes are presented regarding the findings.

      Concerning the effect sizes, we understand the importance of providing this information. In the revised manuscript, we will include effect sizes for our findings to better illustrate the magnitude of the observed effects and their practical significance.
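
      As an illustration of the kind of effect size that could be reported alongside p-values, Cohen's d for two independent groups can be computed as sketched below. The choice of Cohen's d with a pooled standard deviation is an assumption made for this example (the response does not state which effect size measure will be used), and the function name and toy values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    if pooled_variance == 0:
        raise ValueError("pooled variance is zero; d is undefined")
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


# Hypothetical toy data (for example, infusions per session in two groups).
d = cohens_d([2, 4, 6], [1, 3, 5])
```

      With several hundred animals, even a small d can reach statistical significance, which is why reporting the magnitude separately from the p-value is informative.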

      The Discussion includes parts that argue that the extended access model is a better model of addiction than short access and suggests that this paper provides support for that. However, there were no rats given short-access for the same period of time as the rats in this paper - i.e., no comparison group. Rather, the only comparison that can be made is as the rats transition from short to long access. The data in Figure 1B appear to show that the rats continue their increase in cocaine intake when they transition from short access to long access. The authors do not provide any statistical analyses about this escalation of intake during short access. However, they claim that "measures related to short-term cocaine intake" were orthogonal to those collected during longer access periods, yet it is not clear to me what measures those are. Nonetheless, as indicated in Figure 1H, it appears that the rats consistently shift from PC1 to PC2 across self-administration, regardless of whether they are in the short or long access period.

      That is, the long-access measures appear to simply be a continuation of the pattern begun during short access. As a result, notwithstanding the lack of a true short-access control group, it is difficult to see how the authors can draw conclusions about short vs. long access in this paper. Moreover, as illustrated in Figure 3A, the resilient vs. vulnerable subtypes are apparent during short access self-administration (i.e., they do not require long-access self-administration to develop or be revealed). This suggests, if anything, that short access would be sufficient for identifying such groups. Similarly, Figure 5 shows that short access would be sufficient to identify the "low" vulnerability quartile vs. the other three groups.

      We appreciate the concerns raised regarding the comparison between short and long access conditions. Note that the goal of the study was not to specifically compare short vs long access, but instead evaluate the relationship between addiction-like behaviors after long access. In the revised manuscript, we will focus on these findings and present a more accurate representation of the behavioral changes observed between short and long access conditions. By doing so, we believe that our conclusions will be better supported by the data, and our manuscript will provide a more comprehensive understanding of the factors contributing to addiction-like behaviors.

      During the discussion, the authors briefly discuss gender differences with regard to cocaine use disorder, with the authors trying to claim that women may be more vulnerable to cocaine use disorder. However, the two papers cited do not support that, as they are papers with rodents. A recent comprehensive review on humans with regard to cocaine craving and relapse noted no reliable gender differences (Nicolas et al., 2022, Pharmacological Reviews) and, as the authors themselves noted, men suffer from cocaine use disorder at higher rates than women.

      We apologize for any confusion regarding the discussion of sex differences in cocaine use disorder. We will revise this section in the manuscript to better reflect the current literature on human sex differences in cocaine craving and relapse, as well as the prevalence of cocaine use disorder.

      The authors noted that the rats received 0.5 mg/kg/infusion of cocaine but provided no explanation for how this dosing was maintained (or whether it was maintained) across the length of the study. Considering that rats, especially males, increase in size quite a bit during this stage, this could affect measures like intake as well as skew sex difference results. Likewise, the data are presented strictly in the number of cocaine infusions, which does not allow for consideration of body weight.

      In response to the concern about maintaining the 0.5 mg/kg/infusion cocaine dose throughout the study, we will explain our dosing procedures and any adjustments made to account for changes in body weight. Additionally, we will consider presenting data in terms of total cocaine intake (mg/kg) to account for potential differences in body weight between animals and sexes.
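
      The arithmetic behind the reviewer's concern can be made explicit with a short sketch. Only the 0.5 mg/kg/infusion dose comes from the text. The body weights, the number of infusions, and the function names below are hypothetical values chosen for illustration, and the sketch makes no claim about how dosing was actually handled in the study.

```python
DOSE_MG_PER_KG = 0.5  # nominal cocaine dose per infusion, as stated in the text


def drug_mass_per_infusion_mg(body_weight_kg, dose_mg_per_kg=DOSE_MG_PER_KG):
    """Mass of drug (mg) one infusion must deliver to hit the nominal dose."""
    return dose_mg_per_kg * body_weight_kg


def session_intake_mg_per_kg(n_infusions, dose_mg_per_kg=DOSE_MG_PER_KG):
    """Total intake (mg/kg) in a session, assuming the nominal dose is maintained."""
    return n_infusions * dose_mg_per_kg


def effective_dose_mg_per_kg(mass_delivered_mg, body_weight_kg):
    """Dose actually received per infusion when the delivered mass is fixed."""
    return mass_delivered_mg / body_weight_kg


# Hypothetical example: infusion mass set for a 0.300 kg rat ...
mass_mg = drug_mass_per_infusion_mg(0.300)
# ... and left unchanged after the rat has grown to 0.400 kg.
drifted_dose = effective_dose_mg_per_kg(mass_mg, 0.400)
# Intake for a hypothetical session with 40 infusions at the nominal dose.
intake = session_intake_mg_per_kg(40)
```

      In this toy case the effective dose falls from 0.5 to 0.375 mg/kg per infusion if the delivered mass is not rescaled to body weight, which illustrates why the dosing procedure matters for comparing intake across time and between sexes.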

      In the Introduction, the authors make a number of arguments in the second paragraph that have no citations and, therefore, are unsupported.

      We will ensure that all statements in the Introduction are supported by appropriate citations, providing a solid foundation for our research question and the significance of our study.

      Reviewer #3:

      There are a number of factors - such as behavioral rate - that are not considered and likely co-vary with other measures. This is critical as previous work has shown that rate of behavior in reinforcement tasks is a large determinant of sensitivity to both drug effects on that behavior and punishers. This is not considered, but additional information and tempering the interpretation of the data would further strengthen the manuscript.

      We understand the concern regarding the potential influence of behavioral rates on our findings. In the revised manuscript, we will consider the impact of behavioral rates on our measures and discuss how they may have affected the results. By addressing this concern, we believe it will further strengthen the manuscript and provide a more comprehensive understanding of the factors contributing to addiction-like behaviors.

      We are confident that addressing these concerns will significantly improve our manuscript and provide a more robust and accurate representation of our findings. We appreciate the constructive feedback from the reviewers and look forward to submitting our revised manuscript to eLife.

    1. Author Response

      The following is the authors’ response to the original reviews

      Reviewer #1 (Recommendations for the authors):

      Major Concerns:

      1) There are numerous grammatical issues throughout the manuscript, and too much awkward jargon is used, such as "status of energy stresses", "ES-acetate". The characterization of acetate as an "energy stress" gives a negative connotation, which is unnecessary and confusing. Ketones are produced under the same circumstances but are a vital adaptive response, except for ketoacidosis. The terminology used throughout the manuscript is also vague, and some methodology is not adequately described in the Methods section. For example, the meaning of "preprandial" and "postprandial" is unclear, and there is no explanation of the related methodology.

      Thank you for your comments. We have replaced "status of energy stresses" with "energy stresses" in our revised manuscript. We agree with you that acetate and ketone bodies are produced under the same circumstances and that their production is a vital adaptive response. It is well known that the production of large amounts of acetate and ketone bodies is an important physiological adaptation of the body in response to energy stresses such as prolonged starvation and untreated diabetes mellitus. In this context, we use “energy stress-acetate”, a term we coined to emphasize the condition under which acetate is produced and its role under that condition. In response to your concerns, we have addressed these issues and added a thorough description of the methodology to the Methods section.

      2) The authors claim that acetate is a ketone body, which is incorrect. As the authors show, it is not produced by the ketogenic pathway or from the breakdown of ketones. Acetate is a carboxylic acid and specifically a short-chain fatty acid.

      We agree with you that our description of acetate as a ketone body is seemingly incorrect. Indeed, acetate is a short-chain fatty acid in terms of molecular structure. The classic ketone bodies include acetone, acetoacetate and beta-hydroxybutyrate, among which acetone and acetoacetate contain a carbonyl group and can be considered ketones; beta-hydroxybutyrate, however, contains only hydroxyl and carboxyl groups and is therefore not actually a ketone but a short-chain fatty acid. Notably, our description of acetate as an emerging novel “ketone body” is not meant to classify it as a ketone in structure, but to emphasize the high similarity between acetate and the classic ketone bodies in the organ (liver) and substrate (fatty acid-derived acetyl-CoA) of their production, the roles they play (as important sources of fuel and energy for many extrahepatic peripheral organs), the features of their catabolism (conversion back to acetyl-CoA and degradation in the TCA cycle), and the physiological conditions of their production (energy stresses such as prolonged starvation and untreated diabetes mellitus). To prevent any potential misunderstanding, we place "ketone body" in double quotation marks in our revised manuscript.

      3) The human subjects are not sufficiently characterized, and it is unclear whether they are T1DM or T2DM subjects. No information is provided on morphometrics, how and when serum was collected, exclusion criteria, medicines, etc. Proper characterization of human subjects is necessary before publishing such data.

      Thank you very much for your comments. We have added the description of subjects you mentioned in the Methods section.

      4) While Figure 4 is an essential set of experiments that demonstrate that ACOT12 is necessary for the induction of acetate during starvation in mice, the authors do not explain the source of basal levels of acetate that persist in mice lacking ACOT12. It is unclear whether this source is from other tissue or microbiota. Since loss of ACOT by shRNA treatment resulted in ~25% reduction in acetate, it is very difficult to conceive how this produces the profound neurological and strength deficits presented in Supplemental Figure 8 (see last point below).

      Additionally, it is not clear how the control mice for the knockout studies were generated. Please clarify.

      Under normal conditions, the serum acetate level in mice is around 200 μM. The hepatic ACOT12 and ACOT8 enzymes seem to contribute a serum acetate concentration of 60-90 μM each (Figure 4). The intestinal microbiota contributes a serum acetate concentration of 60-80 μM (Figure 2 and Figure supplement 1).

      During energy stress, the protein levels of ACOT12 and ACOT8 in the mouse liver were significantly upregulated (Figure 3 and Figure supplement 1), resulting in a significant increase of the serum acetate level to approximately 400 μM. The acetate produced by ACOT12 (~200 μM) and ACOT8 (~200 μM) constitutes the main portion of the serum acetate concentration under such conditions (Figure 2), while the contribution of the intestinal microbiota to the serum acetate level is minimized (Figure 2 and Figure supplement 1). Elimination of either ACOT12 or ACOT8 reduces the serum acetate level by up to 50% (Figure 4). However, this estimate is only a rough approximation and does not consider the possibility of compensatory upregulation of ACOT12 and ACOT8 in the kidney when ACOT12 or ACOT8 is knocked out in the liver.

      Acetate assumes the role of an important energy source when glucose utilization is reduced, as in diabetes. In this case, knockdown of ACOT12 or ACOT8 (shACOT12 or shACOT8) can markedly reduce acetate production and consequently influence the motor function of mice to a certain extent.

      5) The results presented in Figure 5 are confusing, and the authors' interpretation needs elaboration. The FAO assay detects water-soluble 3H-metabolites and 3H2O, and etomoxir or CPT1 knockout completely inhibits FAO. Therefore, it is unclear how peroxisomes can produce acetate without generating water-soluble intermediates that are detectable in the assay. Further explanation and rationale for the authors' interpretation are necessary.

      Mitochondria serve as the primary organelle for the catabolism of oleic acid. However, in certain instances, fatty acid oxidation (FAO) can occur in the peroxisome, resulting in the production of medium-chain fatty acids and acetyl-CoA. Nevertheless, these medium-chain fatty acids cannot undergo further oxidation within the peroxisome. Instead, they must be transported out of the peroxisome and then into the mitochondria through CPT1 (carnitine palmitoyltransferase 1) for further oxidation.

      To assess FAO, we utilized a detection method based on 3H labeling in H2O in cells treated with [9,10-3H(N)]-oleic acid. The introduction of [9,10-3H(N)]-oleic acid leads to the production of 3H-labeled medium-chain fatty acids and acetyl-CoA within the peroxisome. The further oxidation of 3H-labeled medium-chain fatty acids in the mitochondria was inhibited by impeding the activity of CPT1, leading to the eventual decrease of 3H-labeled H2O. However, acetyl-CoA can still be converted to acetate by ACOT8. As a result, knockdown or etomoxir inhibition of CPT1 decreased U-13C-palmitate-derived U-13C-acetate production by more than one-half, even though mitochondrial β-oxidation was nearly completely abolished.

      6) Figure 6F, which shows various fatty acyl-CoAs in MPHs, is not helpful on its own. It would be useful to compare this data to loss of function MPH data and to measure these acyl-CoAs in knockout liver. Additionally, since it is normal for liver acetyl-CoA concentration to change by several-fold in fasted and fed liver, this data from snap frozen liver tissue of ACOT12/8 KO mice would help prove the authors' point.

      We are grateful for your valuable advice. As you mentioned there are indeed several outstanding questions that require further clarification. To address these questions, we are currently in the process of developing an experimental mouse model in which ACOT12 and ACOT8 are conditionally knocked out. By virtue of this approach, we aim to acquire more substantial evidence to substantiate the aforementioned conclusions.

      7) Figure 7 suggests that loss of ACOT inhibits ketogenesis by decreasing HMGCS2 expression and increasing its acetylation. However, it is difficult to imagine that this is the main mechanism considering the extraordinary ability of liver to handle high rates of acetyl-CoA conversion to ketones during fasting which, as the authors know, is the canonical mechanism by which mitochondrial CoA is preserved during elevated FAO. The manuscript (Figure 6 and 7) argues that it is the conversion of acetyl-CoA to acetate which is more important. A critical limitation of this argument is that ACOT12 is in cytosol (Figure 5), so while it spares CoA for fatty acid activation, it does not spare CoA for beta oxidation in mitochondria. That latter function is carried out by the ketogenic pathway. A second limitation is that the mechanism relies on citrate transport and ACLY activity, which is not generally thought to be very active in the ketogenic states of fasting and T1DM studied here. In essence, the mechanism relies on circular logic, whereby mitochondrial acetyl-CoA accumulates in the setting of impaired FAO, which then impairs ketogenesis and depletes CoA which then impairs FAO without lowering acetyl-CoA. I don't have a solution, but I think it is important to acknowledge the flaws in this proposed mechanism.

      As the Reviewer suggested, ACLY indeed plays a crucial role in fatty acid synthesis. Acetyl-CoA is transported out of the mitochondria in the form of citrate, which is subsequently broken down into acetyl-CoA by ACLY. Under conditions of sufficient nutrition, acetyl-CoA carboxylase 1 further activates acetyl-CoA to participate in fatty acid synthesis.

      In the context of an energy crisis resulting from low glucose utilization, we propose that ACLY might serve another pivotal role in addressing this energy deficit. In conditions such as untreated diabetes or prolonged starvation, glucose utilization is significantly reduced, so the body relies on fatty acid oxidation in the liver to generate ketone bodies and acetate, which fuel extrahepatic peripheral tissues and thus help it cope with the energy crisis. However, excessive fatty acid oxidation disrupts the balance between oxidized and reduced CoA, necessitating the production of both acetate and ketone bodies to restore this equilibrium. Conventionally, fatty acid synthesis is inhibited during this period, as AMPK is activated in low-energy states and suppresses acetyl-CoA carboxylase 1 activity via phosphorylation. Based on our preliminary experimental results, ACLY and the citrate transporter still appear to be active. It is possible that the citrate-ACLY-ACOT12-acetate pathway is important for lowering the level of mitochondrial acetyl-CoA during an energy crisis. According to previous studies, cytosolic reduced CoA can be transported into the mitochondria, thereby replenishing the acetyl-CoA pool within the mitochondria (PMID: 32234503). It is important to note that this remains a hypothesis requiring further testing.

      8) Figure 8 presents some deceptively complex MS data following a 13C-acetate injection. The data is presented in an unorthodox manner, as 13C-metabolite intensities, making it nearly impossible to properly interpret. Enrichment of TCA cycle intermediates are not always easy to interpret, but at minimum, this data needs to be presented as MIDs or fractional enrichments. If the data is not modeled, then it might be useful to at least perform a rudimentary precursor-product analysis (i.e. normalized to plasma acetate enrichment).

      Supplemental Figure 8 also introduces evidence for neurological and strength deficits in shACOT12/8 knockdown mice. It is an interesting observation, but there is no direct link to the metabolic studies in the main figure, which does not present data in the loss of function mice. Nor is this part of the story investigated in liver specific knockout mice. Figure 8 is the least developed part of the manuscript and could be removed without losing the impact of the story.

      We deeply appreciate your valuable suggestions. As mentioned previously, we are currently engaged in the development of an experimental mouse model where ACOT12 and ACOT8 are selectively knocked out. Subsequent experiments will be conducted to validate this model, and the resulting data will be presented in the form of MIDs or fractional enrichments, as per your suggestion.
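
      The presentation requested by the reviewer (mass isotopomer distributions, fractional enrichments, and a simple precursor-product normalization to plasma acetate enrichment) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and intensity values are hypothetical, and no correction for natural isotope abundance is applied, which a real analysis would normally include.

```python
def mass_isotopomer_distribution(intensities):
    """Normalize raw isotopologue intensities (M+0, M+1, ..., M+n) to fractions."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return [value / total for value in intensities]


def fractional_enrichment(mid):
    """Average fraction of labeled carbons: sum(i * m_i) / n for an n-carbon metabolite."""
    n_carbons = len(mid) - 1
    if n_carbons < 1:
        raise ValueError("need at least M+0 and M+1")
    return sum(i * fraction for i, fraction in enumerate(mid)) / n_carbons


def precursor_normalized(product_enrichment, precursor_enrichment):
    """Product enrichment relative to precursor (e.g. plasma acetate) enrichment."""
    if precursor_enrichment <= 0:
        raise ValueError("precursor enrichment must be positive")
    return product_enrichment / precursor_enrichment


# Hypothetical intensities for citrate (6 carbons, M+0..M+6).
citrate_mid = mass_isotopomer_distribution([80, 10, 10, 0, 0, 0, 0])
citrate_fe = fractional_enrichment(citrate_mid)

# Hypothetical intensities for plasma acetate (2 carbons, M+0..M+2).
acetate_mid = mass_isotopomer_distribution([50, 0, 50])
acetate_fe = fractional_enrichment(acetate_mid)

relative_labeling = precursor_normalized(citrate_fe, acetate_fe)
```

      Unlike raw 13C-metabolite intensities, these quantities are independent of pool size and instrument response, which makes tissues and conditions easier to compare.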

      The evaluation of anxiety-related behavior is commonly done using the Elevated Plus Maze Test (EPMT), while working memory and cognitive functions are assessed through the Y-maze Test (YMZT) and Novel Object Recognition (NOR) Test. Measures such as forelimb strength and running time in the rotarod test, total distance in YMZT, total entries in YMZT, and total distance in the NOR test are indicators of muscle force and movement ability. Our data demonstrate that acetate plays a significant role in enhancing muscle force and facilitating coordinated neuromuscular movement. Interestingly, we found that ACOT12/8 knockdown in the early stages of diabetes mellitus does not have a pronounced impact on psychiatric, memory, and cognitive behaviors (Figure 8 and figure supplement 2). However, it is important to note that our study primarily focuses on elucidating the utilization of acetate during energy crises, such as untreated diabetes and chronic hunger. Our findings suggest that acetate is primarily utilized to enhance motor capacity rather than cognitive or neural activity.

      Reviewer #2 (Recommendations for the authors):

      The statement that acetate is an emerging ketone body is not correct. It is not a ketone, it is a carboxylic acid or a short-chain fatty acid. In my opinion, to avoid confusion this should be clarified.

      We agree with you that our description of this is not clear enough. Acetate is indeed a short-chain fatty acid in terms of molecular structure.

      The classic ketone bodies include acetone, acetoacetate and beta-hydroxybutyrate, among which acetone and acetoacetate contain a carbonyl group and can be considered ketones; beta-hydroxybutyrate, however, contains only hydroxyl and carboxyl groups and is therefore not actually a ketone but a short-chain fatty acid.

      Notably, our description of acetate as an emerging novel “ketone body” is not meant to classify it as a ketone in structure, but to emphasize the high similarity between acetate and the classic ketone bodies in the organ (liver) and substrate (fatty acid-derived acetyl-CoA) of their production, the roles they play (as important sources of fuel and energy for many extrahepatic peripheral organs), the features of their catabolism (conversion back to acetyl-CoA and degradation in the TCA cycle), and the physiological conditions of their production (energy stresses such as prolonged starvation and untreated diabetes mellitus). To prevent any potential misunderstanding, we place "ketone body" in double quotation marks in our revised manuscript.

      The reason for increased fatty acid delivery to the liver is explained by insulin resistance rather than by reduced carbohydrate availability.

      Patient characteristics should be provided.

      Thank you for your suggestions. We have revised our manuscript accordingly.

      Reviewer #3 (Recommendations for the authors):

      • Please include the rationale for having data from both C57BL/6 and BALB/c. In metabolic research, C57BL/6 is more commonly studied. The data between these two strains are similar, and one could be easily removed to limit redundancy.

      Thank you for bringing this issue to our attention. In metabolic research, C57BL/6 mice are indeed more commonly used as a model organism than BALB/c mice. In this study, we try to elucidate a characteristic that may be shared among different mammalian species, namely the ability to produce a substantial amount of acetate during energy crises. However, given the constraints of our experimental setup, we opted to employ C57BL/6 mice as the main animal model to investigate the underlying mechanism. BALB/c mice were used to confirm the underlying mechanisms governing acetic acid production.

      • In the experiments where ACOT8 and ACOT12 are selectively knocked out or knocked down, please include the levels of other ketone bodies, such as 3-HB and AcAc, from these experiments. While acetate production is diminished, there might or might not be a compensatory increase in the production of these metabolites. This would include experiments related to Figures 3, 4, and 5.

      Thank you for your valuable comments. As you mentioned, in diabetic mice where ACOT12 and ACOT8 are knocked down in the liver, there is a significant down-regulation of 3-HB and AcAc (Figure 7B, C). Based on this observation, we hypothesize that ACOT12 and ACOT8 might also play a regulatory role in the formation and metabolism of ketone bodies during an energy crisis. However, the precise regulatory mechanism underlying this phenomenon requires further investigation.

      • From Figure 1 (source data 1), two patients with diabetes have concurrent cancer. Cancer cells have altered metabolism compared to native cells. Thus, it is possible that circulating acetate levels may be altered in these cancer patients, regardless of the presence of diabetes. This should be acknowledged. Otherwise, these two subjects should be taken out.

      Thank you for your suggestions. We have removed these two subjects from our revised manuscript.

      • Can the authors expand on their thoughts on why some results from the behavioral tests are statistically significant while others are not? For example, many motor tasks such as forelimb strength, running time, total distance, and total entries significantly differ with ACOT8 and ACOT12 knockdown. However, more anxiety-based measures such as time in open arms, correct alternation, and object recognition are not statistically different.

      Thank you for your comments. The evaluation of anxiety-related behavior is commonly done using the Elevated Plus Maze Test (EPMT), while working memory and cognitive functions are assessed through the Y-maze Test (YMZT) and Novel Object Recognition (NOR) Test. Measures such as forelimb strength and running time in the rotarod test, total distance in YMZT, total entries in YMZT and total distance in the NOR test are indicators of muscle force and movement ability. Our data demonstrate that acetate plays a significant role in enhancing muscle force and facilitating coordinated neuromuscular movement. Interestingly, we found that ACOT12/8 knockdown in the early stages of diabetes mellitus does not have a pronounced impact on psychiatric, memory, and cognitive behaviors (Figure 8 and figure supplement 2). However, it is important to note that our study primarily focuses on elucidating the utilization of acetate during energy crises, such as untreated diabetes and chronic hunger. Our findings suggest that acetate is primarily utilized to enhance motor capacity rather than cognitive or neural activity.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We want to thank you for organizing the review process of our manuscript ‘Human skeletal muscle organoids model fetal myogenesis and sustain uncommitted PAX7+ myogenic progenitor’ for eLife, and the reviewers for providing their criticisms.

      We have changed some Figures within the manuscript and added two new Supplementary Figures, as outlined below.

      Reviewer #1 (Public Review):

      The authors aimed to establish a cell culture system to investigate muscle tissue development and homeostasis. They successfully developed a complex 3D cell model and conducted a comprehensive molecular and functional characterization. This approach represents a critical initial step towards using human cells, rather than animals, to study muscular disorders in vitro. Although the current protocol is time-consuming and the fetal cell model may not be mature enough to study adult-onset diseases, it nonetheless provides a valuable foundation for future disease modelling studies using isogenic iPSC lines or patient-derived cells with specific mutations. The manuscript does not explore whether or how this stem cell model can advance our understanding of muscular diseases, which would be an exciting avenue for future research. Overall, the detailed protocol presented in this paper will be useful for informing future studies and provides an important resource to the stem cells community. The inclusion of data on disease modelling using isogenic iPSC lines or patient-derived cells would further enhance the manuscript's impact.

      We agree that data on disease modelling using patient-derived cells would further enhance the manuscript's impact. The manuscript in its current form is intended to present our skeletal muscle organoid differentiation protocol to the community, with a focus on the developmental processes that are mimicked by this model. We are not aiming to model diseases such as LGMD or Duchenne within the context of this study. Our protocol is just the starting point for us and others to use this organoid system for skeletal muscle disease modelling in further studies. We already have a study modelling Duchenne muscular dystrophy with our organoid system under way.

      Reviewer #2 (Public Review):

      This paper illustrates that PSCs can model myogenesis in vitro by mimicking the in vivo development of the somite and dermomyotome. The advantages of this 3D system include (1) better structural distinctions, (2) the persistence of progenitors, and (3) the spatial distribution (e.g. migration, confinement) of progenitors. The finding is important and has implications for disease modeling. Indeed, the authors tried a DMD model, although it suffered from a lack of deeper characterization.

      The differentiation protocol is based on a current understanding of myogenesis and is compelling. They characterized the organoids in depth (e.g. many time points and immunofluorescence). The evidence is solid, and can be improved further by rigorous analyses and descriptions as described below.

      Major comments:

      1) Consistency between different cell lines.

      I see the authors used a few different PSC lines. Since organoid efficiency differs between lines, it is important to note the consistency between lines.

      2) Heterogeneity among each organoid

      Let's say authors get 10 organoids in one well. Are they similar to each other? Does each organoid possess similar composition of cells? To determine the heterogeneity, the authors could try either FACS or multiple sectioning of each organoid.

      Concerning the raised issue of consistency between different PSC lines, we stated under Material and Methods that skeletal muscle organoids were generated from six hiPSC lines: CB-CD34 iPSC, DMD iPSC, DMD_iPS1, BMD_iPS1, LGMD2A iPSC, LGMD2A-isogenic iPSC. We have evaluated the organoid approach with six hiPSC lines with independent genetic backgrounds, with more than 5 independent derivations per line and, for the control line (CB CD34+), more than 20 derivations. At the time of creating the first preprint in 2020, our reported protocol was based on about 45 independent differentiation inductions.

      The heterogeneity among organoids is a valid point, but it is very cumbersome to address with FACS or multiple sectioning.

      We have now addressed the heterogeneity of organoids within a line and the consistency of organoids between different lines by diffusion map analysis for early organoid stages and further single cell RNA seq analyses for mature stages and include this data as Figure 4 – figure supplement 6.

      3) Consistency of Ach current between organoids.

      Related to comment 2, are the currents consistent between organoids? How many organoids were recorded in the figures? Also, please comment on whether the currents differ between young and aged organoids.

      The acetylcholine (ACh)-induced changes in holding currents in Figure 3K are representative recordings with n=6. The further recordings in Figure 3 – figure supplement 3, for organoids derived from three additional lines, were also recorded with n=6. In all analyses, cells for electrophysiological characterization were taken from 8-week-old organoids.

      4) Communication between neural cells and muscle?

      The authors did scRNAseq, but have not analysed it in depth. I would recommend doing receptor-ligand mapping to address whether neural cells and muscle are interacting.

      We are now providing a characterization of the cell-cell communication network for all clusters at week 12 of human skeletal muscle organoid development as the new Figure 4 – figure supplement 5.

      5) More characterization of DMD organoids.

      One of the key applications of muscle organoids is disease modeling. They have generated DMD muscle organoids, but barely characterized them except for currents. I recommend conducting immunofluorescence of DMD organoids to confirm structural changes. It would be very intriguing to see scRNAseq of DMD organoids and align it with disease etiology.

      We agree that data on disease modelling using DMD patient-derived cells would further enhance the manuscript's impact. The manuscript in its current form is intended to present our skeletal muscle organoid differentiation protocol to the community, with a focus on the developmental processes that are mimicked by this model. We already have a study modelling Duchenne muscular dystrophy with our organoid system under way.

      6) More characterization of engraft.

      Authors could measure the size of myotube between mice and human.

      We have quantitatively evaluated the myotubes in the transplantation experiment illustrated in Figure 4I,J. The mean diameter is 41 ± 6 µm for the human fibers and 63 ± 7 µm for the mouse fibers (n=15 each). See Author response image 1.

      Author response image 1.

      Do PAX7+ satellite cells exist in the graft? To exclude the possibility that cell fusion events account for the observation, I recommend engrafting into GFP+ immunodeficient mice. Could the authors comment on how long the grafts survive?

      We would argue that satellite cells are present within our grafts, identified as DAPI-stained (blue) nuclei surrounded by green human lamin A/C staining, as in Author response image 2. We analysed all our mice for engraftment six weeks post transplantation, similar to other groups in the field.

      Author response image 2.

      Reviewer #1 (Recommendations For The Authors):

      The manuscript ends abruptly with the mouse transplantation experiment that appears a bit preliminary. It basically shows that cells survive but functional (or ultrastructural) integration is not shown. Suggest clarifying motivation and interpretation of the in vivo data.

      Back in 2020, our manuscript had already been through detailed review processes, in which we struggled because we did not provide any in vivo data concerning repopulation by our progenitor cells. Coming from the human pluripotent stem cell biology field, we have never completely understood the value of these hybrid experiments that test human cells in mice.

      For the current version, we have therefore made additional efforts to transplant our progenitor cells into injured skeletal muscle cells, similarly to other groups in the field (Alexander et al., 2016, Marg et al., 2019, Tanoury et al., 2020) (Figure 4I,J). Proof that 3D-derived progenitor cells have a clear repopulation advantage over progenitor cells derived in a 2D protocol would go beyond what can be done within the scope of our study. We still base our claims mainly on the extended bulk and single-cell RNA seq comparison to progenitor cells obtained by others. However, to address the demand of several experts to test our cells in vivo as well, we now also provide in vivo data in the current manuscript version.

      Within the Discussion we are suggesting further evaluations using these transplantations: It would be of interest for future studies to investigate whether increased engraftment can be achieved in 3D protocols (Faustino Martins et al., 2020; Shahriyari et al., 2022; ours) versus 2D patterned progenitor cells.

      Reviewer #2 (Recommendations For The Authors):

      Minor comments:

      7) Plot CD82 gene on UMAP of Figure 4

      We had provided a CD82 scRNAseq analysis within the t-SNE plots of Figure 3 – figure supplement 1, which demonstrates that CD82-positive cells almost exclusively overlap with Pax7-positive cells, being a subcluster of them. We agree that the reader will benefit from this further analysis, and in Author response image 3 we now provide additional CD82 and Pax7 UMAP plots on the myogenic progenitor / satellite cell clustering analysis of Figure 4F, included within the new Figure 4 – figure supplement 4E.

      Author response image 3.

      8) Immunofluorescence of CD82 in organoids

      We have tried CD82 immunofluorescence analysis on our organoids but are not very satisfied with the technical outcome. The available CD82 antibody seems to be primarily suited for FACS analysis and not for immunohistochemistry on slices.

      9) Change red-green color of the heatmap. Color-blind person cannot see it well

      We have changed all heatmaps to yellow-purple in the main Figure 2G and the Supplemental Figures S2.1 and S3.1.

    1. Author Response

      We are pleased that the data presented in our submission was found to be informative and suitable for publication in eLife. The Reviewers made several comments that we address below. They listed three weaknesses of our work: 1) details of RPE GLUT1 immunohistochemistry (IHC), 2) the mechanism of Arrdc4, and 3) the mechanism of HSP90AB1. Additional suggestions made by the Reviewers, aimed at elucidating mechanisms, are of great interest to us, but would require experiments that are beyond the scope of the current work.

      We provide the following provisional responses to the identified weaknesses:

      1) Reviewer 1 asked several questions regarding the IHC of GLUT1, including the number of retinas examined, the location and quantification of the staining, and our results relative to those of another publication.

      We injected more than one eye with each of the AAV-Best1-Txnip alleles.

      However, only one of the fully infected eyes of each allele was processed for GLUT1 IHC. We found the GLUT1 removal from the basolateral surface of the RPE by AAV-Best1-Txnip (i.e. the wild type full length allele) was complete, obvious, and consistent from eye to eye, as shown in our original publication (Xue et al., 2021, PMID: 33847261). It was obvious as the GLUT1 on the basolateral surface of the RPE is more easily scored than that on the apical surface. The photoreceptor inner segments and Müller glia microvilli also have GLUT1, and their processes are juxtaposed and/or intertwined with the apical processes of the RPE, making the apical process GLUT1 staining of the RPE much more difficult to score. In some sections where the RPE and the retina separate, we can score the apical process GLUT1 staining of the RPE, but we do not always have this situation in our sections. We should have been more explicit about the location of the IHC signal that we were referring to in the manuscript and will do so in the Revision.

      We present images in Figure 2 supplement 1 that are representative for each allele, in the one retina scored for each allele. As Dr. Xue was in the process of moving to China and setting up his own lab at the time of submission, additional retinas were not processed for IHC. However, his laboratory will examine the staining on additional retinas. Given that the results of the wild type allele were very reproducible, we do not anticipate different results from those we have presented for the new alleles. However, the quantification is difficult for the total GLUT1 protein within the RPE due to the ambiguities of staining in the photoreceptors and the Müller glia.

      As a separate issue, Reviewer #1 mentioned the work of another group (Wang et al., 2019, PMID: 31365873), which claimed that, on the apical surface of the RPE, GLUT1 is down-regulated in a RP mouse strain, RhoP23H. We have not consistently observed such a down-regulation of GLUT1 in other RP mouse strains such as rd1, rd10 or Rho-/- (unpublished data; see review Xue and Cepko, 2023, PMID: 37460158). However, we would like to point out that it is difficult to score GLUT1 staining on the RPE apical surface, as noted above. It is even more difficult in the degenerating retina where RPE and photoreceptor processes degenerate. For reference, one can see images of degenerating RPE apical processes in Wu et al. 2021 (PMID: 33491671).

      2) Little was known about the function of Arrdc4 until very recently. During our submission of this manuscript, a study was published concerning an Arrdc4 global knockout mouse by Richard Lee’s group. They proposed that Arrdc4 is critical for liver glucagon signaling, which could be negated by insulin (Dagdeviren et al, 2023, PMID: 37451484). The implication of this study to RP cone survival is unclear, but interestingly, the activation of insulin/mTORC1 pathway is helpful for RP cone survival, as first discovered by Claudio Punzo when a postdoc in our group (PMID: 19060896, PMID: 25798619).

      3) Little is known about the function of HSP90AB1. Recently, Ramamurthy’s group reported that knocking out HSP90AA1, a paralog of HSP90AB1 that differs in 14% of its amino acids, led to rod death and correlated with PDE6 dysregulation (Munezero et al, 2023, PMID: 37172722). However, the exact role of HSP90AA1 in rods needs to be clarified, and the implications for HSP90AB1 in WT and/or RP cones are still unclear.

      The above responses will be incorporated into the next version of our submission.

    1. Author Response

      Reviewer #1 (Public Review):

      The study was conducted in laboratory conditions with a local population of Cx. quinquefasciatus from Argentina. I'm not sure if there is any evidence for a seasonal shift in the host use pattern in Cx. quinquefasciatus populations from the southern latitudes.

      Unfortunately, studies conducted in South America to understand host use by Culex mosquitoes are very limited, and there are virtually no studies on the seasonal pattern of host use. In Argentina, there is some evidence (Stein et al., 2013; Beranek, 2018) regarding the seasonal change in host use by Culex species, including Culex quinquefasciatus, where the inclusion of mammals during the autumn has been observed. As part of a comprehensive study on characterizing bridge vectors for SLE and WN viruses, our research group is currently working on the molecular identification of blood meals from engorged females to gain deeper insights into the seasonal host use by Culex mosquitoes.

      While the seasonal change in host use by Culex quinquefasciatus has not been reported in Argentina so far, there has been an observed increase in reported cases of SLE virus in humans between summer and autumn (Spinsanti et al., 2008). It is based on this evidence that we hypothesize there is a seasonal change in host use by Culex quinquefasciatus, similar to what occurs in the United States. This is also considering that both countries (Argentina and the United States) have regions with similar climatic conditions (temperate climates with thermal and hydrological seasonality).

      I think the authors need to discuss more about the bigger question they were addressing. I think that the discussion section can be strengthened greatly by elaborating on whether there is evidence for a seasonal shift in host use pattern in Cx. quinquefasciatus in the southern latitudes. If yes, what alternate mechanisms they believe could be driving the seasonal change in host use in this species in the southern latitudes now that they show the 'deriving reproductive advantages' hypothesis to be not true for those populations.

      We will restructure our discussion to align it with our results, as suggested.

      Grammar and writing

      The manuscript will be grammatically revised.

      Reviewer #2 (Public Review):

      There is no replication built into this study. Egg lay is a highly variable trait, even within treatments, so it is important to see replication of the effects of treatment across multiple discrete replicates. It is standard practice to replicate mosquito fitness experiments for this reason. Furthermore, the sample size was particularly small for some groups (e.g. 15 egg rafts for the second gonotrophic cycle of mice in the autumn, which was the only group for which a decrease in fecundity and fertility was detected between 1st and 2nd gonotrophic cycles). Replicates also allow investigators to change around other variables that might impact the results for unknown reasons; for example, the incubators used for fall/summer conditions can be swapped, ensuring that the observed effects are not artifacts of other differences between treatments. While most groups had robust sample sizes, I do not trust the replicability of the results without experimental replication within the study.

      We agree that egg laying is a variable trait, which is why we used high numbers of mosquitoes and egg rafts in our experiments compared with other studies on the same topic. Evaluating variables such as fecundity, fertility, or other variables of this kind (collectively referred to as "life tables") is challenging and depends on several intrinsic and extrinsic factors. For these reasons, sample sizes in some experiments may not be very large, and lower sample sizes can be found in several articles. For instance, in Richards et al. (2012), for Culex quinquefasciatus, some experiments had 13 or even 6 egg rafts during the second gonotrophic cycle. For species like Aedes aegypti, the sample size for life table analysis is also usually small. As an example, Muttis et al. (2018) reported between 1 and 4 engorged females (without replicates). We therefore consider our sample sizes quite robust for our results.

      We also agree that repeating the experiments would make the study more robust. However, a review of the literature (articles cited in the original manuscript) shows that similar experiments are not frequently repeated as such. Examples are the studies of Richards et al. (2012), Demirci et al. (2014) and Telang & Skinner (2019): although they handle several cages at a time as “replicates”, these are not true replicates, because all data are summarised and analysed together and the experiment is not repeated several times. We see these “replicates” as a way of obtaining a greater N.

      As the reviewer stated, repetition is a resource- and time-consuming activity, and one that we are not able to undertake. Replicating the experiment poses a significant time challenge. The original experiment took over three months to complete, and we anticipate that a similar timeframe would be necessary for each replication (6 months in total for two more replicates). Given our existing commitments and obligations, dedicating such an extensive period solely to this would impede progress on other crucial projects and responsibilities. Given these limitations of resources and time, and the infrequent use of experimental repetition in this type of study, we suggest performing a simulation-based analysis. This approach involves generating synthetic data that mimic the expected characteristics of the original experiment and then subjecting them to the same analysis routine. The main goal of this simulation will be to evaluate how far the results might be spurious or due to chance under the experimental conditions. We will introduce this simulation-based analysis in the next revised version of the manuscript.
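
      A simulation of this kind could be sketched as follows. This is only an illustrative outline under stated assumptions, not the analysis the authors will run: the helper names, the truncated normal model for eggs per raft, the permutation test on the difference in means, and all numeric values in the usage note (other than the group size of 15 egg rafts mentioned by the reviewer) are hypothetical placeholders. The sketch draws two groups from the same distribution, so that no true difference exists, and counts how often the analysis nonetheless reports a significant difference.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def simulate_rafts(n, mean_eggs, sd_eggs, rng):
    """Draw n hypothetical egg-raft sizes (eggs per raft), truncated at zero."""
    return [max(0.0, rng.gauss(mean_eggs, sd_eggs)) for _ in range(n)]


def permutation_p_value(a, b, rng, n_perm=500):
    """Two-sided permutation test on the absolute difference in group means."""
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (hits + 1) / (n_perm + 1)


def false_positive_rate(n_a, n_b, mean_eggs, sd_eggs,
                        n_sim=200, alpha=0.05, seed=1):
    """Fraction of null simulations (no true difference) flagged as significant."""
    rng = random.Random(seed)
    flagged = 0
    for _ in range(n_sim):
        a = simulate_rafts(n_a, mean_eggs, sd_eggs, rng)
        b = simulate_rafts(n_b, mean_eggs, sd_eggs, rng)
        if permutation_p_value(a, b, rng) < alpha:
            flagged += 1
    return flagged / n_sim
```

      For example, `false_positive_rate(15, 40, mean_eggs=150.0, sd_eggs=40.0)` would estimate how often a difference is flagged by chance alone when one group has only 15 egg rafts; the second group size, mean and standard deviation here are invented for illustration and would need to be replaced by values estimated from the real data.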

      Considering the hypothesis is driven by the host switching observed in the field, this phenomenon is discussed very little. I do not believe Cx. quinquefasciatus host switching has been observed in Argentina, only in the northern hemisphere, so it is possible that the species could have an entirely different ecology in Argentina. It would have been helpful to conduct a blood meal analysis prior to this experiment to determine whether using an Argentinian population was appropriate to assess this question. If the Argentinian populations don't experience host switching, then an Argentinian colony would not be the appropriate colony to use to assess this question. Given that this experiment has already been conducted with this population, this possibility should at least be acknowledged in the discussion. Or if a study showing host switching in Argentina has been conducted, it would be helpful to highlight this in the introduction and discussion.

      We are aware that few studies on host shifting in South America are available. Some, such as those conducted by Stein et al. (2013) and Beranek (2018), reported a moderate host switch for Culex quinquefasciatus in Argentina. We have already performed a study of seasonal host feeding patterns for this species. As you suggested, we could mention it in the discussion to highlight our partial findings. However, even though there are few studies on host shifting, our hypothesis is based mainly on the seasonality of human cases of WNV and SLEV, a pattern that has been demonstrated for our region; see, for example, the study of Spinsanti et al. (2008).

      The impacts of certain experimental design decisions are not acknowledged in the manuscript and warrant discussion. For example, the larvae were reared under the same conditions to ensure adults of similar sizes and development timing, but this also prevents mechanisms of action that could occur as a result of seasonality experienced by mothers, eggs, and larvae.

      We understand the confusion that may have arisen due to a lack of further details in the methodology. If we are not mistaken, you are referring to our oversight regarding the consideration of carry-over effects of larvae rearing that could potentially impact reproductive traits. When investigating the effects of temperature or other environmental factors on reproductive traits, it is possible to acclimate either larvae or adults. This is due to the significant phenotypic plasticity that mosquitoes exhibit throughout their entire ontogenetic cycle. In our study, we followed an approach similar to that of other authors where the adults are exposed to experimental conditions (temperature and photoperiod). For a similar approach you can refer to the studies conducted by Ferguson et al. (2018) for Cx. pipiens, Garcia Garcia & Londoño Benavides (2007) for Cx. quinquefasciatus and Christiansen-Jucht et al. (2014, 2015) for Anopheles gambiae.

      Beyond the issue of lack of replication limiting trust in the conclusions in general, there is one conclusion reached at the end of the discussion that would not be supported, even if additional replicates are conducted. The results do not show that physiological changes in mosquitoes trigger the selection of new hosts. Host selection is never measured, so this claim cannot be made. The results don't even suggest that fitness might trigger selection because the results show that physiological changes are in the opposite direction as what would be hypothesized to produce observed host switches. Similarly, the last sentence of the abstract is not supported by the results.

      We agree with this observation. We did not evaluate the impact of fitness on host selection in this study. Instead, we aimed to investigate the potential influence of seasonality on mosquito fitness as a potential trigger for a shift in host selection. We agree that we incorrectly used the term “host selection” when we should have been discussing “host use change”. Our results indicate a seasonal alteration in mosquito fitness in response to temperature and photoperiod changes. Building upon this observation, we will discuss our hypotheses and theoretical model to explain this seasonal shift in host use.

      Grammar and writing

      The manuscript will be grammatically revised by a professional translator.

    1. Author Response

      Thank you for your thorough critique and thoughtful suggestions for improving our manuscript, "Homeostatic Synaptic Plasticity of Miniature Excitatory Postsynaptic Currents in Mouse Cortical Cultures Requires Neuronal Rab3A.” The reviewers’ detailed comments suggest that showing multiple types of graphs to demonstrate the presence of divergent scaling of mEPSC amplitudes in cultures from Rab3A wild type, and its disruption in cultures from Rab3A knockout mice, had the unintended consequence of obscuring the major results of our study. Furthermore, our proposal that the difference in characteristics of scaling of GluA2 receptor expression compared to that of mEPSC amplitudes, based on the ratio plots, indicated that a mechanism other than postsynaptic receptors likely contributes to the homeostatic increase in mEPSC amplitude was not convincing to the reviewers. Reviewers 2 and 3 point out these results might be explained by differences in the limitations and artifacts of the two very distinct techniques, electrophysiology and fluorescence imaging. In the revision we will acknowledge that a greater variability in the signal, or, more issues with signal over noise, might be present in imaging experiments compared to electrophysiology. This could explain the lack of identical effects on GluA2 receptors compared to mEPSC amplitudes in the matched experiments, but we maintain it is also possible that a greater variability in GluA2 responses is biologically meaningful. Further, an issue with the accuracy of imaging experiments to report the true receptor effects would also call into question the conclusion that receptors always increase after activity blockade. Finally, the graphs illustrating the detailed characteristics of scaling with rank order and ratio plots required pooling multiple samples per cell, which precludes application of standard statistical methods to determine whether effects or differences reach statistical significance. 
      Therefore, we will remove the cumulative distribution functions, rank order plots, and ratio plots, and show only analyses that involve a single sample per cell. This major change will simplify and clarify the main findings, that homeostatic plasticity of both mEPSC amplitude and GluA2 receptor expression in mouse cortical cultures involves the synaptic vesicle protein Rab3A operating in neurons rather than astrocytes. We will focus our comparison between mEPSC amplitudes and receptors in the same cultures on differences between the magnitude of effects on the mean or median, and will make clear that overall, our data can be explained by two possibilities: 1) the presynaptic vesicle protein is acting via regulation of postsynaptic receptors alone, or 2) it is regulating both postsynaptic receptors and another contributor to mEPSC amplitude, possibly the amount of transmitter released by a single vesicle. Either way, it is very surprising that this presynaptic protein is involved in postsynaptic changes, so our results represent a novel contribution to the field of homeostatic plasticity. In sum, the changes we propose should go a long way towards addressing the majority of the reviewers’ major critiques.
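
      The idea of a single sample per cell can be illustrated with a short sketch. It is a hypothetical outline, not the authors' analysis pipeline: the function names and the data layout (a mapping from a cell identifier to that cell's mEPSC amplitudes in pA) are assumptions made for illustration. Each cell's events are collapsed to one mean, so the sample size for any later statistical test is the number of cells rather than the number of pooled events.

```python
import statistics


def per_cell_means(amplitudes_by_cell):
    """Collapse each cell's mEPSC amplitudes (pA) to a single mean per cell."""
    return {cell: statistics.mean(values)
            for cell, values in amplitudes_by_cell.items()}


def fold_change(control, treated):
    """Ratio of treated to control group means, each computed from per-cell means."""
    control_means = list(per_cell_means(control).values())
    treated_means = list(per_cell_means(treated).values())
    return statistics.mean(treated_means) / statistics.mean(control_means)
```

      With this layout, a cell that contributed many events carries the same weight as a cell that contributed few, which is the property that permits standard tests across cells.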

      A related issue raised by the reviewers was that the model describing potential presynaptic mechanisms of Rab3A in homeostatic plasticity was not supported by direct evidence (Figure 10). We meant the model to introduce the possibility of a presynaptic contribution to mEPSC amplitude and to stimulate future research, but clearly did not communicate its speculative nature, either in the Figure legend or in our discussion of potential mechanisms. In the revision, we will restrict the model to the direct findings in this study. Additionally, we will state, where appropriate, that while previous findings at the mouse NMJ are consistent with a presynaptic role for Rab3A (Wang et al., 2011), in the current study there is no direct evidence for this idea in cortical cultures other than the quantitative differences in the fold increases in mEPSC amplitudes and GluA2 receptors, which were assayed in the same cultures.

      We will submit a revised version addressing each of the reviewers’ concerns and suggestions as described above and below; these major modifications will greatly improve the readability of the manuscript and clarify the main results.

      Reviewer #1

      Koesters and colleagues investigated the role of the presynaptic small GTPase Rab3A in homeostatic scaling of miniature synaptic transmission in primary mouse cortical cultures using electrophysiology and immunohistochemistry. The major finding is that TTX incubation for 48 hours does not induce an increase in the amplitude of excitatory synaptic miniature events in neuronal cultures derived from Rab3A KO and Rab3A Earlybird mutant mice. NASPM application had comparable effects on mEPSC amplitude in control and after TTX, implying that Ca2+-permeable glutamate receptors are unlikely modulated during synaptic scaling. Immunohistochemical analysis revealed an increase in GluA2 puncta size and intensity in wild type, but not Rab3A KO cultures. Finally, they provide evidence that loss of Rab3A in neurons, but not astrocytes, blocks homeostatic scaling. Based on these data, the authors propose a model in which presynaptic Rab3A is required for homeostatic scaling of synaptic transmission through GluA2-dependent and independent mechanisms.

      While the title of the manuscript is mostly supported by data of solid quality, many conclusions, as well as the final model, cannot be derived from the results presented. Importantly, the results do not indicate that Rab3A modulates quantal size on both sides of the synapse. Moreover, several analysis approaches seem inappropriate.

      The following points should be addressed:

      1) The model shown in Figure 10 is not supported by the data. The authors neither provide evidence for two different functional states of Rab3A being involved in mEPSC amplitude modulation, nor for a change in glutamate content of vesicles. Furthermore, the data do not fully support the conclusion of a presynaptic role for Rab3A in homeostatic scaling.

      We will revise the model, removing presynaptic mechanisms for Rab3A and restricting it to the direct findings in this study.

      2) The analysis of mEPSC data using quantile sampling followed by ratio calculation is not meaningful under the tested experimental conditions because of the following reasons:

      (i) The analysis implicitly assumes that all events have been detected. The prominent mEPSC frequency increase after TTX suggests that this is not the case, i.e., many (small) mEPSCs are likely missed under control conditions.

      We explicitly addressed the potential contribution of missed mEPSCs that are below threshold in (Hanes et al., 2020). We found that even simulating a threshold of 7 pA, applied to data artificially modified by uniformly multiplying the control data set, did not generate a ratio plot with the increasing ratio over 75% of the data that we observe in the experimental data. Overall, the findings from simulating a threshold and a uniform multiplicative factor illustrate that the threshold issue does not cause major changes to the data. Furthermore, in cultures from Rab3A+/+ mice from the Rab3AEbd/+ colony, the mEPSC amplitudes were significantly smaller than those recorded in cultures from Rab3A+/+ mice from the Rab3A+/- colony (lines 327-329, 11 pA vs 13 pA), indicating that if there were smaller mEPSCs occurring in the Rab3A+/+ data set, we would have detected them. Although for these reasons we feel it is unlikely our ratio plot analysis is invalid, to clarify the result that homeostatic plasticity of mEPSC amplitude requires functioning Rab3A, we will remove the ratio plots.
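
      The kind of check described above (a uniform multiplicative factor combined with a detection threshold) can be sketched as follows. This is a minimal toy illustration, not the simulation from (Hanes et al., 2020): the lognormal amplitude distribution, its parameters, and the factor of 1.25 are assumptions chosen only for illustration; only the 7 pA threshold is taken from the text. It shows that without a threshold the TTX/control quantile ratio equals the multiplicative factor everywhere, and that in this toy case a threshold can only pull the ratio below that factor.

```python
import math
from statistics import NormalDist

def lognormal_grid(n, median, sigma):
    """Deterministic stand-in for a sample: lognormal values at evenly spaced probabilities."""
    z = NormalDist()
    return [median * math.exp(sigma * z.inv_cdf((i + 0.5) / n)) for i in range(n)]

def quantile(sorted_vals, q):
    """Value at quantile q (0..1) of an already sorted list, by index truncation."""
    return sorted_vals[int(q * (len(sorted_vals) - 1))]

def ratio_curve(control, treated, qs):
    """Treated/control ratio at matched quantiles (the idea behind a ratio plot)."""
    c, t = sorted(control), sorted(treated)
    return [quantile(t, q) / quantile(c, q) for q in qs]

FACTOR = 1.25      # assumed uniform multiplicative scaling factor (hypothetical)
THRESHOLD = 7.0    # detection threshold in pA, as in the simulation described above

# Hypothetical "true" control amplitudes (pA) and uniformly scaled amplitudes.
true_amps = lognormal_grid(2000, median=12.0, sigma=0.5)
scaled_amps = [FACTOR * a for a in true_amps]
qs = [i / 20 for i in range(21)]

# No threshold: every event is detected in both conditions.
ideal = ratio_curve(true_amps, scaled_amps, qs)

# With threshold: events below THRESHOLD are missed in each condition.
detected_con = [a for a in true_amps if a >= THRESHOLD]
detected_ttx = [a for a in scaled_amps if a >= THRESHOLD]
thresholded = ratio_curve(detected_con, detected_ttx, qs)

for q, r_ideal, r_thr in zip(qs, ideal, thresholded):
    print(f"q={q:.2f}  ratio without threshold={r_ideal:.3f}  with threshold={r_thr:.3f}")
```

      Whether the thresholded curve resembles any experimental ratio plot depends on the assumed distribution and factor, so this sketch should not be read as supporting or refuting the conclusions drawn from the published simulation.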

      (ii) The analysis is used to conclude how events of a certain size are altered by TTX treatment. However, this analysis compares the smallest mEPSCs of the TTX condition with the smallest control mEPSCs, but this is not a pre-post experimental design. Variation between cells and between coverslips will markedly affect the results and lead to misleading interpretations.

      The rank order plot is a well-established plot to examine the mathematical transformation caused by homeostatic plasticity, first used in (Turrigiano et al., 1998). We included it here to facilitate comparison of our findings with previous results. We introduced the ratio plot in (Hanes et al., 2020), finding it shows more clearly differences occurring in the range of small mEPSC values. The reviewer is correct in that we are assuming the smallest mEPSCs before treatment should be matched with the smallest mEPSCs after treatment. It is almost impossible to do a pre-post experimental design for mEPSCs. Even when applying a treatment, for example acute perfusion with a receptor antagonist, to a single cell and recording mEPSCs before and after the treatment, it is not a true pre-post design at the level of mEPSC amplitudes, which come from many different inputs. The power of the method is that different characteristic mathematical transformations for different experimental conditions (e.g., genotype or activity protocol) support the idea that those conditions either involve different mechanisms or have altered the mechanism. Such differences might be missed by only comparing means or medians. However, we found no evidence that loss of Rab3A or expression of the Rab3A Earlybird mutant altered the mathematical transformation due to homeostatic plasticity, other than to reduce its magnitude across all amplitudes. Therefore, including these complex analyses is not adding anything to the finding that Rab3A plays a role in homeostatic plasticity of mEPSC amplitudes and they will be removed in the revision.

      (iii) The ratio (TTX/control) vs. control plots seem to suffer from a division by small value artifact (see Figure 6F).

      The reviewer is referring to findings on the ratio plot for receptor cluster area. Because the large ratios for the smallest control areas occur in the cultures prepared from wild type mice, and to a much lower extent in cultures prepared from Rab3A knockout mice, we think there is a biologically relevant increase in the TTX/CON ratio, since an artifact due to division by small values should be present in both data sets. However, we cannot rule out that the differences in ratio plot behavior between receptors and mEPSC amplitudes result from the different limitations in detection of receptor clusters vs. the limits of detection of mEPSCs, so we will remove the ratio plots and focus on comparison of means or medians.
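
      For readers unfamiliar with the concern, one way a division-by-small-value artifact can arise is sketched below. This is a hypothetical illustration only, not a description of the data in Figure 6F: the control values, the scale factor, and the additive offset are all invented. If treated values differ from control by a small additive term on top of a multiplicative change, the ratio grows as the control value approaches zero.

```python
def ratio_with_offset(control_values, scale, offset):
    """Ratio treated/control when treated = scale * control + offset (hypothetical model)."""
    return [(scale * c + offset) / c for c in control_values]

# Invented control values (arbitrary units), from small to large.
control_areas = [0.05, 0.1, 0.2, 0.5, 1.0, 2.0]

# Assumed 10% multiplicative increase plus a small additive offset.
ratios = ratio_with_offset(control_areas, scale=1.1, offset=0.02)

for c, r in zip(control_areas, ratios):
    print(f"control={c:.2f}  ratio={r:.3f}")
```

      In this toy model the ratio equals `scale + offset / c`, so it is largest for the smallest control values even though the underlying multiplicative change is the same throughout.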

      Correspondingly, ratio-analysis differs considerably for different control conditions (Fig. 1Giii, Fig. 2Giii, Fig. 6C, Fig. 9A).

      The reviewer is correct to point out that the ratio plot shows differences across control conditions (note, these differences are not obvious with the more standard rank order plot). The magnitude of the 50th percentile ratio differs across control conditions, and behaviors of the largest mEPSCs also differ, with some ratios going down at the highest control amplitudes (1Giii, 6C), and others continuing to increase with increasing control amplitude (2Giii, 9A). They all share the divergent increasing ratio from smallest mEPSC amplitude to around the 20 pA level. We attribute the differences in magnitude to the differences in experimental conditions: 1Giii is Rab3A+/+ from the +/+ colony; 2Giii is Rab3A+/+ from the Ebd/+ colony; 6C is a set of Rab3A+/+ cultures assayed several years after the set in 1Giii; 9A is a different culture condition altogether, with neurons being plated onto an already formed bed of astrocytes. Effects on the largest mEPSCs are likely attributable to the small number and high variability of amplitudes in this range. Since the variability in the very sensitive ratio plot has taken away from the main findings of homeostatic plasticity being disrupted in the absence of functioning Rab3A in neurons, we will remove the rank-order and the ratio plots from the manuscript.

      3) As noted by the authors in a previous publication (Hanes et al. 2020), statistical analysis of CDFs suffers from n-inflation. In addition, the quantile sampling method chosen violates an important assumption of the K-S test. Indeed, p-values for these comparisons are typically several orders of magnitude smaller. Given that the statistical N most likely corresponds to the number of cultures (see, e.g., https://doi.org/10.1371/journal.pbio.2005282), CDF comparisons are not informative and should thus not be used to draw conclusions from the data. The plots can be informative, though.

      As the reviewer acknowledges, we were very careful in (Hanes et al., 2020) to state that the p values could not be used to determine significance in the KS test of cumulative distributions for pooled data because the KS test assumes a single sample per cell. We also suggested in that study that the p values could be used in a comparative way for looking at data sets with similar (inflated) n values to say something about bigger or smaller differences. We failed to reiterate those caveats here. In reviewing the article “What is N” by (Lazic et al., 2018) (which we very much appreciate being shown by the reviewer), we agree that in the current study where we are attempting to show how the effect of homeostatic plasticity is or is not altered by loss of Rab3A function, it is imperative that we be able to make conclusions about statistical significance. The pooling approach is essential for having some sense of the mEPSC amplitude distributions, but that is not necessary for looking at the effect of Rab3A. Therefore, we will remove all analyses that involve pooling of multiple mEPSC amplitudes per cell.
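
      The effect of pooling on the apparent sample size can be sketched as follows. This is a toy illustration under stated assumptions: the numbers of cells and events per cell, the Gaussian amplitude model, and all parameter values are hypothetical and are not taken from the manuscript. The critical distance uses the standard large-sample approximation for the two-sample KS test at alpha = 0.05, D_crit ≈ 1.358 · sqrt((n + m) / (n · m)).

```python
import math
import random
from statistics import mean

def ks_critical_d(n, m, c_alpha=1.358):
    """Large-sample critical KS distance for two samples of sizes n and m (alpha = 0.05)."""
    return c_alpha * math.sqrt((n + m) / (n * m))

def simulate_cells(n_cells, events_per_cell, base=13.0, cell_sd=2.0, event_sd=4.0):
    """Hypothetical data: each cell has its own mean amplitude; events scatter around it."""
    cells = []
    for _ in range(n_cells):
        cell_mean = random.gauss(base, cell_sd)
        cells.append([random.gauss(cell_mean, event_sd) for _ in range(events_per_cell)])
    return cells

random.seed(1)
N_CELLS, EVENTS_PER_CELL = 20, 30          # assumed values, for illustration only
control = simulate_cells(N_CELLS, EVENTS_PER_CELL)

# One sample per cell: the statistical n is the number of cells.
per_cell_means = [mean(cell) for cell in control]

# Pooled events: n is inflated by the number of events taken from each cell.
pooled_events = [amp for cell in control for amp in cell]

d_cells = ks_critical_d(len(per_cell_means), len(per_cell_means))
d_pooled = ks_critical_d(len(pooled_events), len(pooled_events))

print(f"n per group (cells):  {len(per_cell_means)}, critical D = {d_cells:.3f}")
print(f"n per group (pooled): {len(pooled_events)}, critical D = {d_pooled:.3f}")
```

      Because events from the same cell are not independent, the smaller critical distance obtained with the pooled n overstates the evidence, which is why the single-sample-per-cell analyses are the ones suited to significance testing.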

      4) How does recording noise and the mEPSC amplitude threshold affect "divergent scaling"?

      We addressed this in our 2020 paper (Hanes et al., 2020) where we showed that the experimental homeostatic increase in mEPSC amplitude cannot be simulated with uniform, multiplicative synaptic scaling whether we included or excluded distortion caused by a detection threshold.

      5) What is the justification for the line fits of the ratio data/how was the fit range chosen?

      We are assuming the reviewer is referring to the line fits for the rank-order data. If so, the fit range is the entire range of the data. This issue will be addressed by the removal of the rank-order plots from the manuscript.

      6) TTX application induces a significant increase in mEPSC amplitude in Rab3A-/- mice in two out of three data sets (Figs. 1 and 9). Hence, the major conclusion that Rab3A is required for homeostatic scaling is only partially supported by the data.

      Based on the p-values for comparison of means with a Kruskal-Wallis test, we would argue that TTX application does not show a significant increase in mEPSC amplitude in Rab3A-/- neurons (Figure 1 p-value = .318; Figure 9 p-value = .125) when comparing to untreated control mEPSC amplitude means. It is only when we use the KS test and the inflated n’s that we get a barely significant result, p = 0.042. Based on the Lazic article (Lazic et al., 2018), we would now conclude that we cannot use the KS p value in that analysis. We have tried to be clear that the effect of TTX application on mEPSC amplitude in Rab3A-/- neurons is not completely abolished, but rather is dramatically reduced, which we acknowledge in the manuscript (line 279). This issue will be addressed by removal of CDFs from the manuscript.

      7) Line 289: A comparison of p-values between conditions does not allow any meaningful conclusions.

      Although we feel that comparison of magnitude of effects can be stated in a qualitative way for similar sized pooled data sets with larger or smaller p-values, we agree that statistical significance has no meaning. This issue will be addressed by removing the CDF plots from the manuscript.

      8) There is a significant increase in baseline mEPSC amplitude in Rab3AEbd/Ebd (15 pA) vs. Rab3Aebd/+ (11 pA) cultures, but not in Rab3A-/- (13.6 pA) vs. Rab3A+/- (13.9 pA). Although the nature of scaling was different between Rab3AEbd/Ebd vs. Rab3AEbd/+, and Rab3AEbd/Ebd with vs. without TTX, the question arises whether the increase in mEPSC amplitude in Rab3AEbd/Ebd is Rab3A dependent. Could a Rab3A independent mechanism occlude scaling?

      We have acknowledged in the manuscript that one explanation for a failure to exhibit homeostatic plasticity in the cultures from Rab3A Earlybird mutant mice is that the already large basal amplitude occludes any further increase (line 366). In the revision we will make sure the occlusion possibility is highlighted, but we will also discuss other proteins that have been implicated in homeostatic plasticity that have caused an increase in mEPSC amplitude and/or AMPA receptors at baseline, for example, Arc/Arg3.1 KO (Shepherd et al., 2006; Beique et al., 2011); Homer KO (Hu et al., 2010) and inhibition of mir-186-5p (Silva et al., 2019).

      9) Figure 4: NASPM appears to have a stronger effect on mEPSC frequency in the TTX condition vs. control (-40% vs. 15%). A larger sample size might be necessary to draw definitive conclusions on the contribution of Ca2+-permeable AMPARs.

      We will acknowledge that Ca2+-permeable AMPARs could be contributing to the frequency increase following activity blockade and will also include analyses of frequency throughout the manuscript.

      10) The authors discuss previous papers showing changes in VGLUT1 intensity. Was VGLUT intensity altered in the stainings presented in the manuscript?

      We will perform analyses of VGLUT1 intensity and include them in the manuscript.

      11) The change in GluA2 area or fluorescence intensity upon TTX treatment in controls is modest. How does the GluA2 integral change?

      The changes in GluA2 integrals look exactly like the changes in cluster size and were not included to simplify the results. But with the removal of the CDFs, rank order, and ratio plots, we can easily include integral measurements. What we did not observe was an additive effect with intensity and size such that the effects on integral were of greater magnitude or statistical significance than either alone. We will include the integral plots in the revised manuscript.

      12) The quantitative comparison between physiology and microscopy data is problematic. The authors report a mismatch in ratio values between the smallest mEPSC amplitudes and smallest GluA2 receptor cluster sizes (l. 464; Figure 8). Is this comparison affected by the fluorescence intensity threshold?

      What was the rationale for a threshold of 400 a.u. or 450 a.u.?

      We have acquired AOIs of receptor clusters at multiple threshold levels, and can examine whether the results are altered when using a low, medium or high threshold level.

      How does this threshold compare to the mEPSC threshold of 3 pA?

      The issue with values being below threshold in untreated cultures has been the concern in interpreting effects on mEPSC amplitudes, specifically, whether this mismatch contributes to divergent scaling. A problem of values being below a too-highly set threshold in the control and becoming detectable after the homeostatic plasticity produces a lower ratio than expected, because now there are values in the treated condition that were not present in the control condition. Instead, for GluA2 receptor cluster size, we observed higher TTX/CON ratios at the low end of the data set. So, based on this, the thresholds chosen for imaging are not having the same effect, if that is what is being asked. This issue will be addressed by removing ratio plots.

      The conclusion that an increase in AMPAR levels is not fully responsible for the observed mEPSC increase is mainly based on the rank-order analysis of GluA2 intensity, yielding a slope of ~0.9. There are several points to consider here: (i) GluA2 fluorescence intensity did increase on average, as did GluA2 cluster size. (ii) The increase in GluA2 cluster size is very similar to the increase in mEPSC amplitude (each approx. 18-20%). (iii) Are there any reports that fluorescence intensity values are linearly reporting mEPSC amplitudes (in this system)?

      We agree that our data show GluA2 receptors increase as based on cluster size, and did not mean to imply otherwise. Our conclusion that there is another contributor to mEPSC amplitude other than receptors is based on two main findings, 1) that the ratio plots for mEPSC amplitudes and receptor cluster size have distinctively different behaviors, and 2) that there are differences in either magnitude or direction of the TTX effect across 6 matched cultures, 3 from WT animals and 3 from Rab3A KO animals (see more explanation of this point below, in response to Reviewer 3). To our knowledge, no one has reported homeostatic plasticity effects on a culture by culture basis, and no one has compared imaging results and physiological results for the same cultures. We will remove the ratio plots and the conclusions based on the differences in behavior for mEPSC amplitudes and receptor cluster size. We will acknowledge in the revision that the differences in magnitude and direction across the 6 matched cultures could be due to the differences in limitations and artifacts of imaging fluorescent antibody staining vs. the limitations and artifacts of detecting mEPSCs electrophysiologically. However, we will continue to state that our results could also be due to the possibility that mEPSC amplitude is not changing in lockstep with receptor levels in every situation. To support this proposal, we will discuss those articles that include both measurements, and point out where mEPSC amplitude measurements and receptor levels match and where they do not.

      Antibody labelling efficiency, and false negatives of mEPSC recordings may influence the results. The latter was already noted by the authors.

      We will add the caveat that antibody labeling efficiency can vary between coverslips. Although we prepared single solutions that were applied to all coverslips in an experiment, this was not possible for the primary antibody to GluA2, which was added to live cultures in individual wells.

      (iv) It is not entirely clear if their imaging experiments will sample from all synapses.

      We will add to Materials and Methods that we sample from all the synapses that could be detected by the researcher on the primary dendrite of the pyramidal cell.

      Other AMPAR subtypes than GluA2 could contribute, as could kainate or NMDA receptors.

      This is true, other AMPARs (GluA3 and/or GluA4) could be contributing, but we only looked at the receptors well established to be contributing to homeostatic plasticity (GluA1 and GluA2). We will acknowledge the possible contribution of other AMPARs in the revised manuscript.

      Furthermore, the statement "complete lack of correspondence of TTX/CON ratios" is not supported by the data presented (l. 515ff). First, under the assumption that no scaling occurs in Rab3A-/- , the TTX/CON ratios show a 20-30% change, which indicates the variation of this readout. Second, the two examples shown in Figure 8 for Rab3A+/+ are actually quite similar (culture #1 and #2), particularly when ignoring the leftmost section of the data, which is heavily affected by the raw values approaching zero.

      We will remove the ratio plots from the manuscript and the arguments about differences between GluA2 receptors and mEPSC amplitudes that were based on them. However, we maintain that we have demonstrated a lack of consistent effect for GluA2 receptors and mEPSCs in the matched culture experiments. Yes, the readout of homeostatic plasticity in ratio plots for mEPSCs in the Rab3AKO reaches over 1.1 in Figure 1, and as high as 1.2 in the cultures where Rab3AKO neurons were plated on Rab3AWT glia (Figure 9). Our point is that if we had measured GluA2 receptor responses to TTX in those same experiments, the ratios should have been above 1. However, in the experiments in which we measured both mEPSCs and GluA2 receptors, the ratios do not match. In culture #1, the ratio for mEPSCs was at 1 for more than 50% of the data, but for GluA2 receptors, was below 1 for more than 50% of the data. In culture #3, the ratio for mEPSCs was below 1 for more than 50% of the data, but for GluA2 receptors was close to 1.2 for 50% of the data. Only for culture #2 do the ratios appear to match. In the revised manuscript, the evidence that GluA2 receptors and mEPSCs are not changing in parallel will be based on the behavior of means or medians in untreated vs TTX-treated cultures, rather than ratio plots. It could be argued that we need a greater number of matched experiments to make conclusions, but the whole point of a matched experiment is that it should always show the same result, because we are no longer dealing with the variability in the homeostatic plasticity itself. We will add a statement that the only three explanations left for the failure of mEPSC amplitudes and GluA2 receptors to change in parallel are 1) a true mismatch, 2) a sampling issue, or 3) technical artifacts that occur in one culture and not another.

      13) Figure 7A: TTX CDF was shifted to smaller mEPSC amplitude values in Rab3A-/- cultures. How can this be explained?

      Figure 7A depicts the pooled data that are shown separately for 3 cultures in Figure 8. We observed mEPSC amplitudes being smaller after TTX treatment in some range of the data for all three Rab3AKO cultures, suggesting that this may be a biological result rather than random variation around no change (which would be a ratio of 1). However, this effect is not significant at the level of means, nor in the KS test (which has the issue of inflated n in any case), so we did not highlight this point. This issue will be addressed by the removal of the CDF plots from the manuscript.

      Reviewer #2

      Technical concerns:

      1) The culture condition is questionable. The authors saw no NMDAR current present during spontaneous recordings, which is worrisome since NMDARs should be active in cultures with normal network activity (Watt et al., 2000; Sutton et al., 2006).

      The (Watt et al., 2000) study recorded mEPSCs in 0 Mg2+ (Figure 1). The (Sutton et al., 2006) study also shows an average mEPSC waveform (Figure 1D) that was recorded in 0 Mg2+. Our extracellular recording solution contains Mg2+ (1.3 mM), so we likely are not observing NMDA-mediated currents because they are blocked by Mg2+ when strong depolarizations are prevented with TTX in the recording solution. We will add the idea that the NMDA currents are blocked by Mg2+ to Materials and Methods.

      It is important to ensure there is enough spiking activity before doing any activity manipulation.

      We agree that it would be best if network spiking activity were monitored alongside mEPSC recordings, for example by culturing on multi-electrode arrays. Data from these measurements might explain culture to culture variability in homeostatic responses. To our knowledge, most other studies investigating homeostatic plasticity do not monitor network spiking activity in the same cultures that assay mEPSC amplitudes. This is something that the field should move towards. We will add the caveat that activity was not directly measured to the manuscript.

      Similarly, it is also unknown whether spiking activity is normal in Rab3A KO/Ebd neurons.

      Since we did not measure spiking activity, we cannot address whether the disruption in homeostatic plasticity in cultures prepared from Rab3A KO and Rab3AEbd/Ebd mutant mice is due to an alteration in network activity. If activity were already low in cultures prepared from these genetically altered mice, we would expect mEPSC amplitudes to be increased, compared to those measured in cultures from WT animals. That is not the case in cultures from Rab3A KO mice, so it is unlikely that network activity is reduced. However, mEPSC amplitudes are increased in Rab3AEbd/Ebd cultures, leaving open this possibility. It would have to be a defect unique to neurons in culture, since the Rab3AEbd/Ebd mouse appears normal in every way, suggesting action potential activity is occurring in the brains of these animals in vivo. We will add the possibility that activity is altered in the cultures from Rab3AKO and Rab3AEbd/Ebd to the manuscript.

      2) Selection of mEPSC events is not conducted in an unbiased manner. Manually selecting events is insufficient for cumulative distribution analysis, where small biases could skew the entire distribution. Since the authors claim their ratio plot is a better method to detect the uniformity of scaling than the well-established rank-order plot, it is important to use an unbiased population to substantiate this claim.

      MiniAnalysis (a standard program used for mEPSC event detection and analysis) selects many false positives with the automated feature (due to the very small sizes of events that are close to the noise level), so manual re-evaluation of the automated process is necessary to eliminate false positives. As soon as there is a manual step, bias is introduced. Interestingly, a manual re-evaluation step was applied in a recent study that describes their process as ‘unbiased’ (Wu et al., 2020). The alternative is to apply a very large threshold, reducing or eliminating false positives. However, this has the effect of biasing the data towards large events. In sum, we do not believe it is currently possible to perform a completely unbiased detection process. We feel that it is important to include as many small events as possible to reduce the problem of having events in the TTX experimental group that were not matched by events in the control experimental group, for the rank order and ratio plots, so setting the threshold low and manually detecting events accomplishes this. We will add to the Materials and Methods section that the person selecting events did not have information on whether the record was from an untreated or a TTX-treated cell at the time of selection. All of these issues, the potential for skewing the CDFs, and bias potentially interfering in the true rank order and ratio relationships, are addressed by removal of the CDFs, ratio and rank-order plots from the manuscript.

      3) Immunohistochemistry data analysis is problematic. The authors only labeled dendrites without doing cell-fills to look at morphology, so it is questionable how they differentiate branches from pyramidal neurons and interneurons. Since glutamatergic synapses on these two types of neuron scale in the opposite directions, it is crucial to show that only pyramidal neurons are included for analysis.

      MAP2, in addition to labeling dendrites, also labels the cell body, and we used the cell structure revealed by MAP2 staining to select pyramidal-shaped neurons. The selection of the primary dendrite of a pyramidal neuron was stated in lines 239-240 in Materials and Methods and line 1094 in the figure legend, but we had not explicitly stated how we knew it was a pyramidal neuron. We will include a low power picture of each of the selected pyramidal neurons in the revision.

      Conceptual concerns:

      The only novel finding here is the implicated role for Rab3A in synaptic scaling, but insights into mechanisms behind this observation are lacking. The author claims that Rab3A likely regulates scaling from the presynaptic side, yet there is no direct evidence from data presented. In its current form, this study's contribution to the field is very limited.

      We acknowledge that the involvement of a presynaptic mechanism in the regulation of homeostatic plasticity by Rab3A is not supported by direct evidence in cortical cultures in this study. But we disagree that the study’s contribution is very limited.

      The revised manuscript will emphasize that there are only two possible mechanisms by which Rab3A is acting in homeostatic plasticity. Either this presynaptic vesicle protein is regulating postsynaptic receptors (an extremely surprising result for which we do have direct evidence), or, it is regulating quantal size from both sides of the synapse (supported by direct evidence from our previous study at the mouse neuromuscular junction in vivo, where receptors are not being upregulated during homeostatic plasticity, and, by indirect evidence in the current study, that receptors and mEPSCs are not being identically regulated in the same cultures). Furthermore, the first idea that follows from the effect of Rab3A on receptors is that it would be regulating release of factors from astrocytes, since this is a mechanism that has been shown to be involved in homeostatic plasticity, and we clearly disprove this hypothesis.

      1) Their major argument for this is that homeostatic effects on mEPSC amplitudes and GluA2 cluster sizes do not match. This is inconsistent with reports from multiple labs showing that upscaling of mEPSC amplitude and GluA2 accumulation occur side by side during scaling (Ibata et al., 2008; Pozo et al., 2012; Tan et al., 2015; Silva et al., 2019).

      We agree with the reviewer that many studies show an increase in receptors and mEPSC amplitudes after activity blockade. This is why we were very surprised in our initial experiments to find that there was not a consistent robust increase in receptors in our cultures. At that point we were only imaging, and we assumed that it was homeostatic plasticity that was not always robust. We decided it was essential to measure mEPSC amplitudes and image receptors in the same cultures. We expected to observe larger and smaller effects on mEPSC amplitudes from culture to culture that were paralleled by larger and smaller effects on receptors, but this is not what happened. We have gone back to the literature to look more closely at whether variability across cultures has ever been shown for mEPSC amplitudes, receptors, or both. In a survey of 14 studies, none report results culture by culture. To our knowledge, we are the first to report this variability in the receptor response, and the lack of correlation between mEPSC amplitudes and receptor responses, in the same cultures. That said, for the 4 examples provided by the reviewer, only 1 reports evidence relevant to our study that receptors and mEPSC amplitudes ‘occur side by side,’ which is the (Ibata et al., 2008) study. Here, 24 hr of TTX treatment of rat cortical cultures causes synaptically localized GluA2 receptors in confocal imaging, and mEPSC amplitudes, to both increase to around 130%. The (Pozo et al., 2012) study is not a study of activity blockade but of the effects of overexpressing beta-integrins in rat hippocampal cultures, and this causes both GluA2 receptors and mEPSC amplitudes to increase, but the GluA2 level is not restricted to synaptic sites, and, is expressed as the surface fraction (surface receptor/total receptor—total receptor being surface intensity plus internalized intensity) which increases from 0.5 to 0.55, or to 110%, while mEPSC amplitude increases to ~180%. 
The (Tan et al., 2015) study only provides Western blot data to show an increase of receptors to 125% in mouse cortical cultures in response to 48 hr TTX, with mEPSC amplitudes increased to ~140%, but the Western blot technique measures synaptic and nonsynaptic receptors on excitatory and inhibitory neurons, as well as receptors on astrocytes. Finally, in (Silva et al., 2019), the culture conditions for the imaging data and the mEPSC amplitude data are markedly different, with ‘low-density’ Banker cultures being used for the former, and ‘high-density’ cultures used for the latter, and the protocol to induce activity blockade is different from ours (noncompetitive AMPA and NMDA blockers); synaptic GluA2 receptors are increased to ~280% and mEPSC amplitudes to ~170%. In the revision we will carefully summarize the previous evidence for receptors and mEPSC amplitude responses to activity blockade. Since it is known that different protocols trigger different molecular mechanisms, for example, TTX + APV triggers a homeostatic plasticity that can be completely reversed by acute application of blockers of Ca-permeable receptors, whereas TTX alone triggers a plasticity that is insensitive to these blockers (Sutton et al., 2006, Figure 4E; Soden and Chen, 2010, Figure 4A), we will keep our discussion restricted to studies using TTX alone for at least 24 hr. We will acknowledge that our finding that GluA2 receptors and mEPSC amplitudes are not varying in lockstep from culture to culture suggests there is another contributor to mEPSC amplitude, but that we cannot rule out it is due to a greater variability in signal, or more issues with signal over noise, in imaging experiments compared to electrophysiology experiments.

      Studies surveyed about reporting results by culture:

      (Ju et al., 2004; Stellwagen et al., 2005; Shepherd et al., 2006; Sutton et al., 2006; Cingolani and Goda, 2008; Hou et al., 2008; Ibata et al., 2008; Chang et al., 2010; Hu et al., 2010; Jakawich et al., 2010; Beique et al., 2011; Tatavarty et al., 2013; Diering et al., 2014; Sanderson et al., 2018)

      Further, because the acquisition and quantification methods for mEPSC recordings and immunohistochemistry imaging are entirely different (each with its own limitations in signal detection), it is not convincing that the lack of proportional changes must signify a presynaptic component.

      We agree with the reviewer that there is no way to compare absolute levels from one type of experimental technique to another, but whatever differences in technical issues there are for the two techniques, they should cause systematic errors and should not contribute to the differences between experiments. Most of the issues with imaging come down to variability in the intensity of fluorescence from experiment to experiment, since the antibody solutions are made anew each time, as is the fixation solution. In addition, the confocal microscope function can vary over time and give brighter or dimmer images. But those kinds of artifacts are addressed by using the same solutions on control and TTX-treated coverslips, and imaging control and TTX-treated coverslips in the same single 2-3 hour imaging session, so that whatever issues there are, they cannot contribute to the TTX effect itself. Therefore, when we compare the TTX effect (TTX measurements compared to untreated measurements) from culture to culture and find that in one WT culture there was no increase in receptors but there was in mEPSC amplitude, it is difficult to explain how a limitation specific to the antibody imaging technique could produce such a result. Similarly, when we get the opposite result, that in one KO culture receptors increased but mEPSC amplitudes did not, it is unclear how limitations in signal detection would produce such a result in one culture but not another. The one exception to this is that the primary GluA2 antibody has to be added individually to each coverslip before returning the dishes to the incubator, in order to avoid the disruption to live cells that a complete removal of media would cause. The only remaining ‘artifact’ that could explain the results would be a greater variability in the imaging experiments due to limitations in the signal or the signal-to-noise ratio. 
In the revision we will report additional characteristics of imaging experiments, such as average intensity for each coverslip, and for each experiment, to address whether variability in fluorescence levels could explain the variability in TTX effects we observe. We will include the possibility that the mismatches in GluA2 receptors and mEPSCs could be caused by greater variability in the imaging experiments.
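
      The normalization argument above can be sketched in a few lines. This is an illustration under stated assumptions, not the analysis used in the manuscript: the intensity values are arbitrary placeholders (not data from the study), and the session-to-session variation is assumed to act as a single multiplicative gain on every coverslip imaged in that session.

```python
from statistics import mean


def ttx_effect(control, ttx):
    """TTX effect within one culture: mean TTX value over mean untreated value."""
    return mean(ttx) / mean(control)


def rescale(values, gain):
    """Apply one multiplicative gain, e.g. a brighter or dimmer imaging session."""
    return [gain * v for v in values]


# Placeholder intensities for illustration only; NOT data from the manuscript.
control = [100.0, 120.0, 80.0]
ttx = [130.0, 150.0, 110.0]

baseline = ttx_effect(control, ttx)
# Hypothetical session in which every image is 1.7x brighter.
brighter = ttx_effect(rescale(control, 1.7), rescale(ttx, 1.7))

print(f"{baseline:.2f} {brighter:.2f}")  # 1.30 1.30
```

      A shared gain cancels in the ratio, which is why such session-wide factors should not change the TTX effect. An additive background or a higher noise level would not cancel in this way, which is the remaining possibility acknowledged above.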

      2) The authors also speculate in the discussion that presynaptic Rab3A could be interacting with retrograde BDNF signaling to regulate postsynaptic AMPARs. Without data showing Rab3A-dependent presynaptic changes after TTX treatment, this argument is not compelling. In this retrograde pathway, BDNF is synthesized in and released from dendrites (Jakawich et al., 2010; Thapliyal et al., 2022), and it is entirely possible for postsynaptic Rab3A to interfere with this process cell-autonomously.

      In the revision, the model will focus on the direct findings of the manuscript and tone down the speculation about BDNF signaling, but in the Discussion we will add the possibility that a Rab3A-BDNF interaction could occur either presynaptically or postsynaptically. Interestingly, these articles suggest the postsynaptic BDNF is affecting presynaptic function, namely mEPSC frequency. It is conceivable it could presynaptically affect the vesicle’s release of transmitter.

      3) The authors propose that a change in AMPAR subunit composition from GluA2-containing ones to GluA1 homomers may account for the distinct changes in mEPSC amplitudes and GluA2 clusters. However, their data from the Naspm wash-in experiments clearly show that GluA1 homomer contributions have not changed before and after TTX treatment.

      Our apologies to the reviewer that we were not clear on this point. In lines 396 to 400 we were describing the significant effects that NASPM had on mEPSC frequency in both untreated and TTX-treated cells, despite having only modest, and not quite significant, effects on mEPSC amplitude. We conclude from these results that there are synaptic sites that have only GluA1 homomers, and the mEPSCs from these sites are blocked 100% by NASPM. There may be an increase in such GluA1-only synapses after activity blockade, but nevertheless, these events do not contribute to the amplitude increase. So we did not mean to suggest that there is a shift from GluA2-containing to GluA1-containing receptors that leads to the amplitude increase, and we fully agree with the reviewer that the GluA1 homomer contributions to amplitude have not changed before and after TTX. We will clarify the difference between the contribution of GluA1 homomers to amplitude and frequency in the revised manuscript.

      Reviewer #3

      Summary: The authors clearly demonstrate that Rab3A plays a role in HSP at excitatory synapses, with substantially less plasticity occurring in the Rab3A KO neurons. There is also no apparent HSP in the Earlybird Rab3A mutation, although baseline synaptic strength seems already elevated. In this context, it is unclear if the plasticity is absent or just occluded by a ceiling effect due to the synapses already being strengthened. The authors do appropriately discuss both options. There are also differences in genetic background between the Rab3A KO and Earlybird mutants that could also impact the results, which are also noted. The authors have solid data showing that Rab3A is unlikely to be active in astrocytes. Finally, they attempt to study the linkage between synaptic strength during HSP and AMPA receptor trafficking, and conclude that trafficking is largely not responsible for the changes in synaptic strength.

      Strengths: This work adds another player into the mechanisms underlying an important form of synaptic plasticity. The plasticity is only reduced, suggesting Rab3A is only partially required and perhaps multiple mechanisms contribute. The authors speculate about some possible novel mechanisms.

      Weaknesses: However, the rather strong conclusions on the dissociation of AMPAR trafficking and synaptic response are made from somewhat weaker data. The key issue is the GluA2 immunostaining in comparison with the mEPSC recordings. Their imaging method involves only assessing puncta clearly associated with a MAP2-labeled dendrite. This is a small subset of synapses, judging from the sample micrographs (Fig 5). To my knowledge, this is a new and unvalidated approach that could represent a particular subset of synapses not representative of the synapses contributing to the mEPSC change. (They are also sampling different neurons for the two measurements; an additional unknown detail is how far from the cell body the analyzed dendrites for immunostaining were.) While the authors acknowledge that a sampling issue could explain the data, they still use this data to draw strong conclusions about the lack of AMPAR trafficking contribution to the mEPSC amplitude change. This apparent difference may be a methodological issue rather than a biological one, and at this point it is impossible to differentiate these. It will unfortunately be difficult to validate their approach. Perhaps if they were to drive NMDA-dependent LTD or chemLTP, and show alignment of the imaging and ephys, that would help. More helpful would be recordings and imaging from the same neurons, but this is challenging. Sampling from identified synapses would of course be ideal, perhaps from 2P uncaging combined with SEP-labeled AMPARs, but this is more challenging still. But without data to validate the method, it seems unwarranted to make such strong conclusions as that AMPAR trafficking does not underlie the increase in mEPSC amplitude, given the previous data supporting such a model.

      We chose the primary dendrite to ensure we were not assaying dendrites from inhibitory neurons or on axons, but we will add in the revision that it is a limitation of our methods that we are not sampling all the synapses for each neuron. The majority of previous studies that establish that receptors are increased side by side with mEPSCs did not measure receptors and mEPSCs in the same cells, nor even in the same cultures. There is a recent study which employs dual recordings, transfection of GluA2 and VGlut1 constructs, and infusion of dyes to highlight cell morphology (Letellier et al., 2019), so in principle an experiment could be done in which synaptic GluA2 sites are imaged in a cell in which the mEPSCs are also measured. It would be difficult to make these measurements in the same cells before and after TTX treatment, since there is a high likelihood of damaging the cell upon electrode withdrawal and with the imaging process itself. In theory, only a few such experiments would be necessary to establish whether receptors and mEPSC amplitudes are varying in lockstep, and we will consider this for a future study. As stated in response to conceptual concern #1 in Reviewer 2’s comments, we will review the literature on previous studies’ demonstrations of increases in receptors and mEPSC amplitudes following activity blockade in more detail, including how the synaptic sites to be imaged were chosen, to address whether our selection of sites touching the primary dendrite is unvalidated.

      A sample from 3 articles:

      (Ibata et al., 2008), only information is that ‘distal dendrites’ were examined. The authors do not use a dendritic label. (Jakawich et al., 2010), ‘neurons with pyramidal-like morphology were selected for imaging,’ and ‘principal dendrite of each neuron was linearized’—but how these were identified is not clear, since MAP2 or other cellular labels are not described.

      (Silva et al., 2019), ‘dendrites with similar thickness and appearance were randomly selected using MAP2 staining,’ which suggests synaptic sites with GluA2 and VGLUT1 were selected on the basis of being close to or touching the MAP2 positive dendrite, although this is not stated explicitly.

      We can perform length measurements on the dendrites imaged and report this information in the revision, but the primary dendrite is the closest dendrite to the cell body.

      We have addressed the potential contribution of technical artifacts arising from the two distinct methods of measurement, imaging and electrophysiology, in our response to conceptual concern #1 of Reviewer 2.

      Other questions arise from the NASPM experiments, used to justify looking at GluA2 (and not GluA1) in the immunostaining. First, there is a frequency effect that is quite unclear in origin. One would expect NASPM to merely block some fraction of the post-synaptic current, and not affect pre-synaptic release or block whole synapses. It is also unclear why the authors argue this proves that the NASPM was at an effective concentration (lines 399-400).

      We observed a clear effect of NASPM reducing mEPSC frequency. We will state more clearly that we infer from the loss of mEPSCs after NASPM that such mEPSCs were from synaptic sites that had only GluA1 homomers, and acknowledge that this is an interpretation. We will also clarify that if our inference is correct, it would indicate that the dose of NASPM we used was 100% effective at blocking GluA1 homomers. The alternative explanation would be a presynaptic effect of NASPM, which has never been reported, to our knowledge.

      Further, the amplitude data show a strong trend towards smaller amplitude. The p value for both control and TTX neurons was 0.08 - it is very difficult to argue that there is no effect. And the decrease is larger in the TTX neurons. Considering the strong claims for a pre-synaptic and the use of this data to justify only looking at GluA2 by immunostaining, these data do not offer much support of the conclusions. Between the sampling issues and perhaps looking at the wrong GluA subunit, it seems premature to argue that trafficking is not a contributor to the mEPSC amplitude change, especially given the substantial support for that hypothesis. Further, even if trafficking is not the major contributor, there could be shifts in conductance (perhaps due to regulation of auxiliary subunits) that does not necessitate a pre-synaptic locus. While the authors are free to hypothesize such a mechanism, it would be prudent to acknowledge other options and explanations.

      We did not mean to suggest that there is no effect of NASPM on mEPSC amplitude. We will clarify that our data indicate that there is no effect of NASPM on the TTX effect on mEPSC amplitude. We agree with the reviewer that the effect of NASPM on frequency is of larger magnitude after TTX treatment, although the p value is larger than that for untreated cells, likely due to greater variability. We interpret this to mean that TTX treatment increases the proportion of synapses that have only GluA1 homomers. Nevertheless, the increase in GluA1 homomer sites does not appear to contribute to the overall increase in amplitude following TTX treatment, and we wanted to find the mechanism of the amplitude increase. That is why we focused on GluA2 receptors. We will acknowledge the limitation of basing our conclusions on only GluA2 receptors in the revision, as well as the possibility that there is a change in conductance. As stated in our response to Reviewer 2, we do not mean to state that GluA2 receptors do not go up after activity blockade, we find that this is the case. We are proposing an additional mechanism contributing to mEPSC amplitude to explain the different responses for GluA2 receptors vs. mEPSC amplitudes in some of the 6 matched experiments (3 WT and 3 KO).

      The frequency data are missing from the paper, with the exception of the NASPM dataset. The mEPSC frequencies should be reported for all experiments, particularly given that Rab3A is generally viewed as a pre-synaptic protein regulating release. Also, in the NASPM experiments, the average frequency is much higher in the TTX treated cultures. Is this statistically above control values?

      We will report frequency measurements for all experiments shown. Following TTX treatment, frequency variability increases enormously, with cells having as high as > 10 mEPSCs per second, and other TTX-treated cells with frequencies as low as < 1 mEPSC per second, so the TTX effect on frequency, and whether this effect is present or not in Rab3A KO and Rab3AEbd/Ebd is not completely clear, which is why we did not include those results previously.

      Unaddressed issues that would greatly increase the impact of the paper:

      1) Is Rab3A acting pre-synaptically, post-synaptically or both? The authors provide good evidence that Rab3A is acting within neurons and not astrocytes. But where it is acting (pre or post) would aid substantially in understanding its role (and particularly the hypothesized and somewhat novel idea that the amount of glutamate released per vesicle is altered in HSP). They could use sparse knock-down of Rab3A, or simply mix cultures from KO and WT mice (with appropriate tags/labels). The general view in the field has been that HSP is regulated post-synaptically via regulation of AMPAR trafficking, and considerable evidence supports this view. The more support for their suggestion of a pre-synaptic site of control, the better.

      We agree with the reviewer that this is the most important question to answer next. The approach suggested by the reviewer would be to record from Rab3A KO neurons in a culture where the majority of their inputs are Rab3A positive. If the TTX effect is absent from these cells, it would strongly indicate that postsynaptic Rab3A is required for homeostatic plasticity. There are not currently transgenic mice expressing GFP forms of Rab3A, so we would have to create one, or transiently transfect Rab3A-GFP into Rab3A KO neurons. Given that under our experimental conditions we require a very high density of neurons to observe the increase in mEPSC amplitude, it would be difficult to get the ratio of Rab3A-expressing neurons high enough using transfection to be sure that a given postsynaptic cell lacking Rab3A had a normal number of Rab3A-positive inputs and almost no Rab3A-negative inputs. It may be that the opposite experiment is more doable: an isolated Rab3A-positive neuron in a sea of Rab3A-negative neurons, which could be accomplished with a very low transfection efficiency. Another approach would be to use the fast off-rate antagonist gamma-DGG, which is more effective against low glutamate concentrations than high glutamate concentrations (see Liu et al., 1999; Wu et al., 2007). If gamma-DGG were less effective at reducing mEPSC amplitude in TTX-treated cells, compared to untreated cells, it would support the hypothesis that activity blockade leads to an increase in the amount of transmitter per vesicle fusion event. Further, if the change in gamma-DGG sensitivity after activity blockade were disrupted in cultures from Rab3A KO mice, it would support a presynaptic role for Rab3A in homeostatic plasticity of mEPSC amplitude. We have begun these experiments but are finding the surprising result that within a single recording, small mEPSCs and large mEPSCs appear to be differentially sensitive to gamma-DGG. 
To confirm that this is a biological characteristic, rather than an issue with the detection threshold, we will be repeating our experiments with a slow off-rate antagonist that has the same effect regardless of transmitter concentration. The complexity of these results precludes including them in the current manuscript.
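
      The logic of the gamma-DGG experiment can be illustrated with a simple equilibrium model of competitive antagonism (the Gaddum equation). This is a sketch under strong assumptions, not a model of the experiments: it treats binding as being at equilibrium, which a brief synaptic glutamate transient is not, it takes the response to be proportional to receptor occupancy, and all concentrations and dissociation constants are arbitrary illustrative values.

```python
def fraction_remaining(agonist: float, antagonist: float,
                       k_agonist: float, k_antagonist: float) -> float:
    """Fraction of the agonist response that survives a competitive antagonist.

    Occupancy without antagonist: a / (1 + a)
    Occupancy with antagonist:    a / (1 + a + b)   (Gaddum equation)
    where a = [agonist] / K_agonist and b = [antagonist] / K_antagonist.
    """
    a = agonist / k_agonist
    b = antagonist / k_antagonist
    occupancy_without = a / (1 + a)
    occupancy_with = a / (1 + a + b)
    return occupancy_with / occupancy_without


# Arbitrary units; the same antagonist dose is tested against a lower and a
# higher transmitter concentration (hypothetical values).
low_transmitter = fraction_remaining(agonist=0.5, antagonist=1.0,
                                     k_agonist=1.0, k_antagonist=1.0)
high_transmitter = fraction_remaining(agonist=5.0, antagonist=1.0,
                                      k_agonist=1.0, k_antagonist=1.0)

print(f"{low_transmitter:.2f} {high_transmitter:.2f}")  # 0.60 0.86
```

      In this simplified picture, a larger transmitter concentration leaves a larger fraction of the response intact at the same antagonist dose, which is the direction of change the proposed experiment would look for after activity blockade.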

      2) Rab3A is also found at inhibitory synapses. It would be very informative to know if HSP at inhibitory synapses is similarly affected. This is particularly relevant as at inhibitory synapses, one expects a removal of GABARs and/or a decrease of GABA-packaging in vesicles (ie the opposite of whatever is happening at excitatory synapses). If both processes are regulated by Rab3A, this might suggest a role for this protein more upstream in the signaling; an effect only at excitatory synapses would argue for a more specific role just at these synapses.

      The next question, after it is determined where Rab3A is acting, is whether it is required for other forms of homeostatic plasticity. This includes plasticity of GABA mIPSCs on pyramidal neurons, but also mEPSCs on inhibitory neurons, and, the downscaling of mEPSCs (and upscaling of mIPSCs) when activity is increased, by bicuculline for example. We will add a statement about future experiments examining other forms of plasticity to the discussion, and include examples where a molecular mechanism has mediated multiple forms, and those that have been shown to be very specific.

      Beique JC, Na Y, Kuhl D, Worley PF, Huganir RL (2011) Arc-dependent synapse-specific homeostatic plasticity. Proc Natl Acad Sci U S A 108:816-821.

      Chang MC, Park JM, Pelkey KA, Grabenstatter HL, Xu D, Linden DJ, Sutula TP, McBain CJ, Worley PF (2010) Narp regulates homeostatic scaling of excitatory synapses on parvalbumin-expressing interneurons. Nat Neurosci 13:1090-1097.

      Cingolani LA, Goda Y (2008) Differential involvement of beta3 integrin in pre- and postsynaptic forms of adaptation to chronic activity deprivation. Neuron Glia Biol 4:179-187.

      Diering GH, Gustina AS, Huganir RL (2014) PKA-GluA1 coupling via AKAP5 controls AMPA receptor phosphorylation and cell-surface targeting during bidirectional homeostatic plasticity. Neuron 84:790-805.

      Hanes AL, Koesters AG, Fong MF, Altimimi HF, Stellwagen D, Wenner P, Engisch KL (2020) Divergent Synaptic Scaling of Miniature EPSCs following Activity Blockade in Dissociated Neuronal Cultures. J Neurosci 40:4090-4102.

      Hou Q, Zhang D, Jarzylo L, Huganir RL, Man HY (2008) Homeostatic regulation of AMPA receptor expression at single hippocampal synapses. Proc Natl Acad Sci U S A 105:775-780.

      Hu JH, Park JM, Park S, Xiao B, Dehoff MH, Kim S, Hayashi T, Schwarz MK, Huganir RL, Seeburg PH, Linden DJ, Worley PF (2010) Homeostatic scaling requires group I mGluR activation mediated by Homer1a. Neuron 68:1128-1142.

      Ibata K, Sun Q, Turrigiano GG (2008) Rapid synaptic scaling induced by changes in postsynaptic firing. Neuron 57:819-826.

      Jakawich SK, Nasser HB, Strong MJ, McCartney AJ, Perez AS, Rakesh N, Carruthers CJ, Sutton MA (2010) Local presynaptic activity gates homeostatic changes in presynaptic function driven by dendritic BDNF synthesis. Neuron 68:1143-1158.

      Ju W, Morishita W, Tsui J, Gaietta G, Deerinck TJ, Adams SR, Garner CC, Tsien RY, Ellisman MH, Malenka RC (2004) Activity-dependent regulation of dendritic synthesis and trafficking of AMPA receptors. Nat Neurosci 7:244-253.

      Lazic SE, Clarke-Williams CJ, Munafo MR (2018) What exactly is 'N' in cell culture and animal experiments? PLoS Biol 16:e2005282.

      Liu G, Choi S, Tsien RW (1999) Variability of neurotransmitter concentration and nonsaturation of postsynaptic AMPA receptors at synapses in hippocampal cultures and slices. Neuron 22:395-409.

      Pozo K, Cingolani LA, Bassani S, Laurent F, Passafaro M, Goda Y (2012) beta3 integrin interacts directly with GluA2 AMPA receptor subunit and regulates AMPA receptor expression in hippocampal neurons. Proc Natl Acad Sci U S A 109:1323-1328.

      Sanderson JL, Scott JD, Dell'Acqua ML (2018) Control of Homeostatic Synaptic Plasticity by AKAP-Anchored Kinase and Phosphatase Regulation of Ca(2+)-Permeable AMPA Receptors. J Neurosci 38:2863-2876.

      Shepherd JD, Rumbaugh G, Wu J, Chowdhury S, Plath N, Kuhl D, Huganir RL, Worley PF (2006) Arc/Arg3.1 mediates homeostatic synaptic scaling of AMPA receptors. Neuron 52:475-484.

      Silva MM, Rodrigues B, Fernandes J, Santos SD, Carreto L, Santos MAS, Pinheiro P, Carvalho AL (2019) MicroRNA186-5p controls GluA2 surface expression and synaptic scaling in hippocampal neurons. Proc Natl Acad Sci U S A 116:5727-5736.

      Soden ME, Chen L (2010) Fragile X protein FMRP is required for homeostatic plasticity and regulation of synaptic strength by retinoic acid. J Neurosci 30:16910-16921.

      Stellwagen D, Beattie EC, Seo JY, Malenka RC (2005) Differential regulation of AMPA receptor and GABA receptor trafficking by tumor necrosis factor-alpha. J Neurosci 25:3219-3228.

      Sutton MA, Ito HT, Cressy P, Kempf C, Woo JC, Schuman EM (2006) Miniature neurotransmission stabilizes synaptic function via tonic suppression of local dendritic protein synthesis. Cell 125:785-799.

      Tan HL, Queenan BN, Huganir RL (2015) GRIP1 is required for homeostatic regulation of AMPAR trafficking. Proc Natl Acad Sci U S A 112:10026-10031.

      Tatavarty V, Sun Q, Turrigiano GG (2013) How to scale down postsynaptic strength. J Neurosci 33:13179-13189.

      Turrigiano GG, Leslie KR, Desai NS, Rutherford LC, Nelson SB (1998) Activity-dependent scaling of quantal amplitude in neocortical neurons. Nature 391:892-896.

      Wang X, Wang Q, Yang S, Bucan M, Rich MM, Engisch KL (2011) Impaired activity-dependent plasticity of quantal amplitude at the neuromuscular junction of Rab3A deletion and Rab3A earlybird mutant mice. J Neurosci 31:3580-3588.

      Watt AJ, van Rossum MC, MacLeod KM, Nelson SB, Turrigiano GG (2000) Activity coregulates quantal AMPA and NMDA currents at neocortical synapses. Neuron 26:659-670.

      Wu XS, Xue L, Mohan R, Paradiso K, Gillis KD, Wu LG (2007) The origin of quantal size variation: vesicular glutamate concentration plays a significant role. J Neurosci 27:3046-3056.

      Wu YK, Hengen KB, Turrigiano GG, Gjorgjieva J (2020) Homeostatic mechanisms regulate distinct aspects of cortical circuit dynamics. Proc Natl Acad Sci U S A 117:24514-24525.

    1. Author Response

      Reviewer #1 (Public Review):

      Weaknesses:

      1) The authors should better review what we know of fungal Drosophila microbiota species as well as the ecology of rotting fruit. Are the microbiota species described in this article specific to their location/setting? It would have been interesting to know if similar species can be retrieved in other locations using other decaying fruits. The term 'core' in the title suggests that these species are generally found associated with Drosophila but this is not demonstrated. The paper is written in a way that implies the microbiota members they have found are universal. What is the evidence for this? Have the fungal species described in this paper been found in other studies? Even if this is not the case, the paper is interesting, but there should be a discussion of how generalizable the findings are.

      The reviewer inquires as to whether the microbial species described in this article are ubiquitously associated with Drosophila or not. Indeed, most of the microbes described in this manuscript are generally recognized as species associated with Drosophila spp. For example, species such as Hanseniaspora uvarum, Pichia kluyveri, and Starmerella bacillaris have been detected in or isolated from Drosophila spp. collected in European countries as well as the United States and Oceania (Chandler et al., 2012; Solomon et al., 2019). As for the bacteria, species belonging to the genera Pantoea, Lactobacillus, Leuconostoc, and Acetobacter have also previously been detected in wild Drosophila spp. (Chandler et al., 2011). These elucidations will be incorporated into our revised manuscript.

      Nevertheless, the term “core” in the manuscript title may lead to misunderstanding, as the generality does not ensure the ubiquitous presence of these microbial species in every individual fly. Considering this point, we will replace the term with an expression more appropriate to our context.

      2) Can the authors clearly demonstrate that the microbiota species that develop in the banana trap are derived from flies? Are these species found in flies in the wild? Did the authors check that the flies belong to the D. melanogaster species and not to the sister group D. simulans?

      Can the authors clearly demonstrate that the microbiota species that develop in the banana trap are derived from flies? Are these species found in flies in the wild?

      The reviewer asked whether the microbial species identified in the fermented banana samples were derived from flies. To address this question, additional experiments under more controlled conditions, such as the inoculation of specific species of wild flies onto fresh bananas, would be needed. Nevertheless, the microbes may potentially originate from wild flies, as supported by the literature cited in our response to Weakness 1).

      Alternative sources for microbial provenance also merit consideration. For example, microbial entities may be inherently present in unfermented bananas through the infiltration of peel injuries (lines 1141-1142 of the original manuscript). In addition, they could be introduced by insects other than flies, given that both rove beetles (Staphylinidae) and sap beetles (Nitidulidae) were observed in some of the traps. These possibilities will be incorporated into the 'MATERIALS AND METHODS' and 'DISCUSSION' sections of our revised manuscript.

      Did the authors check that the flies belong to the D. melanogaster species and not to the sister group D. simulans?

      Our sampling strategy was designed to target not only D. melanogaster but also other domestic Drosophila species, such as D. simulans, that inhabit human residential areas. After adult flies were caught in each trap, we identified the species as shown in Table S1, thereby showing the presence of either or both D. melanogaster and D. simulans. We will provide these descriptions in MATERIALS AND METHODS and DISCUSSION.

      3) Did the microarrays highlight a change in immune genes (ex. antibacterial peptide genes)? Whatever the answer, this would be worth mentioning. The authors described their microarray data in terms of fed/starved in relation to the Zinke article. They should clarify if they observed significant differences between species (differences between species within bacteria or fungi, and more generally differences between bacteria versus fungi).

      Did the microarrays highlight a change in immune genes (ex. antibacterial peptide genes)? Whatever the answer, this would be worth mentioning.

      Regarding the antimicrobial peptide genes, statistical comparisons of our RNA-seq data across different conditions were impracticable because most of them showed low expression levels (refer to Author response table 1, which exhibits the RNA-seq data of the yeast-fed larvae; similar expression profiles were observed in the bacteria-fed larvae). While a subset of genes exhibited significantly elevated expression in the non-supportive conditions relative to the supportive ones, this can be due to intra-sample variability rather than due to distinct nutritional environments. Therefore, it would be difficult to discuss a change in immune genes in the paper. Additionally, the previous study that conducted larval microarray analysis (Zinke et al., 2002) did not explicitly focus on immune genes.

      Author Response Table 1.

      Antimicrobial peptide genes are not up-regulated by any of the microbes. Antimicrobial peptide gene expression profiles of whole bodies of first-instar larvae fed on yeasts. TPM values of all samples and comparison results of gene expression levels in the larvae fed on supportive and non-supportive yeasts are shown. Antibacterial peptide genes mentioned in Hanson and Lemaitre, 2020 are listed. NA or na, not available.

      They should clarify if they observed significant differences between species (differences between species within bacteria or fungi, and more generally differences between bacteria versus fungi).

      We did not observe significant differences between species within bacteria or fungi, or between bacteria and fungi. For example, the gene expression profiles of larvae fed on the various supporting microbes showed striking similarities to each other, as evidenced by the heat map showing the expression of all genes detected in larvae fed either yeast or bacteria (Author response image 1). Similarities were also observed among larvae fed on distinct non-supporting microbes.

      Author response image 1.

      Gene expression profiles of larvae fed on the various supporting microbes show striking similarities to each other. Heat map showing the gene expression of the first-instar larvae that fed on yeasts or bacteria. Freshly hatched germ-free larvae were placed on banana agar inoculated with each microbe and collected after 15 h of feeding to examine gene expression of the whole body. Note that data presented in Figures 3A and 4C in the original manuscript, which were obtained independently, are combined to generate this heat map. The labels under the heat map indicate the microbial species fed to the larvae, with three samples analyzed for each condition. The lactic acid bacteria (“LAB”) include Lactiplantibacillus plantarum and Leuconostoc mesenteroides, while the acetic acid bacterium (“AAB”) represents Acetobacter orientalis. “LAB + AAB” signifies mixtures of the AAB and either one of the LAB species. The asterisk in the label highlights a sample in a “LAB” condition (Leuconostoc mesenteroides), which clustered separately from the other “LAB” samples. Brown abbreviations of scientific names are for the yeast-fed conditions. H. uva, Hanseniaspora uvarum; K. hum, Kazachstania humilis; M. asi, Martiniozyma asiatica; S. cra, Saccharomycopsis crataegensis; P. klu, Pichia kluyveri; S. bac, Starmerella bacillaris; S. cer, S. cerevisiae BY4741 strain.

      Only a handful of genes showed different expression patterns between larvae fed on yeast and those fed on bacteria, without any enrichment for specialized gene functions. Thus, it is challenging to discuss the potential differential impacts, if any, of yeast and bacteria on larval growth.

      4) The whole paper - and this is one of its merits - points to a role of the Drosophila larval microbiota in processing the fly food. Are these bacterial and fungal species found in the gut of larvae/adults? Are these species capable of establishing a niche in the cardia of adults as shown recently in the Ludington lab (Dodge et al.,)? Previous studies have suggested that microbiota members stimulate the Imd pathway leading to an increase in digestive proteases (Erkosar/Leulier). Are the microbiota species studied here affecting gut signaling pathways beyond providing branched amino acids?

      The whole paper - and this is one of its merits - points to a role of the Drosophila larval microbiota in processing the fly food. Are these bacterial and fungal species found in the gut of larvae/adults? Are these species capable of establishing a niche in the cardia of adults as shown recently in the Ludington lab (Dodge et al.,)?

      Although we did not investigate the microbiota in the gut of either larvae or adults, we did compare the microbiota within surface-sterilized larvae or adults with those in food samples. We found that adult flies and early-stage food sources, as well as larvae and late-stage food sources, harbor similar microbial species (Figure 1F). Additionally, previous examinations of the gut microbiota in wild adult flies have identified microbial species or taxa congruent with those we isolated from our foods (Chandler et al., 2011; Chandler et al., 2012). We have elaborated on this in our response to Weakness 1).

      While we did not investigate whether these species are capable of establishing a niche in the cardia of adults, we will cite the study by Dodge et al., 2023 in our revised manuscript and discuss the possibility that predominant microbes in adult flies may show a propensity for colonization.

      Previous studies have suggested that microbiota members stimulate the Imd pathway leading to an increase in digestive proteases (Erkosar/Leulier). Are the microbiota species studied here affecting gut signaling pathways beyond providing branched amino acids?

      The reviewer inquires whether the supportive microbes in our study stimulate gut Imd signaling pathways and induce the expression of digestive protease genes, as demonstrated in a previous study (Erkosar et al., 2015). According to our RNA-seq data, it seems unlikely that the supportive microbes stimulate the signaling pathway. The panels in Author response image 2 provide statistical comparisons of expression levels for seven protease genes between the supportive and the non-supportive conditions. These genes did not exhibit a consistent upregulation in the presence of the supportive microbes (H. uva or K. hum in Author response image 2A; Le. mes + A. ori in Author response image 2B). Rather, they exhibited a tendency to be upregulated under the non-supportive microbes (St. bac or Pi. klu in Author response image 2A; La. pla in Author response image 2B).

      Author response image 2.

      Most of the peptidase genes reported by Erkosar et al., 2015 are more highly expressed under the non-supportive conditions than the supportive conditions. Comparison of the expression levels of seven peptidase genes derived from the RNA-seq analysis of yeast-fed (A) or bacteria-fed (B) first-instar larvae. A previous report demonstrated that the expression of these genes is upregulated upon association with a strain of Lactiplantibacillus plantarum, and that the PGRP-LE/Imd/Relish signaling pathway, at least partially, mediates the induction (Erkosar et al., 2015). H. uva, Hanseniaspora uvarum; K. hum, Kazachstania humilis; P. klu, Pichia kluyveri; S. bac, Starmerella bacillaris; La. pla, Lactiplantibacillus plantarum; Le. mes, Leuconostoc mesenteroides; A. ori, Acetobacter orientalis; ns, not significant.

      Reviewer #2 (Public Review):

      Weaknesses:

      The experimental setting that, the authors think, reflects host-microbe interactions in nature is one of the key points. However, it is not explicitly mentioned whether isolated microbes are indeed colonized in wild larvae of Drosophila melanogaster who eat bananas. Another matter is that this work is rather descriptive and a few mechanical insights are presented. The evidence that the nutritional role of BCAAs is incomplete, and molecular level explanation is missing in "interspecies interactions" between lactic acid bacteria (or yeast) and acetic acid bacteria that assure their inhabitation. Apart from these matters, the future directions or significance of this work could be discussed more in the manuscript.

      The experimental setting that, the authors think, reflects host-microbe interactions in nature is one of the key points. However, it is not explicitly mentioned whether isolated microbes are indeed colonized in wild larvae of Drosophila melanogaster who eat bananas.

      The reviewer asks whether the isolated microbes were colonized in the larval gut. Previous studies on microbial colonization associated with Drosophila have predominantly focused on adults (Pais et al. PLOS Biology, 2018), rather than larval stages. Developing larvae continually consume substrates which are already subjected to microbial fermentation and abundant in live microbes until the end of the feeding larval stage. Therefore, we consider it difficult to discuss microbial colonization in the larval gut. We will add this point in the DISCUSSION of the revised manuscript.

      Another matter is that this work is rather descriptive and a few mechanical insights are presented. The evidence that the nutritional role of BCAAs is incomplete, and molecular level explanation is missing in "interspecies interactions" between lactic acid bacteria (or yeast) and acetic acid bacteria that assure their inhabitation.

      While we recognize the importance of comprehensive mechanistic analysis, this study includes all the data that were experimentally feasible to obtain. Elucidation of more detailed molecular mechanisms lies beyond the scope of this study and will be the subject of future research.

      Regarding the nutritional role of BCAAs, the incorporation of BCAAs enabled larvae fed with the non-supportive yeast to grow to the second instar. This observation suggests that consumption of BCAAs upregulates diverse genes involved in cellular growth processes in larvae. We have discussed the hypothetical interaction between lactic acid bacteria (LAB) and acetic acid bacteria (AAB) in the manuscript (lines 402-405): LAB may facilitate lactate provision to AAB, consequently enhancing the biosynthesis of essential nutrients such as amino acids. To test this hypothesis, future experiments will include the supplementation of AAB culture plates with lactic acid and the co-inoculation of AAB with LAB mutant strains defective in lactate production, to assess both larval growth and continuous larval association with AAB. With respect to AAB-yeast interactions, metabolites released from yeast cells might benefit AAB growth, and this possibility will be investigated through the supplementation of AAB culture plates with candidate metabolites identified in the cell suspension supernatants of the late-stage yeasts.

      Apart from these matters, the future directions or significance of this work could be discussed more in the manuscript.

      We appreciate the reviewer's recommendations and will include additional descriptions regarding these aspects in the DISCUSSION section.

      Reviewer #3 (Public Review):

      Weaknesses:

      Despite describing important findings, I believe that a more thorough explanation of the experimental setup and the steps expected to occur in the exposed diet over time, starting with natural "inoculation" could help the reader, in particular the non-specialist, grasp the rationale and main findings of the manuscript. When exactly was the decision to collect early-stage samples made? Was it when embryos were detected in some of the samples? What are the implications of bacterial presence in the no-fly traps? These samples also harbored complex microbial communities, as revealed by sequencing. Were these samples colonized by microbes deposited with air currents? Were they the result of flies that touched the material but did not lay eggs? Could the traps have been visited by other insects? Another interesting observation that could be better discussed is the fact that adult flies showed a microbiome that more closely resembles that of the early-stage diet, whereas larvae have a more late-stage-like microbiome. It is easy to understand why the microbiome of the larvae would resemble that of the late-stage foods, but what about the adult microbiome? Authors should discuss or at least acknowledge the fact that there must be a microbiome shift once adults leave their food source. Lastly, the authors should provide more details about the metabolomics experiments. For instance, how were peaks assigned to leucine/isoleucine (as well as other compounds)? Were both retention times and MS2 spectra always used? Were standard curves produced? Were internal, deuterated controls used?

      When exactly was the decision to collect early-stage samples made? Was it when embryos were detected in some of the samples?

      We collected traps and early-stage samples 2.5 days after setting up the traps. This time frame was determined by pilot experiments. A shorter collection time resulted in a greater likelihood of obtaining no-fly traps, whereas a longer collection time caused larval overcrowding, as well as adults’ deaths from drowning in the liquid seeping out of fruits. These procedural details will be delineated in the MATERIALS AND METHODS section of the revised manuscript.

      What are the implications of bacterial presence in the no-fly traps? These samples also harbored complex microbial communities, as revealed by sequencing. Were these samples colonized by microbes deposited with air currents? Were they the result of flies that touched the material but did not lay eggs? Could the traps have been visited by other insects?

      We assume that the origins of the microbes detected in the no-fly trap foods vary depending on the species. For instance, Colletotrichum musae, the fungus that causes banana anthracnose, may have been present in fresh bananas before trap placement. The filamentous fungi could have originated from airborne spores, but they could also have been introduced by insects that feed on these fungi. We will include these possibilities in the DISCUSSION section of the revised manuscript.

      Another interesting observation that could be better discussed is the fact that adult flies showed a microbiome that more closely resembles that of the early-stage diet, whereas larvae have a more late-stage-like microbiome. It is easy to understand why the microbiome of the larvae would resemble that of the late-stage foods, but what about the adult microbiome? Authors should discuss or at least acknowledge the fact that there must be a microbiome shift once adults leave their food source.

      We are grateful for the reviewer's insightful suggestions regarding shifts in the adult microbiome. We plan to include in the DISCUSSION section of the revised manuscript the possibility that the microbial composition may change substantially during pupal stages and that microbes obtained after eclosion could potentially form the adult gut microbiota.

      Lastly, the authors should provide more details about the metabolomics experiments. For instance, how were peaks assigned to leucine/isoleucine (as well as other compounds)? Were both retention times and MS2 spectra always used? Were standard curves produced? Were internal, deuterated controls used?

      We appreciate the reviewer's advice. Detailed methods of the metabolomic experiments will be included in our revised manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      People can perform a wide variety of different tasks, and a long-standing question in cognitive neuroscience is how the properties of different tasks are represented in the brain. The authors develop an interesting task that mixes two different sources of difficulty, and find that the brain appears to represent this mixture on a continuum, in the prefrontal areas involved in resolving task difficulty. While these results are interesting and in several ways compelling, they overlap with previous findings and rely on novel statistical analyses that may require further validation.

      Strengths

      1) The authors present an interesting and novel task for combining the contributions of stimulus-stimulus and stimulus-response conflict. While this mixture has been measured in the multi-source interference task (MSIT), this task provides a more graded mixture between these two sources of difficulty.

      2) The authors do a good job triangulating regions that encode conflict similarity, looking for the conjunction across several different measures of conflict encoding.

      3) The authors quantify several salient alternative hypotheses and systematically distinguish their core results from these alternatives.

      4) The question that the authors tackle is of central theoretical importance to cognitive control, and they make an interesting contribution to this question.

      We would like to thank the reviewer for the positive evaluation of our manuscript and the constructive comments and suggestions. Your feedback has been invaluable in our efforts to enhance the accessibility of our manuscript and strengthen our findings. In response to your suggestion, we reanalyzed our data using the approach proposed by Chen et al. (2017, NeuroImage) and applied stricter multiple comparison correction thresholds in our reporting. This reanalysis largely replicated our previous results, thereby reinforcing the robustness of our findings. We also examined several alternative models, and the results supported the integration of the spatial Stroop and Simon conflicts within the cognitive space. In addition, we enriched the theoretical framework of our manuscript by connecting the cognitive space with other important theories such as the “Expected Value of Control” theory. We have incorporated your feedback, revisions and additional analyses into the manuscript, and we believe that these changes have significantly improved the quality of our work. We have provided detailed responses to your comments below.

      1) It's not entirely clear what the current task can measure that is not known from the MSIT, such as the additive influence of conflict sources in Fu et al. (2022), Science. More could be done to distinguish the benefits of this task from MSIT.

      We agree that the MSIT task incorporates Simon and Eriksen Flanker conflict tasks and can efficiently detect the additivity of conflict effects across orthogonal tasks. Like the MSIT, our task incorporates Simon with spatial Stroop conflicts and can test the same idea. For example, a previous study from our lab (Li et al., 2014) used the combined spatial Stroop-Simon condition with the arrows displayed on diagonal corners and found evidence for the additive hypothesis. However, the MSIT cannot be used to test whether/how different conflicts are parametrically represented in a low-dimensional space, a question that is important to address the debate of domain-general and domain-specific cognitive control.

      To this end, our current study adopted the spatial Stroop-Simon task for the unique purpose of parametrically modulating conflict similarity. As far as we know, there is no way to define the similarity between the combined Simon-Flanker conflict condition and the Simon/Flanker conditions in the MSIT. In contrast, with the spatial Stroop-Simon paradigm, we can define the similarity between any two conditions as the cosine of the angular difference between them.
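      The cosine-based similarity described above can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis code used in the study: the function name and the five example angles (evenly spaced from 0° to 90°) are assumptions made here for demonstration only.

```python
import math

def conflict_similarity(angle_a_deg: float, angle_b_deg: float) -> float:
    """Cosine of the angular difference between two conflict conditions."""
    return math.cos(math.radians(angle_a_deg - angle_b_deg))

# Hypothetical angles for five conflict conditions, from one pure
# conflict type (0 degrees) to the other (90 degrees). These values
# are illustrative assumptions, not taken from the manuscript.
example_angles = [0.0, 22.5, 45.0, 67.5, 90.0]

# Pairwise similarity matrix: identical conditions score 1, and
# similarity falls smoothly as the angular difference grows.
similarity_matrix = [
    [conflict_similarity(a, b) for b in example_angles]
    for a in example_angles
]

for row in similarity_matrix:
    print(" ".join(f"{value:5.2f}" for value in row))
```

      Under these assumed angles the two pure conditions are orthogonal (similarity close to 0), while each compound condition receives a graded similarity to both, which is the property that a categorical paradigm cannot provide.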

      We have added the following text to the Discussion to emphasize the difference between our paradigm and those of other studies.

      "The use of an experimental paradigm that permits parametric manipulation of conflict similarity provides a way to systematically investigate the organization of cognitive control, as well as its influence on adaptive behaviors. This approach extends traditional paradigms, such as the multi-source interference task (Fu et al., 2022), color Stroop-Simon task (Liu et al., 2010) and similar paradigms that do not afford a quantifiable metric of conflict source similarity."

      References:

      Li, Q., Nan, W., Wang, K., & Liu, X. (2014). Independent processing of stimulus-stimulus and stimulus-response conflicts. PloS One, 9(2), e89249.

      2) The evidence from this previous work for mixtures between different conflict sources make the framing of 'infinite possible types of conflict' feel like a strawman. The authors cite classic work (e.g., Kornblum et al., 1990) that develops a typology for conflict which is far from infinite, and I think few people would argue that every possible source of difficulty will have to be learned separately. Such an issue is addressed in theories like 'Expected Value of Control', where optimization of control policies can address unique combinations of task demands.

      The notion that there might be infinite conflicts arises when we consider the quantitative feature of cognitive control. If each combination of spatial Stroop and Simon conflicts is regarded as a conflict condition, there would be infinitely many conditions, and our major goal is to investigate how these conditions are represented effectively in a space with finite dimensions. We agree that it is unnecessary to dissociate each of these conflict conditions into a unique conflict type, since they may not differ substantially. However, we argue that understanding variant conflicts within a purely categorical framework (e.g., Simon and Flanker conflict in MSIT) is insufficient, especially because it leads to dichotomous conclusions that do not capture how combinations of conflicts are organized in the brain, which is the question our study addresses.

      There could be different perspectives on how our cognitive control system flexibly encodes and resolves multiple conflicts. The cognitive space assumption we held provides a principle by which we can represent multiple conflicts in a lower dimensional space efficiently. While the “Expected Value of Control” theory addresses when and how much cognitive control to apply based on control demand, the “cognitive space” view seeks to explain how the conflict, which defines cognitive control demand, is encoded in the brain. Thus, we argue that these two lines of work are different yet complementary. The geometry of cognitive space of conflict can benefit the adjustment of cognitive control for upcoming conflicts. For example, our brain may evaluate the similarity/distance (and thus cost) between the consecutive conflict conditions, and selects the path with best cost-benefit tradeoff to switch from one state to another. This idea is conceptually similar to a recent study by Grahek et al. (2022) demonstrating that more frequently switching states were encoded as closer together than less frequently switching states in a “drift-threshold” space.

      Nevertheless, Grahek et al. (2022) investigated how cognitive control changes within the same conflict based on the expected value of control theory, whereas our study aims to examine the organization of different conflicts.

      We have added the implications of cognitive space view in the discussion to indicate the potential values of our finding to understand the EVC account and the difference between the two theories.

      “Previous researchers have proposed an “expected value of control (EVC)” theory, which posits that the brain can evaluate the cost and benefit associated with executing control for a demanding task, such as the conflict task, and specify the optimal control strength (Shenhav et al., 2013). For instance, Grahek et al. (2022) found that more frequently switching goals when doing a Stroop task were achieved by adjusting smaller control intensity. Our work complements the EVC theory by further investigating the neural representation of different conflict conditions and how these representations can be evaluated to facilitate conflict resolution. We found that different conflict conditions can be efficiently represented in a cognitive space encoded by the right dlPFC, and participants with stronger cognitive space representation have also adjusted their conflict control to a greater extent based on the conflict similarity (Fig 4C). The finding suggests that the cognitive space organization of conflicts guides cognitive control to adjust behavior. Previous studies have shown that participants may adopt different strategies to represent a task, with the model-based strategies benefitting goal-related behaviors more than the model-free strategies (Rmus et al., 2022). Similarly, we propose that cognitive space could serve as a mental model to assist fast learning and efficient organization of cognitive control settings. Specifically, the cognitive space representation may provide a principle for how our brain evaluates the expected cost of switching and the benefit of generalization between states and selects the path with the best cost-benefit tradeoff (Abrahamse et al., 2016; Shenhav et al., 2013). The proximity between two states in cognitive space could reflect both the expected cognitive demand required to transition and the useful mechanisms to adapt from. 
The closer the two conditions are in cognitive space, the lower the expected switching cost and the higher the generalizability when transitioning between them. With the organization of a cognitive space, a new conflict can be quickly assigned a location in the cognitive space, which will facilitate the development of cognitive control settings for this conflict by interpolating nearby conflicts and/or projecting the location to axes representing different cognitive control processes, thus leading to a stronger CSE when following a more similar conflict condition. On the other hand, without a cognitive space, there would be no measure of similarity between conflicts on different trials, hence limiting the ability of fast learning of cognitive control setting from similar trials.”

      Reference:

      Grahek, I., Leng, X., Fahey, M. P., Yee, D., & Shenhav, A. (2022). Empirical and Computational Evidence for Reconfiguration Costs During Within-Task Adjustments in Cognitive Control. CogSci.

      3) Wouldn't a region that represented each conflict source separately still show the same pattern of results? The degree of Stroop vs Simon conflict is perfectly negatively correlated across conditions, so wouldn't a region that just tracks Stroop conflict show these RSA patterns? The authors show that overall congruency is not represented in DLPFC (which is surprising), but they don't break it down by whether this is due to Stroop or Simon congruency (I'm not sure their task allows for this).

      To estimate the unique contributions of the spatial Stroop and Simon conflicts, we performed a model-comparison analysis. We constructed a Stroop-Only model and a Simon-Only model, with each conflict type projected onto the Stroop (vertical) axis or Simon (horizontal) axis, respectively. The similarity between any two conflict types was defined using the Jaccard similarity index (Jaccard, P., 1901), that is, their intersection divided by their union. By replacing the cognitive space-based conflict similarity regressor with the Stroop-Only and Simon-Only regressors, we calculated their BICs. Results showed that the BIC was larger for Stroop-Only (5377122) and Simon-Only (5377096) than for the Cognitive-Space model (5377094). An additional Stroop+Simon model, including both Stroop-Only and Simon-Only regressors, also showed a poorer model fit (BIC = 5377118) than the Cognitive-Space model. Considering that the pattern of conflict representations is more manifested when the conflict is present (i.e., on incongruent trials) than not (i.e., on congruent trials), we also conducted the model comparison using the incongruent trials only. Results showed that Stroop-Only (1344128), Simon-Only (1344120), and Stroop+Simon (1344157) models all showed higher BIC values than the Cognitive-Space model (1344104). These results indicate that the right 8C encodes an integrated cognitive space for resolving Stroop and Simon conflicts. Therefore, we believe the cognitive space has incorporated both dimensions. We added these additional analyses and results to the revised manuscript.
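      For readers who wish to see the size of these BIC differences at a glance, the short Python sketch below subtracts the Cognitive-Space BIC from each alternative model's BIC. It uses only the BIC values reported in this response; the dictionary layout and the helper name `delta_bic` are illustrative choices, not part of the original analysis.

```python
# BIC values as reported in this response (lower is better).
cognitive_space_bic = {
    "all trials": 5377094,
    "incongruent only": 1344104,
}
alternative_bics = {
    "all trials": {
        "Stroop-Only": 5377122,
        "Simon-Only": 5377096,
        "Stroop+Simon": 5377118,
    },
    "incongruent only": {
        "Stroop-Only": 1344128,
        "Simon-Only": 1344120,
        "Stroop+Simon": 1344157,
    },
}

def delta_bic(alternatives: dict, reference: int) -> dict:
    """BIC of each alternative minus the reference BIC.

    A positive value means the alternative fits worse than the reference.
    """
    return {name: bic - reference for name, bic in alternatives.items()}

deltas = {
    dataset: delta_bic(models, cognitive_space_bic[dataset])
    for dataset, models in alternative_bics.items()
}

for dataset, values in deltas.items():
    for name, diff in values.items():
        print(f"{dataset:>16} | {name:<12} | delta BIC = {diff:+d}")
```

      All six differences are positive. With all trials they are 28 (Stroop-Only), 2 (Simon-Only) and 24 (Stroop+Simon); with incongruent trials only they are 24, 16 and 53, respectively.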

      “To examine if the right 8C specifically encodes the cognitive space rather than the domain-general or domain-specific organizations, we tested several additional models (see Methods). Model comparison showed a lower BIC in the Cognitive-Space model (BIC = 5377094) than the Domain-General (BIC = 537127) or Domain-Specific (BIC = 537127) models. Further analysis showed the dimensionality of the representation in the right 8C was 1.19, suggesting the cognitive space was close to 1D. We also tested if the observed conflict similarity effect was driven solely by spatial Stroop or Simon conflicts, and found larger BICs for the models only including the Stroop similarity (i.e., the Stroop-Only model, BIC = 5377122) or Simon similarity (i.e., the Simon-Only model, BIC = 5377096). An additional Stroop+Simon model, including both Stroop-Only and Simon-Only regressors, also showed a worse model fit (BIC = 5377118). Moreover, we replicated the results with only incongruent trials, considering that the pattern of conflict representations is more manifested when the conflict is present (i.e., on incongruent trials) than not (i.e., on congruent trials). We found a poorer fit in the Domain-General (BIC = 1344129), Domain-Specific (BIC = 1344129), Stroop-Only (BIC = 1344128), Simon-Only (BIC = 1344120), and Stroop+Simon (BIC = 1344157) models than in the Cognitive-Space model (BIC = 1344104). These results indicate that the right 8C encodes an integrated cognitive space for resolving Stroop and Simon conflicts. The more detailed model comparison results are listed in Table 2.”

      We reason that we did not observe an overall congruency effect in the RSA results because our definition of congruency here differed from traditional definitions (i.e., the contrast between incongruent and congruent conditions). In the congruency regressor of our RSA model, we defined representational similarity as 1 if calculated between two incongruent or two congruent trials, and 0 if calculated between an incongruent and a congruent trial. Thus, our congruency regressor reflects whether multivariate patterns differ between incongruent and congruent trials, rather than whether activity strengths differ. Indeed, we did observe the latter form of congruency effect, with stronger univariate activity in the pre-SMA for incongruent versus congruent conditions. We have added this to Note S6 (“The multivariate representations of conflict type and orientation are different from the congruency effect”):

      “Neither did we observe a multivariate congruency effect (i.e., the pattern difference between incongruent and congruent conditions compared to that within each condition) in the right 8C or any other regions. Note the definition of congruency here differed from traditional definitions (i.e., contrast between activity strength of incongruent and congruent conditions), with which we found stronger univariate activities in pre-SMA for incongruent versus congruent conditions.”
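      The congruency regressor described above (1 for pairs of trials with the same congruency, 0 otherwise) can be illustrated with a minimal Python sketch. The function name and the toy list of four conditions are assumptions introduced here for illustration; the actual model was fitted on the full representational similarity matrix.

```python
def congruency_similarity(is_incongruent_a: bool, is_incongruent_b: bool) -> int:
    """1 if both conditions share the same congruency, 0 otherwise."""
    return 1 if is_incongruent_a == is_incongruent_b else 0

# Toy example: True marks an incongruent condition, False a congruent one.
toy_conditions = [True, True, False, False]

# Model matrix for the congruency regressor. It codes pattern similarity
# by congruency status and says nothing about overall activity strength.
congruency_matrix = [
    [congruency_similarity(a, b) for b in toy_conditions]
    for a in toy_conditions
]

for row in congruency_matrix:
    print(row)
```

      A regressor of this form tests whether patterns cluster by congruency status. It does not test the conventional univariate contrast of incongruent versus congruent activity.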

      We could not determine whether the null effect of the congruency regressor was due to Stroop or Simon congruency alone, because congruency levels of the two types always covary. On all trials of the compound conditions (Conf 2-4), whenever the Stroop dimension was incongruent, the Simon dimension was also incongruent, and vice versa for the congruent condition. Thus, the contribution of spatial Stroop or Simon alone to the congruency effect could not be tested using compound conditions. Although we have pure spatial Stroop or Simon conditions, within-Stroop and within-Simon trial pairs constituted only 8% of cells in the representational similarity matrix. This was insufficient to determine whether the null congruency effect was due solely to Stroop or Simon.

      Overall, with the added analysis we found that the data in the right 8C area support conflict representations that are organized based on both Simon and spatial Stroop conflict. Although the current experimental design does not allow us to identify whether the null effect of the congruency regressor was driven by either conflict or both, we clarified that the congruency regressor did not test the conventional congruency effect and that the null finding does not contradict previous research.

      Reference:

      Jaccard, P. (1901). Étude comparative de la distribution florale dans une portion des Alpes et des Jura. Bull Soc Vaudoise Sci Nat(37), 547-579.

      4) The authors use a novel form of RSA that concatenates patterns across conditions, runs and subjects into a giant RSA matrix, which is then used for linear mixed effects analysis. This appears to be necessary because conflict type and visual orientation are perfectly confounded within the subject (although, if I understand, the conflict type x congruence interaction wouldn't have the same concern about visual confounds, which shouldn't depend on congruence). This is an interesting approach but should be better justified, preferably with simulations validating the sensitivity and specificity of this method and comparing it to more standard methods.

      The confound exists for both the conflict type and the conflict type × congruence interaction in our design, since both incongruent and congruent conditions include stimuli from the full orientation space. For example, for the spatial Stroop type, the congruent condition could be either an up arrow at the top or a down arrow at the bottom. Similarly, the incongruent condition could be either an up arrow at the bottom or a down arrow at the top. Therefore, both the congruent and incongruent conditions are perfectly confounded with the orientation.

      We reanalyzed the data using the well-documented approach by Chen et al. (2017, Neuroimage), as suggested by the reviewer. The new analysis replicated our previously reported results (Fig. 4-5, S4-S7). As Chen et al. (2017) have provided abundant simulations to validate this approach, we did not run any further simulations.

      5) A chief concern is that the same pattern contributes to many entries in the DV, which has been addressed in previous work using row-wise and column-wise random effects (Chen et al., 2017, Neuroimage). It would also be informative to know whether the results hold up to removing within-run similarity, which can bias similarity measures (Walther et al., 2016, Neuroimage).

      Thank you for the comment. In our revised manuscript, we followed your suggestion and adopted the approach proposed by Chen et al. (2017). Specifically, we included both the upper and lower triangle of the representational similarity matrix (excluding the diagonal). Moreover, we also removed all the within-subject similarity (thus also excluding the within-run similarity as suggested by Walther et al. (2016)) to minimize the bias of the potentially strong within-subject similarity. In addition, we added both the row-wise and column-wise random effects to capture the dependence of cells within each column and each row, respectively (Chen et al., 2017).
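      The cell-selection step described above can be sketched with NumPy. This is a simplified illustration under stated assumptions, not the code used in the study: the function name, the toy data (3 subjects with 2 patterns each) and the random patterns are all hypothetical. The row and column indices are returned alongside the values because they are what row-wise and column-wise random effects would be grouped by in a mixed model.

```python
import numpy as np

def between_subject_cells(rsm, subject_ids):
    """Select between-subject cells from both triangles of an RSM.

    Returns row indices, column indices and z-transformed similarity
    values. Dropping within-subject cells also drops the diagonal and
    all within-run cells, since every run belongs to a single subject.
    """
    subject_ids = np.asarray(subject_ids)
    keep = subject_ids[:, None] != subject_ids[None, :]
    rows, cols = np.nonzero(keep)
    values = rsm[rows, cols]
    z_values = (values - values.mean()) / values.std()
    return rows, cols, z_values

# Toy data: 6 activity patterns (3 hypothetical subjects x 2 patterns),
# each with 50 voxels of random noise.
rng = np.random.default_rng(0)
patterns = rng.normal(size=(6, 50))
toy_subject_ids = [0, 0, 1, 1, 2, 2]

# Pearson correlations between patterns, computed across voxels.
toy_rsm = np.corrcoef(patterns)

rows, cols, z_values = between_subject_cells(toy_rsm, toy_subject_ids)
print(len(z_values))  # number of cells entering the model
```

      In this toy case 24 of the 36 cells are retained (36 minus three 2 × 2 within-subject blocks). By the same arithmetic, a 1400 × 1400 matrix built from 35 participants with 40 patterns each would retain 1,960,000 − 35 × 1,600 = 1,904,000 cells.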

      Results from this approach largely replicated our previous results. The right 8C again showed significant conflict similarity representation, with greater representational strength in incongruent than congruent condition, and positively correlated to behavioral performance. The orientation effect was also identified in the visual (e.g., right V1) and oculomotor (e.g., left FEF) regions.

      We have revised the methodology and the results in the revised manuscript:

      "Representational similarity analysis (RSA).

      For each cortical region, we calculated the Pearson’s correlations between fMRI activity patterns for each run and each subject, yielding a 1400 (20 conditions × 2 runs × 35 participants) × 1400 RSM. The correlations were calculated in a cross-voxel manner using the fMRI activation maps obtained from GLM3 described in the previous section. We excluded within-subject cells from the RSM (thus also excluding the within-run similarity as suggested by Walther et al. (2016)), and the remaining cells were converted into a vector, which was then z-transformed and submitted to a linear mixed effect model as the dependent variable. The linear mixed effect model also included regressors of conflict similarity and orientation similarity. Importantly, conflict similarity was based on how Simon and spatial Stroop conflict are combined and hence was calculated by first rotating all subjects’ stimulus locations to the top-right and bottom-left quadrants, whereas orientation was calculated using original stimulus locations. As a result, the regressors representing conflict similarity and orientation similarity were de-correlated. Similarity between two conditions was measured as the cosine value of the angular difference. Other regressors included a target similarity regressor (i.e., whether the arrow directions were identical), a response similarity regressor (i.e., whether the correct responses were identical), a spatial Stroop distractor regressor (i.e., vertical distance between two stimulus locations), and a Simon distractor regressor (i.e., horizontal distance between two stimulus locations). Additionally, we also included a regressor denoting the similarity of Group (i.e., whether two conditions are within the same subject group, according to the stimulus-response mapping). We also added two regressors including ROI-mean fMRI activations for each condition of the pair to remove the possible uni-voxel influence on the RSM. A last term was the intercept. To control the artefact due to dependence of the correlation pairs sharing the same subject, we included crossed random effects (i.e., row-wise and column-wise random effects) for the intercept, conflict similarity, orientation and the group factors (G. Chen et al., 2017)."
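
      The construction of the dependent variable described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name, the toy data sizes, and the reading of "z-transformed" as standardizing the vector (rather than a Fisher transform) are all assumptions.

```python
import numpy as np

def cross_subject_rsm_vector(patterns, subject_ids):
    """Build the cross-subject similarity vector used as the dependent variable.

    patterns    : (n_items, n_voxels) array, one activation pattern per
                  condition x run x subject (hypothetical layout).
    subject_ids : (n_items,) array giving the subject of each pattern.
    Returns the standardized similarity values plus their row/column indices,
    which a mixed model could use for row-wise and column-wise random effects.
    """
    patterns = np.asarray(patterns, dtype=float)
    subject_ids = np.asarray(subject_ids)

    # Pearson correlation across voxels between every pair of patterns.
    rsm = np.corrcoef(patterns)

    # Keep only cells that pair two different subjects. This drops the
    # diagonal, all within-run cells, and all other within-subject cells,
    # and it keeps both the upper and the lower triangle.
    keep = subject_ids[:, None] != subject_ids[None, :]
    rows, cols = np.nonzero(keep)
    values = rsm[rows, cols]

    # Assumption: "z-transformed" is taken to mean standardizing the vector.
    z = (values - values.mean()) / values.std()
    return z, rows, cols

# Toy usage: 3 subjects x 4 patterns each, 50 voxels of random data.
rng = np.random.default_rng(0)
toy_patterns = rng.normal(size=(12, 50))
toy_subjects = np.repeat(np.arange(3), 4)
z, rows, cols = cross_subject_rsm_vector(toy_patterns, toy_subjects)
```

      With 12 patterns from 3 subjects, 144 - 3 × 16 = 96 cross-subject cells remain; the same masking logic applies to the 1400 × 1400 matrix.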

      Reference:

      Walther, A., Nili, H., Ejaz, N., Alink, A., Kriegeskorte, N., & Diedrichsen, J. (2016). Reliability of dissimilarity measures for multi-voxel pattern analysis. Neuroimage, 137, 188-200. doi:10.1016/j.neuroimage.2015.12.012

      6) Another concern is the extent to which across-subject similarity will only capture consistent patterns across people, making this analysis very similar to a traditional univariate analysis (and unlike the traditional use of RSA to capture subject-specific patterns).

      With proper normalization, we assume voxels across different subjects should show some consistent localizations, although individual differences can be high. J. Chen et al. (2017) have demonstrated that consistent multi-voxel activation patterns exist across individuals. Previous studies have also successfully applied cross-subject RSA (see review by Freund et al., 2021) and cross-subject decoding approaches (e.g., Jiang et al., 2016; Tusche et al., 2016), so we believe cross-subject RSA should be feasible to capture distributed activation patterns shared at the group level. We added this argument in the revised manuscript:

      "Previous studies (e.g., J. Chen et al., 2017) have demonstrated that consistent multivoxel activation patterns exist across individuals, and successful applications of cross-subject RSA (see review by Freund, Etzel, et al., 2021) and cross-subject decoding approaches (Jiang et al., 2016; Tusche et al., 2016) have also been reported."

      In the revised manuscript, we also tested whether the representation in right 8C held for within-subject data. We reasoned that the conflict similarity effects identified by cross-subject RSA should be replicable in within-subject data, although the latter is not able to dissociate the conflict similarity effect from the orientation effect. We performed similar RSA for within-subject RSMs, excluding the within-run cells. We replaced the perfectly confounded factors of conflict similarity and orientation with a common factor called similarity_orientation. Other confounding factor pairs were addressed similarly. Results showed a significant effect of similarity_orientation, t(13993) = 3.270, p = .0005, 1-tailed. Given the specific representation of conflict similarity identified by the cross-subject RSA, we believe that the within-subject data of right 8C probably showed similar conflict similarity modulation effects as the cross-subject data, although future research that orthogonalizes conflict type and orientation is needed to fully answer this question. We added this result in the revised section Note S7.

      "Note S7. The cross-subject RSA captures similar effects with the within-subject RSA. Considering the variability in voxel-level functional localizations among individuals, one may question whether the cross-subject RSA results were biased by the consistent multi-voxel patterns across subjects, distinct from the more commonly utilized within-subject RSA. We reasoned that the cross-subject RSA should have captured similar effects as the within-subject RSA if we observe the conflict similarity effect in right 8C with the latter analysis. Therefore, we tested whether the representation in right 8C held for within-subject data. Specifically, we performed similar RSA for within-subject RSMs, excluding the within-run cells. We replaced the perfectly confounded factors of conflict similarity and orientation with a common factor called similarity_orientation. Other confounding factor pairs (i.e., target versus response, and Stroop distractor versus Simon distractor) were addressed similarly. Results showed a significant effect of similarity_orientation, t(13993) = 3.270, p = .0005, 1-tailed. Given the specific representation of conflict similarity identified by the cross-subject RSA, the within-subject data of right 8C may show similar conflict similarity modulation effects as the cross-subject data. Further research is needed to fully dissociate the representation of conflict and the representation of visual features such as orientation."

      Reference:

      Chen, J., Leong, Y. C., Honey, C. J., Yong, C. H., Norman, K. A., & Hasson, U. (2017). Shared memories reveal shared structure in neural activity across individuals. Nature Neuroscience, 20(1), 115-125.

      Freund, M. C., Etzel, J. A., & Braver, T. S. (2021). Neural Coding of Cognitive Control: The Representational Similarity Analysis Approach. Trends in Cognitive Sciences, 25(7), 622-638.

      Jiang, J., Summerfield, C., & Egner, T. (2016). Visual Prediction Error Spreads Across Object Features in Human Visual Cortex. J Neurosci, 36(50), 12746-12763.

      Tusche, A., Bockler, A., Kanske, P., Trautwein, F. M., & Singer, T. (2016). Decoding the Charitable Brain: Empathy, Perspective Taking, and Attention Shifts Differentially Predict Altruistic Giving. Journal of Neuroscience, 36(17), 4719-4732.

      7) Finally, the authors should confirm all their results are robust to less liberal methods of multiplicity correction. For univariate analysis, they should report the effects from the standard p < .001 cluster forming threshold for univariate analysis (or TFCE). For multivariate analyses, FDR can be quite liberal. The authors should consider whether their mixed-effects analyses allow for group-level randomization, and consider (relatively powerful) Max-Stat randomization tests (Nichols & Holmes, 2002, Hum Brain Mapp).

      In our revised manuscript, we have corrected the univariate results using the probabilistic TFCE (pTFCE) approach by Spisak et al. (2019). This approach estimates the conditional probability of cluster extent based on Bayes’ rule. Specifically, we applied pTFCE on our univariate results (i.e., the z-maps of our contrasts). This returned enhanced Z-score maps, which were then thresholded based on simulated cluster size thresholds using 3dClustSim. A cluster-forming threshold of p < .001 was employed. Results showed only the pre-SMA was activated in the incongruent > congruent contrast, and right IPS and right dmPFC were activated in the linear Simon modulation effect. Further tests also showed these regions were not correlated with the behavioral performance, uncorrected ps >.28. These results largely replicated our previous results. We have revised the method and results accordingly.

      Methods:

      "Results were corrected with the probabilistic threshold-free cluster enhancement (pTFCE) and then thresholded by the 3dClustSim function in AFNI (Cox & Hyde, 1997) with voxel-wise p < .001 and cluster-wise p < .05, both 1-tailed."

      Results:

      "In the fMRI analysis, we first replicated the classic congruency effect by searching for brain regions showing higher univariate activation in incongruent than congruent conditions (GLM1, see Methods). Consistent with the literature (Botvinick et al., 2004; Fu et al., 2022), this effect was observed in the pre-supplementary motor area (preSMA) (Fig. 3, Table S1). We then tested the encoding of conflict type as a cognitive space by identifying brain regions with activation levels parametrically covarying with the coordinates (i.e., axial angle relative to the horizontal axis) in the hypothesized cognitive space. As shown in Fig. 1B, change in the angle corresponds to change in spatial Stroop and Simon conflicts in opposite directions. Accordingly, we found the right inferior parietal sulcus (IPS) and the right dorsomedial prefrontal cortex (dmPFC) displayed positive correlation between fMRI activation and the Simon conflict (Fig. 3, Fig. S3, Table S1)."

      We appreciate the reviewer’s suggestion to apply the Max-Stat randomization tests (Nichols & Holmes, 2002) for the multivariate analyses. However, the representational similarity matrix was too large (1400 × 1400) to be tested with a balanced randomization approach (i.e., the Max-Stat), because (1) running even 1000 randomizations for all ROIs would take a very long time, and (2) the distribution generated from a typical number of randomizations (e.g., 5000 iterations) would probably be unbalanced, since the full range of possible samples that could be generated by a complete randomization would not be adequately represented. Instead, we adopted a very strict Bonferroni correction of p < 0.0001/360 when reporting the regression results from RSA. Notably, Chen et al. (2017) have shown that their approach could control the FDR at an acceptable level.
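
      For reference, the per-test threshold implied by the stated Bonferroni correction can be worked out directly. The helper name below is hypothetical, and the divisor of 360 is taken at face value from the threshold given above.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold under a Bonferroni correction."""
    return alpha / n_tests

# Family-wise alpha of 0.0001 divided by 360 tests, as stated in the response.
threshold = bonferroni_threshold(0.0001, 360)
print(f"{threshold:.3e}")  # prints 2.778e-07
```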

      Reference:

      Spisák, T., Spisák, Z., Zunhammer, M., Bingel, U., Smith, S., Nichols, T., & Kincses, T. (2019). Probabilistic TFCE: A generalized combination of cluster size and voxel intensity to increase statistical power. NeuroImage, 185, 12-26.

      Chen, G., Taylor, P. A., Shin, Y.-W., Reynolds, R. C., & Cox, R. W. (2017). Untangling the relatedness among correlations, Part II: Inter-subject correlation group analysis through linear mixed-effects modeling. NeuroImage, 147, 825-840.

      Minor concerns:

      8) I appreciate the authors wanting to present the conditions in a theory-agnostic way, but the framing of 5 conflict types was confusing. I think framing the conditions as a mixture of 2 conflict types (Stroop and Simon) makes more sense, especially given the previous work on MSIT.

      We have renamed the Type1-5 as spatial Stroop, StHSmL, StMSmM, StLSmH, and Simon conditions, respectively. H, L, and M indicate high, low and medium similarity with the corresponding conflict, respectively. This is also consistent with the naming of our previous work (Yang et al., 2021).

      Reference:

      Yang, G., Xu, H., Li, Z., Nan, W., Wu, H., Li, Q., & Liu, X. (2021). The congruency sequence effect is modulated by the similarity of conflicts. Journal of Experimental Psychology: Learning, Memory, and Cognition, 47(10), 1705-1719.

      9) It would be helpful to have more scaffolding for the key conflict & orientation analyses. A schematic in the main text that outlines these contrasts would be very helpful (e.g. similar to S4).

      We have inserted Figure 7 in the revised manuscript. In this figure, we plotted the schematic of the difference between the conflict similarity and orientation regressors according to their cross-group representational similarity matrices.

      10) Figure 4D could be clearer, both in labeling and figure caption. 'Modeled similarity' could be relabelled to something more informative, like 'conflict type (or mixture) similarity'. Alternatively, it would be helpful to show a summary RDM for region r-8C. For example, breaking it down by just conflict type and congruence.

      We have relabeled the x-axis to “Conflict type similarity” and y-axis to “Neural similarity” for Figure 4D in the revised manuscript.

      We have also added a summary RSM figure in Fig. S5 to show the different similarity patterns between incongruent and congruent conditions.

      11) It may be helpful to connect your work to how people have discussed multiple forms of conflict monitoring and control with respect to target and distractor features e.g., Lindsay & Jacoby, 1994, JEP:HPP; Mante, Sussillo et al., 2013, Nature; Soutschek et al., 2015, JoCN; Jackson et al., 2021, Comm Bio; Ritz & Shenhav, 2022, bioRxiv

      We have added an analysis to examine how cognitive control modulates target and distractor representation. To this end, we selected the left V4, a visual region showing joint representation of target, Stroop distractor and Simon distractor, as the region of interest. We tested whether these representation strengths differed between incongruent and congruent conditions, finding the representation of target was stronger and representations of both distractors were weaker in the incongruent condition. This suggests that cognitive control modulates the stimuli in both directions. We added the results in Note S10 and Fig. S8, and also added discussion of it in “Methodological implications”.

      “Note S10. Cognitive control enhances target representation and suppresses distractor representation. Using the separability of confounding factors afforded by the cross-subject RSA, we examined how representations of targets and distractors are modulated by cognitive control. The key assumption is that exerting cognitive control may enhance target representation and suppress distractor representation. We hypothesized that stimuli are represented in visual areas, so we chose a visual ROI from the main RSA results showing joint representation of target, spatial Stroop distractor and Simon distractor (p < .005, 1-tailed, uncorrected). Only the left V4 met this criterion. We then tested representations with models similar to the main text for incongruent-only trials, congruent-only trials, and the incongruent – congruent contrast. The contrast model additionally used interaction between the congruency and target, Stroop distractor and Simon distractor terms. Results showed that in the incongruent condition, when we employ more cognitive control, the target representation was enhanced (t(237990) = 2.59, p = .029, Bonferroni corrected) and both spatial Stroop (t(237990) = –4.18, p < .001, Bonferroni corrected) and Simon (t(237990) = –3.14, p = .005, Bonferroni corrected) distractor representations were suppressed (Fig. S8). These are consistent with the idea that the top-down control modulates the stimuli in both directions (Polk et al., 2008; Ritz & Shenhav, 2022).”

      Discussion:

      “Moreover, the cross-subject RSA provides high sensitivity to the variables of interest and the ability to separate confounding factors. For instance, in addition to dissociating conflict type from orientation, we dissociated target from response, and spatial Stroop distractor from Simon distractor. We further showed cognitive control can both enhance the target representation and suppress the distractor representation (Note S10, Fig. S8), which is in line with previous studies (Polk et al., 2008; Ritz & Shenhav, 2022)."

      12) For future work, I would recommend placing stimuli along the whole circumference, to orthogonalize Stroop and Simon conflict within-subject.

      We thank the reviewer for this highly helpful suggestion. Expanding the conflict conditions to a full conflict space and replicating our current results could provide stronger evidence for the cognitive space view.

      In the revised manuscript, we added this as a possible future design:

      “A possible improvement to our current design would be to include left, right, up, and down arrows presented in a grid formation across four spatially separate quadrants, with each arrow mapped to its own response button. However, one potential confounding factor would be that these conditions have different levels of difficulty (i.e., different magnitude of conflict), which may affect the CSE results and their representational similarity."

      Reviewer #2:

      Summary, general appraisal

      This study examines the construct of "cognitive spaces" as they relate to neural coding schemes present in response conflict tasks. The authors utilize a novel paradigm, in which subjects must map the direction of a vertically oriented arrow to either a left or right response. Different types of conflict (spatial Stroop, Simon) are parametrically manipulated by varying the spatial location of the arrow (a task-irrelevant feature). The vertical eccentricity of the arrow either agrees or conflicts with the arrow's direction (spatial Stroop), while the horizontal eccentricity of the arrow agrees or conflicts with the side of the response (Simon). A neural coding model is postulated in which the stimuli are embedded in a cognitive space, organized by distances that depend only on the similarity of congruency types (i.e., where conditions with similar relative proportions of spatial-Stroop versus Simon congruency are represented with similar activity patterns). The authors conduct a behavioral and fMRI study to provide evidence for such a representational coding scheme. The behavioral findings replicate the authors' prior work in demonstrating that conflict-related cognitive control adjustments (the congruency sequence effect) show strong modulation as a function of the similarity between conflict types. With the fMRI neural activity data, the authors report univariate analyses that identified activation in left prefrontal and dorsomedial frontal cortex modulated by the amount of Stroop or Simon conflict present, and multivariate representational similarity analyses (RSA) that identified right lateral prefrontal activity encoding conflict similarity and correlated with the behavioral effects of conflict similarity.

      This study tackles an important question regarding how distinct types of conflict, which have been previously shown to elicit independent forms of cognitive control adjustments, might be encoded in the brain within a computationally efficient representational format. The ideas postulated by the authors are interesting ones and the utilized methods are rigorous.

      We would like to express our sincere appreciation for the reviewer’s positive evaluation of our manuscript and the constructive comments and suggestions. Through careful consideration of your feedback, we have endeavored to make our manuscript more accessible to readers and further strengthened our findings. In response to your suggestion, we reanalyzed our data with the approach proposed by Chen et al. (2017, NeuroImage). This reanalysis largely replicated our previous results, reinforcing the validity of our findings. Additionally, we conducted tests with several alternative models and found that the cognitive space hypothesis best aligns with our observed data. We have incorporated these revisions and additional analyses into the manuscript based on your valuable feedback, and we believe that they have significantly enhanced the quality of our manuscript. We have provided detailed responses to your comments below.

      However, the study has critical limitations that are due to a lack of clarity regarding theoretical hypotheses, serious confounds in the experimental design, and a highly non-standard (and problematic) approach to RSA. Without addressing these issues it is hard to evaluate the contribution of the authors findings to the computational cognitive neuroscience literature.

      1) The primary theoretical question and its implications are unclear. The paper would greatly benefit from more clearly specifying potential alternative hypotheses and discussing their implications. Consider, for example, the case of parallel conflict monitors. Say that these conflict monitors are separately tuned for Stroop and Simon conflict, and are located within adjacent patches of cortex that are both contained within a single cortical parcel (e.g., as defined by the Glasser atlas used by the authors for analyses). If RSA was conducted on the responses of such a parcel to this task, it seems highly likely that an activation similarity matrix would be observed that is quite similar (if not identical) to the hypothesized one displayed in Figure 1. Yet it would seem like the authors are arguing that the "cognitive space" representation is qualitatively and conceptually distinct from the "parallel monitor" coding scheme. Thus, it seems that the task and analytic approach is not sufficient to disambiguate these different types of coding schemes or neural architectures.

      The authors also discuss a fully domain-general conflict monitor, in which different forms of conflict are encoded within a single dimension. Yet this alternative hypothesis is also not explicitly tested nor discussed in detail. It seems that the experiment was designed to orthogonalize the "domain-general" model from the "cognitive space" model, by attempting to keep the overall conflict uniform across the different stimuli (i.e., in the design, the level of Stroop congruency parametrically trades off with the level of Simon congruency). But in the behavioral results (Fig. S1), the interference effects were found to peak when both Stroop and Simon congruency are present (i.e., Conf 3 and 4), suggesting that the "domain-general" model may not be orthogonal to the "cognitive space" model. One of the key advantages of RSA is that it provides the ability to explicitly formulate, test and compare different coding models to determine which best accounts for the pattern of data. Thus, it would seem critical for the authors to set up the design and analyses so that an explicit model comparison analysis could be conducted, contrasting the domain-general, domain-specific, and cognitive space accounts.

      We appreciate the reviewer pointing out the need to formally test alternative models. In the revised manuscript, we have added and compared a few alternative models, finding the Cognitive-Space model (the one with graded conflict similarity levels as we reported) provided the best fit to our data. Specifically, we tested the following five models against the Cognitive-Space model:

      (1) Domain-General model. This model treats each conflict type as equivalent, so each two conflict types only differ in the magnitude of their conflict. Therefore, we defined the domain-general matrix as the difference in their effects indexed by the group-averaged RT in Experiment 2. Then the z-scored model vector was sign-flipped to reflect similarity instead of distance. This model showed non-significant conflict type effects (t(951989) = 0.92, p = .179) and poorer fit (BIC = 5377126) than the Cognitive-Space model (BIC = 5377094).

      (2) Domain-Specific model. This model treats each conflict type differently, so we used a diagonal matrix, with within-conflict type similarities being 1 and all cross-conflict type similarities being 0. This model also showed non-significant effects (t(951989) = 0.84, p = .201) and poorer fit (BIC = 5377127) than the Cognitive-Space model.

      (3) Stroop-Only model. This model assumes that the right 8C only encodes the spatial Stroop conflict. We projected each conflict type to the Stroop (vertical) axis and calculated the similarity between any two conflict types as the Jaccard similarity index (Jaccard, 1901), that is, their intersection divided by their union. This model also showed non-significant effects (t(951989) = 0.20, p = .423) and poorer fit (BIC = 5377122) than the Cognitive-Space model.

      (4) Simon-Only model. This model assumes that the right 8C only encodes the Simon conflict. We projected each conflict type to the Simon (horizontal) axis and calculated the similarity like the Stroop-Only model. This model showed significant effects (t(951989) = 4.19, p < .001) but still quantitatively poorer fit (BIC = 5377096) than the Cognitive-Space model.

      (5) Stroop+Simon model. This model assumes the spatial Stroop and Simon conflicts are encoded in parallel in the brain, similar to the "parallel monitor" hypothesis suggested by the reviewer. It includes both Stroop-Only and Simon-Only regressors. This model showed a non-significant effect for the Stroop regressor (t(951988) = 0.06, p = .478) and a significant effect for the Simon regressor (t(951988) = 3.30, p < .001), but poorer fit (BIC = 5377118) than the Cognitive-Space model.
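
      The BIC values listed for the five alternative models can be expressed as differences from the Cognitive-Space model, which makes the size of each gap explicit. The snippet below only rearranges the numbers reported above; the variable names are illustrative.

```python
# BIC values as reported in the model list above (lower is better).
bic = {
    "Cognitive-Space": 5377094,
    "Domain-General": 5377126,
    "Domain-Specific": 5377127,
    "Stroop-Only": 5377122,
    "Simon-Only": 5377096,
    "Stroop+Simon": 5377118,
}

reference = bic["Cognitive-Space"]

# Positive values mean a poorer fit than the Cognitive-Space model.
delta_bic = {name: value - reference for name, value in bic.items()}

for name, delta in sorted(delta_bic.items(), key=lambda item: item[1]):
    print(f"{name:16s} delta BIC = {delta}")
```

      The differences come out as 2 (Simon-Only), 24 (Stroop+Simon), 28 (Stroop-Only), 32 (Domain-General) and 33 (Domain-Specific).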

      “Moreover, we replicated these results with only incongruent trials (i.e., when conflict is present), considering that the pattern of conflict representations is more manifested when the conflict is present (i.e., on incongruent trials) than not (i.e., on congruent trials). We found a poorer fitting in Domain-general (BIC = 1344129), Domain-Specific (BIC = 1344129), Stroop-Only (BIC = 1344128), Simon-Only (BIC = 1344120), and Stroop+Simon (BIC = 1344157) models than the Cognitive-Space model (BIC = 1344104).”

      In summary, these results indicate that the right 8C encodes an integrated cognitive space for resolving Stroop and Simon conflicts. We added the above results to the revised manuscript.

      The above analysis approach was added to the method “Model comparison and representational dimensionality”, and the results were added to the “Multivariate patterns of the right dlPFC encodes the conflict similarity” in the revised manuscript.

      Methods:

      “Model comparison and representational dimensionality. To estimate if the right 8C specifically encodes the cognitive space, rather than the domain-general or domain-specific structures, we conducted two more RSAs. We replaced the cognitive space-based conflict similarity matrix in the RSA we reported above (hereafter referred to as the Cognitive-Space model) with one of the alternative model matrices, with all other regressors equal. The domain-general model treats each conflict type as equivalent, so each two conflict types only differ in the magnitude of their conflict. Therefore, we defined the domain-general matrix as the difference in their congruency effects indexed by the group-averaged RT in Experiment 2. Then the z-scored model vector was sign-flipped to reflect similarity instead of distance. The domain-specific model treats each conflict type differently, so we used a diagonal matrix, with within-conflict type similarities being 1 and all cross-conflict type similarities being 0.

      Moreover, to examine if the cognitive space is driven solely by the Stroop or Simon conflicts, we tested a spatial Stroop-Only (hereafter referred to as “Stroop-Only”) and a Simon-Only model, with each conflict type projected onto the spatial Stroop (vertical) axis or Simon (horizontal) axis, respectively. The similarity between any two conflict types was defined using the Jaccard similarity index (Jaccard, 1901), that is, their intersection divided by their union. We also included a model assuming the Stroop and Simon dimensions are independently represented in the brain, adding up the Stroop-Only and Simon-Only regressors (hereafter referred to as the Stroop+Simon model). We conducted similar RSAs as reported above, replacing the original conflict similarity regressor with the Stroop-Only, Simon-Only, or both regressors (for the Stroop+Simon model), and then calculated their Bayesian information criteria (BICs).”
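
      The alternative model matrices described above might be built along the following lines. This is a hedged sketch, not the authors' implementation: the equally spaced angles, the use of an absolute difference for the domain-general model, the reading of the Jaccard index for two non-negative projections as min/max, and the convention that two zero-length projections count as identical are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical axial angles (degrees from the horizontal/Simon axis) for the
# five conflict types, ordered spatial Stroop, StHSmL, StMSmM, StLSmH, Simon.
ANGLES_DEG = np.array([90.0, 67.5, 45.0, 22.5, 0.0])

def cognitive_space_similarity(angles_deg):
    """Cosine of the angular difference between every pair of conflict types."""
    a = np.deg2rad(np.asarray(angles_deg, dtype=float))
    return np.cos(a[:, None] - a[None, :])

def domain_specific_similarity(n_types):
    """Diagonal model: 1 within a conflict type, 0 across conflict types."""
    return np.eye(n_types)

def domain_general_similarity(congruency_effects):
    """Pairwise difference in congruency effect, z-scored and sign-flipped.

    Assumption: the absolute difference is used as the distance.
    """
    e = np.asarray(congruency_effects, dtype=float)
    distance = np.abs(e[:, None] - e[None, :])
    return -(distance - distance.mean()) / distance.std()

def jaccard_projection_similarity(projections):
    """Jaccard index for non-negative scalar projections: min / max.

    Assumption: when both projections are zero the pair is treated as
    identical (similarity 1) to avoid dividing by zero.
    """
    p = np.asarray(projections, dtype=float)
    lo = np.minimum(p[:, None], p[None, :])
    hi = np.maximum(p[:, None], p[None, :])
    sim = np.ones_like(hi)
    nonzero = hi > 0
    sim[nonzero] = lo[nonzero] / hi[nonzero]
    return sim

rad = np.deg2rad(ANGLES_DEG)
# Rounding removes floating-point residue such as cos(90 degrees) = 6e-17.
stroop_only = jaccard_projection_similarity(np.round(np.sin(rad), 12))
simon_only = jaccard_projection_similarity(np.round(np.cos(rad), 12))
cognitive_space = cognitive_space_similarity(ANGLES_DEG)
domain_specific = domain_specific_similarity(len(ANGLES_DEG))
```

      Each 5 × 5 matrix would then be expanded to the condition-pair level and entered as the conflict similarity regressor in place of the Cognitive-Space one, with both Stroop-Only and Simon-Only entered together for the Stroop+Simon model.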

      Results:

      “To examine if the right 8C specifically encodes the cognitive space rather than the domain-general or domain-specific organizations, we tested several additional models (see Methods). Model comparison showed a lower BIC in the Cognitive-Space model (BIC = 5377094) than the Domain-General (BIC = 537127) or Domain-Specific (BIC = 537127) models. Further analysis showed the dimensionality of the representation in the right 8C was 1.19, suggesting the cognitive space was close to 1D. We also tested if the observed conflict similarity effect was driven solely by spatial Stroop or Simon conflicts, and found larger BICs for the models only including the Stroop similarity (i.e., the Stroop-Only model, BIC = 5377122) or Simon similarity (i.e., the Simon-Only model, BIC = 5377096). An additional Stroop+Simon model, including both Stroop-Only and Simon-Only regressors, also showed a worse model fitting (BIC = 5377118). Moreover, we replicated the results with only incongruent trials, considering that the pattern of conflict representations is more manifested when the conflict is present (i.e., on incongruent trials) than not (i.e., on congruent trials). We found a poorer fitting in Domain-general (BIC = 1344129), Domain-Specific (BIC = 1344129), Stroop-Only (BIC = 1344128), Simon-Only (BIC = 1344120), and Stroop+Simon (BIC = 1344157) models than the Cognitive-Space model (BIC = 1344104). These results indicate that the right 8C encodes an integrated cognitive space for resolving Stroop and Simon conflicts. The more detailed model comparison results are listed in Table 2.”

      Reference:

      Jaccard, P. (1901). Étude comparative de la distribution florale dans une portion des Alpes et des Jura. Bull Soc Vaudoise Sci Nat(37), 547-579.

      2a) Relatedly, the reasoning for the use of the term "cognitive space" is unclear. The mere presence of graded coding for two types of conflict seems to be a low bar for referring to neural activity patterns as encoding a "cognitive space". It is discussed that cognitive spaces/maps allow for flexibility through inference and generalization. But no links were made between these cognitive abilities and the observed representational structure.

      In the revised manuscript, we have clarified that we tested a specific prediction of the cognitive space hypothesis: the geometry of the cognitive space predicts that more similar conflict types will have more similar neural representations, leading to the CSE and RSA patterns tested in this study. These results add to the literature by providing empirical evidence on how different conflict types are encoded in the brain. We agree that this study is not a comprehensive test of the cognitive space hypothesis. Thus, in the revised manuscript we explicitly clarified that this study is a test of the geometry of the cognitive space hypothesis.

      Critically, the cognitive space view holds that the representations of different abstract information are organized continuously and the representational geometry in the cognitive space are determined by the similarity among the represented information (Bellmund et al., 2018).

      "The present study aimed to test the geometry of cognitive space in conflict representation. Specifically, we hypothesize that different types of conflict are represented as points in a cognitive space. Importantly, the distance between the points, which reflects the geometry of the cognitive space, scales with the difference in the sources of the conflicts being represented by the points."

      We have also discussed the limitation of the results and stressed the need for more research to fully test the cognitive space hypothesis.

      “Additionally, our study is not a comprehensive test of the cognitive space hypothesis but aimed primarily to provide original evidence for the geometry of cognitive space in representing conflict information in cognitive control. Future research should examine other aspects of the cognitive space such as its dimensionality, its applicability to other conflict tasks such as Eriksen Flanker task, and its relevance to other cognitive abilities, such as cognitive flexibility and learning.”

      2b) Additionally, no explicit tests of generality (e.g., via cross-condition generalization) were provided.

      To examine the generality of cognitive space across conditions, we conducted a leave-one-out prediction analysis. We used the behavioral data from Experiment 1 for this test, because it contains more data than Experiment 2. Specifically, we removed data from one of the five similarity levels (as illustrated by the θs in Fig. 1C) and used the remaining data to perform the same mixed-effect model as reported in the main text (i.e., the two-stage analysis). This yielded one pair of beta coefficients including the similarity regressor and the intercept for each subject, with which we predicted the CSE for the removed similarity level for each subject. We repeated this process for each similarity level once. The predicted results were highly correlated with the original data, with r = .87 for the RT and r = .84 for the ER, ps < .001. We have added this analysis and result to the “Conflict type similarity modulated behavioral congruency sequence effect (CSE)” section.

      “Moreover, to test the continuity and generalizability of the similarity modulation, we conducted a leave-one-out prediction analysis. Specifically, we removed data from one of the five similarity levels (as illustrated by the θs in Fig. 1C) and used the remaining data to perform the same mixed-effect model (i.e., the two-stage analysis). This yielded one pair of beta coefficients including the similarity regressor and the intercept for each subject, with which we predicted the CSE for the removed similarity level for each subject. We repeated this process for each similarity level once. The predicted results were highly correlated with the original data, with r = .87 for the RT and r = .84 for the ER, ps < .001."
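      The leave-one-out logic described above can be illustrated with a minimal sketch. It is not the authors' code: it replaces the two-stage mixed-effect model with a per-subject ordinary least-squares fit, and the function name, the synthetic data, and the use of cos(θ) as the similarity regressor are assumptions made only for illustration.

```python
import numpy as np

def leave_one_level_out(similarity, cse):
    """Predict each subject's CSE at a held-out similarity level.

    similarity: shape (n_levels,), one similarity value per level.
    cse: shape (n_subjects, n_levels), observed CSE per subject and level.
    Returns predictions with the same shape as cse.
    """
    similarity = np.asarray(similarity, dtype=float)
    cse = np.asarray(cse, dtype=float)
    predicted = np.empty_like(cse)
    for held_out in range(similarity.size):
        keep = np.arange(similarity.size) != held_out
        # Assumption: per-subject OLS stands in for the two-stage
        # mixed-effect model reported in the manuscript.
        design = np.column_stack([similarity[keep], np.ones(keep.sum())])
        betas, *_ = np.linalg.lstsq(design, cse[:, keep].T, rcond=None)
        slope, intercept = betas  # each has shape (n_subjects,)
        predicted[:, held_out] = slope * similarity[held_out] + intercept
    return predicted

# Synthetic demonstration (hypothetical data, not the study's).
rng = np.random.default_rng(0)
similarity = np.cos(np.deg2rad([90.0, 67.5, 45.0, 22.5, 0.0]))
true_slope = rng.normal(50.0, 10.0, size=(30, 1))
cse = true_slope * similarity + rng.normal(0.0, 5.0, size=(30, 5))
predicted = leave_one_level_out(similarity, cse)
r = np.corrcoef(predicted.ravel(), cse.ravel())[0, 1]
```

      In this sketch, `r` plays the role of the correlation between predicted and observed CSEs; with noise-free linear data the held-out level is recovered exactly.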

      2c) Finally, although the design elicits strong CSE effects, it seems somewhat awkward to consider CSE behavioral patterns as a reflection of the kind of abilities supported by a cognitive map (if this is indeed the implication that was intended). In fact, CSE effects are well-modeled by simpler "model-free" associative learning processes, that do not require elaborate representations of abstract structures.

      We argue that the conflict similarity modulation of CSEs we observed cannot be explained by the “model-free” stimulus-driven associative learning process. This mainly refers to the feature integration account proposed by Hommel et al. (2004), which explains poorer performance in CI and IC trials (compared with CC and II trials) with the partial repetition cost caused by the breaking of stimulus-response binding. Although we cannot remove its influence on the within-type trials (similarity level 5, θ = 0), it should not affect the cross-type trials (similarity levels 1-4, θ = 90°, 67.5°, 45° and 22.5°, respectively), because the CC, CI, IC, II trials had equal probabilities of partially repeated and fully switched trials (see the Author response image 1 for an example of trials across Conf 1 and Conf 3 conditions). Thus, feature integration cannot explain the gradual CSE decrease from similarity level 1 to 4, which sufficiently reproduces the full effect, as suggested by the leave-one-out prediction analysis mentioned above. We thus conclude that the similarity modulation of CSE cannot be explained by the stimulus-driven associative learning.

      Author response image 1.

      Notably, however, our findings are aligned with an associative learning account of cognitive control (Abrahamse et al., 2016), which extends association learning from stimulus/response level to cognitive control. In other words, abstract cognitive control state can be learned and generalized like other sensorimotor features. This view explicitly proposes that “transfer occurs to the extent that two tasks overlap”, a hypothesis directly supported by our CSE results (see also Yang et al., 2021). Extending this, our fMRI results provide the neural basis of how cognitive control can generalize through a representation of cognitive space. The cognitive space view complements associative learning account by providing a fundamental principle for the learning and generalization of control states. Given the widespread application of CSE as indicator of cognitive control generalization (Braem et al., 2014), we believe that it can be recognized as a kind of ability supported by the cognitive space. This was further supported by the brain-behavioral correlation: stronger encoding of cognitive space was associated with greater bias of trial-wise behavioral adjustment by the consecutive conflict similarity.

      We have incorporated these ideas into the discussion:

      “Similarly, we propose that cognitive space could serve as a mental model to assist fast learning and efficient organization of cognitive control settings. Specifically, the cognitive space representation may provide a principle for how our brain evaluates the expected cost of switching and the benefit of generalization between states and selects the path with the best cost-benefit tradeoff (Abrahamse et al., 2016; Shenhav et al., 2013). The proximity between two states in cognitive space could reflect both the expected cognitive demand required to transition and the useful mechanisms to adapt from. The closer the two conditions are in cognitive space, the lower the expected switching cost and the higher the generalizability when transitioning between them. With the organization of a cognitive space, a new conflict can be quickly assigned a location in the cognitive space, which will facilitate the development of cognitive control settings for this conflict by interpolating nearby conflicts and/or projecting the location to axes representing different cognitive control processes, thus leading to a stronger CSE when following a more similar conflict condition.”

      References:

      Hommel, B., Proctor, R. W., & Vu, K. P. (2004). A feature-integration account of sequential effects in the Simon task. Psychological Research, 68(1), 1-17.

      Abrahamse, E., Braem, S., Notebaert, W., & Verguts, T. (2016). Grounding cognitive control in associative learning. Psychological Bulletin, 142(7), 693-728.

      Yang, G., Xu, H., Li, Z., Nan, W., Wu, H., Li, Q., & Liu, X. (2021). The congruency sequence effect is modulated by the similarity of conflicts. Journal of Experimental Psychology: Learning, Memory, and Cognition, 47(10), 1705-1719.

      Braem, S., Abrahamse, E. L., Duthoo, W., & Notebaert, W. (2014). What determines the specificity of conflict adaptation? A review, critical analysis, and proposed synthesis. Frontiers in Psychology, 5, 1134.

      3) More generally, it seems problematic that Stroop and Simon conflict in the paradigm parametrically trade-off against each other. A more powerful design would have de-confounded Stroop and Simon conflict so that each could be separately estimated via (potentially orthogonal) conflict axes. Additionally, incorporating more varied stimulus sets, locations, or responses might have enabled various tests of generality, as implied by a cognitive space account.

      We thank the reviewer for these valuable suggestions. We argue that the current design is adequate to test the prediction that more similar conflict types have more similar neural representations. That said, we agree that further examination using more powerful experimental designs is needed to fully test the cognitive space account of cognitive control. We also agree that employing more varied stimulus sets, locations and responses would further extend our findings. We have included this as a future research direction in the revised manuscript.

      We have revised our discussion about the limitation as:

      “A few limitations of this study need to be noted. To parametrically manipulate the conflict similarity levels, we adopted the spatial Stroop-Simon paradigm that enables parametrical combinations of spatial Stroop and Simon conflicts. However, since this paradigm is a two-alternative forced choice design, the behavioral CSE is not a pure measure of adjusted control but could be partly confounded by bottom-up factors such as feature integration (Hommel et al., 2004). Future studies may replicate our findings with a multiple-choice design (including more varied stimulus sets, locations and responses) with confound-free trial sequences (Braem et al., 2019). Another limitation is that in our design, the spatial Stroop and Simon effects are highly anticorrelated. This constraint may make the five conflict types represented in a unidimensional space (e.g., a circle) embedded in a 2D space. Future studies may test the 2D cognitive space with fully independent conditions. A possible improvement to our current design would be to include left, right, up, and down arrows presented in a grid formation across four spatially separate quadrants, with each arrow mapped to its own response button. However, one potential confounding factor would be that these conditions have different levels of difficulty (i.e., different magnitude of conflict), which may affect the CSE results and their representational similarity.”

      4) Serious confounds in the design render the results difficult to interpret. As much prior neuroimaging and behavioral work has established, "conflict" per se is perniciously correlated with many conceptually different variables. Consequently, it is very difficult to distinguish these confounding variables within aggregate measures of neural activity like fMRI. For example, conflict is confounded with increased time-on-task with longer RT, as well as conflict-driven increases in coding of other task variables (e.g., task-set related coding; e.g., Ebitz et al. 2020 bioRxiv). Even when using much higher resolution invasive measures than fMRI (i.e., eCoG), researchers have rightly been wary of making strong conclusions about explicit encoding of conflict (Tang et al, 2019; eLife). As such, the researchers would do well to be quite cautious and conservative in their analytic approach and interpretation of results.

      We acknowledge the findings showing that encoding of conflicts may not be easily detected in the brain. However, recent studies have shown that the representational similarity analysis can effectively detect representations of conflict tasks (e.g., the color Stroop) using factorial designs (Freund et al., 2021a; 2021b).

      In our analysis, we are aware of the potential impact of time-on-task (e.g., RT) on univariate activation levels and subsequent RSA patterns. To address this issue, we added univariate fMRI activation levels as nuisance regressors to the RSA. To deconfound conflict from other factors such as orientation of stimuli related to the center of the screen, we also applied the cross-subject RSA approach. Furthermore, we were cautious about determining regions that encoded conflict control. We set three strict criteria: (1) Regions must show a conflict similarity modulation effect; (2) regions must show higher representational strength in the incongruent condition compared with the congruent condition; and (3) regions must correlate with behavioral performance. With these criteria, we believe that the results we reported are already conservative. We would be happy to implement any additional criteria the reviewer recommends.

      Reference:

      Freund, M. C., Etzel, J. A., & Braver, T. S. (2021a). Neural Coding of Cognitive Control: The Representational Similarity Analysis Approach. Trends in Cognitive Sciences, 25(7), 622-638.

      Freund, M. C., Bugg, J. M., & Braver, T. S. (2021b). A Representational Similarity Analysis of Cognitive Control during Color-Word Stroop. Journal of Neuroscience, 41(35), 7388-7402.

      5) This issue is most critical in the interpretation of the fMRI results as reflecting encoding of conflict types. A key limitation of the design, that is acknowledged by the authors is that conflict is fully confounded within-subject by spatial orientation. Indeed, the limited set of stimulus-response mappings also cast doubt on the underlying factors that give rise to the CSE modulations observed by the authors in their behavioral results. The CSE modulations are so strong - going from a complete absence of current x previous trial-type interaction in the cos(90) case all the way to a complete elimination of any current trial conflict when the prior trial was incongruent in the cos(0) case - that they cause suspicion that they are actually driven by conflict-related control adjustments rather than sequential dependencies in the stimulus-response mappings that can be associatively learned.

      Unlike the fMRI data, we cannot tease apart the effects of conflict similarity and orientation in a similar manner as the cross-subject RSA for behavioral CSEs. However, we have a few reasons that the orientation and other bottom-up factors should not be the factors driving the similarity modulation effect.

      First, we did not find any correlation between the regions showing orientation effects and behavioral CSEs. This suggests that orientation does not directly contribute to the CSE modulation.

      Second, if the CSE modulation is purely driven by the association learning of the stimulus-response mapping, we should observe a stronger modulation effect after more extensive training. However, our results do not support this prediction. Using data from Experiment 1, we found that the modulation effect remained constant across the three sessions (see Note S3).

      “Note S3. Modulation of conflict similarity on behavioral CSEs does not change across time

      We tested if the conflict similarity modulation on the CSE is susceptible to training. We collected the data of Experiment 1 across three sessions, thus it is possible to examine if the conflict similarity modulation effect changes across time. To this end, we added conflict similarity, session and their interaction into a mixed-effect linear model, in which the session was set as a categorical variable. With a post-hoc analysis of variance (ANOVA), we calculated the statistical significance of the interaction term. This approach was applied to both the RT and ER. Results showed no interaction effect in either RT, F(2,1479) = 1.025, p = .359, or ER, F(2,1479) = 0.789, p = .455. This result suggests that the modulation effect does not change across time.”

      Third, the observed similarity modulation on the CSE, particularly for similarity levels 1-4, should not be attributed to stimulus-response associations such as feature integration, as has been addressed in our response to comment 2c.

      Finally, other bottom-up factors, such as the spatial location proximity did not drive the CSE modulation results, which we have addressed in the original manuscript in Note S2.

      "Note S2. Modulation of conflict similarity on behavioral CSEs cannot be explained by the physical proximity

      In our design, the conflict similarity might be confounded by the physical proximity between stimulus (i.e., the arrow) of two consecutive trials. That is, when arrows of the two trials appear at the same quadrant, a higher conflict similarity also indicates a higher physical proximity (Fig. 1A). Although the opposite is true if arrows of the two trials appear at different quadrants, it is possible the behavioral effects can be biased by the within quadrant trials. To examine if the physical distance has confounded the conflict similarity modulation effect, we conducted an additional analysis.

      We defined the physical angular difference across two trials as the difference of their polar angles relative to the origin. Therefore, the physical angular difference could vary from 0 to 180°. For each CSE condition (i.e., CC, CI, IC and II), we grouped the trials based on their physical angular distances, and then averaged trials with the same previous by current conflict type transition but different orders (e.g., StHSmL−StLSmH and StLSmH−StHSmL) within each subject. The data were submitted to a mixed-effect model with conflict similarity and physical proximity (i.e., the opposite of the physical angular difference) as fixed-effect predictors, and subject and CSE condition as random effects. Results showed significant conflict similarity modulation effects in both Experiment 1 (RT: β = 0.09 ± 0.01, t(7812) = 13.74, p < .001, ηp2 = .025; ER: β = 0.09 ± 0.01, t(7812) = 7.66, p < .001, ηp2 = .018) and Experiment 2 (RT: β = 0.21 ± 0.02, t(3956) = 9.88, p < .001, ηp2 = .043; ER: β = 0.20 ± 0.03, t(4201) = 6.11, p < .001, ηp2 = .038). Thus, the observed modulation of conflict similarity on behavioral CSEs cannot be explained by physical proximity."
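      As a small illustration of the physical proximity regressor described in Note S2, the sketch below computes the angular difference between two polar angles, wrapped to the 0 to 180° range, and takes its negative as proximity. The function names are hypothetical, and reading "the opposite of" as the sign-flipped value is an assumption.

```python
def angular_difference_deg(angle_a, angle_b):
    """Smallest absolute difference between two polar angles, in [0, 180] degrees."""
    diff = abs(angle_a - angle_b) % 360.0
    return 360.0 - diff if diff > 180.0 else diff

def physical_proximity(angle_a, angle_b):
    """Assumed definition: proximity is the negative of the angular difference."""
    return -angular_difference_deg(angle_a, angle_b)

# Two stimuli at 10 and 350 degrees are 20 degrees apart, not 340.
example = angular_difference_deg(10.0, 350.0)
```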

      6) To their credit, the authors recognize this confound, and attempt to address it analytically through the use of a between-subject RSA approach. Yet the solution is itself problematic, because it doesn't actually deconfound conflict from orientation. In particular, the RSA model assumes that whatever components of neural activity encode orientation produce this encoding within the same voxel-level patterns of activity in each subject. If they are not (which is of course likely), then orthogonalization of these variables will be incomplete. Similar issues underlie the interpretation of target/response and distractor coding. Given these issues, perhaps zooming out to a larger spatial scale for the between-subject RSA might be warranted. Perhaps whole-brain at the voxel level with a high degree of smoothing, or even whole-brain at the parcel level (averaging per parcel). For this purpose, Schaefer atlas parcels might be more useful than Glasser, as they more strongly reflect functional divisions (e.g., motor strip is split into mouth/hand divisions; visual cortex is split into central/peripheral visual field divisions). Similarly, given the lateralization of stimuli, if a within-parcel RSA is going to be used, it seems quite sensible to pool voxels across hemispheres (so effectively using 180 parcels instead of 360).

      Doing RSA at the whole-brain level is an interesting idea. However, it does not allow the identification of specific brain regions representing the cognitive space. Additionally, increasing the spatial scale would include more voxels that are not involved in representing the information of interest and may increase the noise level of data. Given these concerns, we did not conduct the whole-brain level RSA.

      We agree that smoothing data can decrease cross-subject variance in voxel distribution and may increase the signal-noise ratio. We reanalyzed the results for the right 8C region using RSA on smoothed beta maps (6-mm FWHM Gaussian kernel). This yielded a significant conflict similarity effect, t(951989) = 5.55, p < .0001, replicating the results on unsmoothed data (t(951989) = 5.60, p < .0001). Therefore, we retained the results from unsmoothed data in the main text, and added the results based on smoothed data to the supplementary material (Note S9).

      “Note S9. The cross-subject pattern similarity is robust against individual differences

      Due to individual differences, the multivoxel patterns extracted from the same brain mask may not reflect exactly the same brain region for each subject. To reduce the influence of individual difference, we conducted the same cross-subject RSA using data smoothed with a 6-mm FWHM Gaussian kernel. Results showed a significant conflict similarity effect, t(951989) = 5.55, p < .0001, replicating the results on unsmoothed data (t(951989) = 5.60, p < .0001).”

      We also used the bilateral 8C area as a single mask and conducted the same RSA. We found a significant conflict type similarity effect, t(951989) = 4.36, p < .0001. However, the left 8C alone showed no such representation, t(951989) = 0.38, p = .351, consistent with the right lateralized representation of cognitive space we reported in Note S8. Therefore, we used ROIs from each hemisphere separately.

      “Note S8. The lateralization of conflict type representation

      We observed the right 8C but not the left 8C represented the conflict type similarity. A further test is to show if there is a lateralization. We tested several regions of the left dlPFC, including the i6-8, 8Av, 8C, p9-46v, 46, 9-46d, a9-46v (Freund, Bugg, et al., 2021). We found that none of these regions show the representation of conflict type, all uncorrected ps > .35. These results indicate that the conflict type is specifically represented in the right dlPFC.”

      We have also discussed the lateralization in the manuscript:

      “In addition, we found no such representation in the left dlPFC (Note S8), indicating a possible lateralization. Previous studies showed that the left dlPFC was related to the expectancy-related attentional set up-regulation, while the right dlPFC was related to the online adjustment of control (Friehs et al., 2020; Vanderhasselt et al., 2009), which is consistent with our findings. Moreover, the right PFC also represents a composition of single rules (Reverberi et al., 2012), which may explain how the spatial Stroop and Simon types can be jointly encoded in a single space.”

      7) The strength of the results is difficult to interpret due to the non-standard analysis method. The use of a mixed-level modeling approach to summarize the empirical similarity matrix is an interesting idea, but nevertheless is highly non-standard within RSA neuroimaging methods. More importantly, the way in which it was implemented makes it potentially vulnerable to a high degree of inaccuracy or bias. In this case, this bias is likely to be overly optimistic (high false positive rate). No numerical or formal defense was provided for this mixed-level model approach. As a result, the use of this method seems quite problematic, as it renders the strength of the observed results difficult to interpret. Instead, the authors are encouraged using a previously published method of conducting inference with between-subject RSA, such as the bootstrapping methods illustrated in Kragel et al. (2018; Nat Neurosci), or in potentially adopting one of the Chen et al. methods mentioned above, that have been extensively explored in terms of statistical properties.

      In our revised manuscript, we have adopted the approach proposed by Chen et al. (2017). Specifically, we included both the upper and lower triangle of the representational similarity matrix (excluding the diagonal). Moreover, we also removed all the within-subject similarity (thus also excluding the within-run similarity) to minimize the bias of the potentially strong within-subject similarity (note we also analyzed the within-subject data and found significant effects for the similarity modulation, though this effect cannot be attributed to the conflict similarity or orientation alone. We added this part in Note S7, see below). In addition, we added both the row-wise and column-wise random effects to capture the dependence of cells within each column/row (Chen et al., 2017). We have revised the method part as:

      “We excluded within-subject cells from the RSM (thus also excluding the within-run similarity as suggested by Walther et al. (2016)), and the remaining cells were converted into a vector, which was then z-transformed and submitted to a linear mixed effect model as the dependent variable. The linear mixed effect model also included regressors of conflict similarity and orientation similarity. Importantly, conflict similarity was based on how Simon and spatial Stroop conflicts are combined and hence was calculated by first rotating all subjects' stimulus locations to the top-right and bottom-left quadrants, whereas orientation was calculated using original stimulus locations. As a result, the regressors representing conflict similarity and orientation similarity were de-correlated. Similarity between two conditions was measured as the cosine value of the angular difference. Other regressors included a target similarity regressor (i.e., whether the arrow directions were identical), a response similarity regressor (i.e., whether the correct responses were identical), a spatial Stroop distractor regressor (i.e., vertical distance between two stimulus locations), and a Simon distractor regressor (i.e., horizontal distance between two stimulus locations). Additionally, we also included a regressor denoting the similarity of Group (i.e., whether two conditions are within the same subject group, according to the stimulus-response mapping). We also added two regressors including ROI-mean fMRI activations for each condition of the pair to remove the possible uni-voxel influence on the RSM. A last term was the intercept. To control the artefact due to dependence of the correlation pairs sharing the same subject, we included crossed random effects (i.e., row-wise and column-wise random effects) for the intercept, conflict similarity, orientation and the group factors (G. Chen et al., 2017).”
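      To make the construction of the dependent variable and the similarity regressor more concrete, here is a minimal sketch under stated assumptions. It is not the pipeline used in the study: the helper names, the toy layout of two subjects with three conditions each, the condition angles, and the random activity patterns are all hypothetical. It only illustrates cosine-of-angular-difference similarity, exclusion of within-subject cells (which keeps both triangles of the cross-subject part), and z-transformation of the vectorized RSM.

```python
import numpy as np

def cosine_similarity(angles_deg):
    """Pairwise cos(angular difference) between condition angles given in degrees."""
    angles = np.deg2rad(np.asarray(angles_deg, dtype=float))
    return np.cos(angles[:, None] - angles[None, :])

def cross_subject_vector(matrix, subject_ids):
    """Vectorize only the cells whose row and column belong to different subjects."""
    subject_ids = np.asarray(subject_ids)
    mask = subject_ids[:, None] != subject_ids[None, :]
    return matrix[mask]

def zscore(values):
    """Standardize a vector to zero mean and unit standard deviation."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()

# Hypothetical layout: 2 subjects x 3 conditions, 20 voxels each.
subject_ids = np.array([0, 0, 0, 1, 1, 1])
conflict_angles = np.array([0.0, 45.0, 90.0, 0.0, 45.0, 90.0])
rng = np.random.default_rng(1)
patterns = rng.normal(size=(6, 20))      # synthetic activity patterns
rsm = np.corrcoef(patterns)              # 6 x 6 representational similarity matrix

dependent = zscore(cross_subject_vector(rsm, subject_ids))
conflict_regressor = cross_subject_vector(cosine_similarity(conflict_angles), subject_ids)
```

      The two vectors are aligned cell by cell, so they could be passed to any mixed-effect model routine together with the remaining regressors and the crossed random effects described above.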

      Results from this approach highly replicated our original results. Specifically, we found the right 8C again showed a strong conflict similarity effect, a higher representational strength in the incongruent condition compared to the congruent condition, and a significant correlation with the behavioral CSE. The orientation effect was also identified in the visual (e.g., right V1) and oculomotor (e.g., left FEF) regions.

      We revised the results accordingly:

      For the conflict type effect:

      “The first criterion revealed several cortical regions encoding the conflict similarity, including the Brodmann 8C area (a subregion of dlPFC (Glasser et al., 2016)) and a47r in the right hemisphere, and the superior frontal language (SFL) area, 6r, 7Am, 24dd, and ventromedial visual area 1 (VMV1) areas in the left hemisphere (Bonferroni corrected ps < 0.0001, one-tailed, Fig. 4A). We next tested whether these regions were related to cognitive control by comparing the strength of conflict similarity effect between incongruent and congruent conditions (criterion 2). Results revealed that the left SFL, left VMV1, and right 8C met this criterion, Bonferroni corrected ps < .05, one-tailed, suggesting that the representation of conflict type was strengthened when conflict was present (e.g., Fig. 4D). The intersubject brain-behavioral correlation analysis (criterion 3) showed that the strength of conflict similarity effect on RSM scaled with the modulation of conflict similarity on the CSE (slope in Fig. S2C) in right 8C (r = .52, Bonferroni corrected p = .002, one-tailed, Fig. 4C, Table 1) but not in the left SFL and VMV1 (all Bonferroni corrected ps > .05, one-tailed).”

      For the orientation effect:

      “We observed increasing fMRI representational similarity between trials with more similar orientations of stimulus location in the occipital cortex, such as right V1, right V2, right V4, and right lateral occipital 2 (LO2) areas (Bonferroni corrected ps < 0.0001). We also found the same effect in the oculomotor-related region, i.e., the left frontal eye field (FEF), and other regions including the right 5m, left 31pv and right parietal area F (PF) (Fig. 5A). Then we tested if any of these brain regions were related to the conflict representation by comparing their encoding strength between incongruent and congruent conditions. Results showed that the right V1, right V2, left FEF, and right PF encoded stronger orientation effect in the incongruent than the congruent condition, Bonferroni corrected ps < .05, one-tailed (Table 1, Fig. 5B). We then tested if any of these regions was related to the behavioral performance, and results showed that none of them positively correlated with the behavioral conflict similarity modulation effect, all uncorrected ps > .45, one-tailed. Thus all regions are consistent with the criterion 3.”

      “Note S7. The cross-subject RSA captures similar effects with the within-subject RSA

      Considering the variability in voxel-level functional localizations among individuals, one may question whether the cross-subject RSA results were biased by the consistent multi-voxel patterns across subjects, distinct from the more commonly utilized within-subject RSA. We reasoned that the cross-subject RSA should have captured similar effects as the within-subject RSA if we observe the conflict similarity effect in right 8C with the latter analysis. Therefore, we tested whether the representation in right 8C held for within-subject data. Specifically, we performed similar RSA for within-subject RSMs, excluding the within-run cells. We replaced the perfectly confounded factors of conflict similarity and orientation with a common factor called similarity_orientation. Other confounding factor pairs (i.e., target versus response, and Stroop distractor versus Simon distractor) were addressed similarly. Results showed a significant effect of similarity_orientation, t(13993) = 3.270, p = .0005, one-tailed. Given the specific representation of conflict similarity identified by the cross-subject RSA, the within-subject data of right 8C may show similar conflict similarity modulation effects as the cross-subject data. Further research is needed to fully dissociate the representation of conflict and the representation of visual features such as orientation.”

      8) Another potential source of bias is in treating the subject-level random effect coefficients (as predicted by the mixed-level model) as independent samples from a random variable (in the t-tests). The more standard method for inference would be to use test statistics derived from the mixed-model fixed effects, as those have degrees of freedom calculations that are calibrated based on statistical theory.

      In our revised manuscript, we reported the statistical p values calculated from the mixed-effect models. Note that because we used the Chen et al. (2017) method, which includes data from the symmetric matrix, we corrected the degrees of freedom and estimated the true p values based on the t statistics of model results. For the I versus C comparison results, we calculated the p values by combining I and C RSMs into a larger model and then adding the condition type, as well as the interaction between the regressors of interest (conflict similarity and orientation) and the condition type. We made the statistical inference based on the interaction effect.

      We have revised the corresponding methods as:

      “The statistical significance of these beta estimates was based on the outputs of the mixed-effect model estimated with the “fitlme” function in Matlab 2022a. Since symmetric cells from the RSM matrix were included in the mixed-effect model, we adjusted the t and p values with the true degrees of freedom, that is, half of the cells included minus the number of fixed regressors. Multiple comparison correction was applied with the Bonferroni approach across all cortical regions at the p < 0.0001 level. To test if the representation strengths are different between congruent and incongruent conditions, we also conducted the RSA using only congruent (RDM_C) and incongruent (RDM_I) trials separately. The contrast analysis was achieved by an additional model with both RDM_C and RDM_I included, adding the congruency and the interaction between conflict type (and orientation) and congruency as both fixed and random factors. The difference between incongruent and congruent representations was indicated by a significant interaction effect.”
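      The degrees-of-freedom bookkeeping described above can be sketched as follows. This is a minimal illustration, not the authors' Matlab code: the function names and the cell and regressor counts are hypothetical, and the p value uses a standard normal approximation to the t distribution, which is only reasonable because the degrees of freedom in these analyses are very large.

```python
import math


def true_degrees_of_freedom(n_cells_in_model: int, n_fixed_regressors: int) -> int:
    """Half of the RSM cells entered into the model minus the fixed regressors."""
    return n_cells_in_model // 2 - n_fixed_regressors


def one_tailed_p(t_value: float) -> float:
    """Upper-tail p value from a normal approximation to the t distribution.

    Assumption: the degrees of freedom are large enough (thousands or more)
    that the t distribution is practically indistinguishable from a normal.
    """
    return 0.5 * math.erfc(t_value / math.sqrt(2.0))


# Hypothetical counts, chosen only to show the arithmetic.
df = true_degrees_of_freedom(n_cells_in_model=1000, n_fixed_regressors=6)
print("corrected df:", df)

# t statistics quoted in this response letter.
for t in (3.270, 5.60):
    print(f"t = {t}: one-tailed p = {one_tailed_p(t):.2e}")
```

      With the t statistics quoted in this letter, the approximation gives p values close to the reported ones (about .0005 and about 1.1×10−8), which suggests it is adequate at these sample sizes.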

      Reviewer #3:

      Yang and colleagues investigated whether information on two task-irrelevant features that induce response conflict is represented in a common cognitive space. To test this, the authors used a task that combines the spatial Stroop conflict and the Simon effect. This task reliably produces a beautiful graded congruency sequence effect (CSE), where the cost of congruency is reduced after incongruent trials. The authors measured fMRI to identify brain regions that represent the graded similarity of conflict types, the congruency of responses, and the visual features that induce conflicts.

      Using several theory-driven exclusion criteria, the authors identified the right dlPFC (right 8C), which shows 1) stronger encoding of graded similarity of conflicts in incongruent trials and 2) a positive correlation between the strength of conflict similarity type and the CSE on behavior. The dlPFC has been shown to be important for cognitive control tasks. As the dlPFC did not show a univariate parametric modulation based on the higher or lower component of one type of conflict (e.g., having more spatial Stroop conflict or less Simon conflict), it implies that dissimilarity of conflicts is represented by a linear increase or decrease of neural responses. Therefore, the similarity of conflict is represented in multivariate neural responses that combine two sources of conflict.

      The strength of the current approach lies in the clear effect of parametric modulation of conflict similarity across different conflict types. The authors employed a clever cross-subject RSA that counterbalanced and isolated the targeted effect of conflict similarity, decorrelating orientation similarity of stimulus positions that would otherwise be correlated with conflict similarity. A pattern of neural response seems to exist that maps different types of conflict, where each type is defined by the parametric gradation of the yoked spatial Stroop conflict and the Simon conflict on a similarity scale. The similarity of patterns increases in incongruent trials and is correlated with CSE modulation of behavior.

      We would like to thank the reviewer for the positive evaluation of our manuscript and for providing constructive comments. By addressing these comments, we believe that we have made our manuscript more accessible for the readers while also strengthening our findings. In particular, we have tested a few alternative models and confirmed that the cognitive space hypothesis best fits the data. We have also demonstrated the geometric properties of the cognitive space by examining the continuity and dimensionality of the space, further supporting our main arguments. We have incorporated revisions and additional analyses to the manuscript based on your feedback. Overall, we believe that these changes and additional analyses have significantly improved the manuscript. Please find our detailed responses below.

      However, several potential caveats need to be considered.

      1) One caveat to consider is that the main claim of recruitment of an organized "cognitive space" for conflict representation is solely supported by the exclusion criteria mentioned earlier. To further support the involvement of organized space in conflict representation, other pieces of evidence need to be considered. One approach could be to test the accuracy of out-of-sample predictions to examine the continuity of the space, as commonly done in studies on representational spaces of sensory information. Another possible approach could involve rigorously testing the geometric properties of space, rather than fitting RSM to all conflict types. For instance, in Fig. 6, both the organized and domain-specific cognitive maps would similarly represent the similarity of conflict types expressed in Fig. 1C (as evident from the preserved order of conflict types). The RSM suggests a low-dimensional embedding of conflict similarity, but the underlying dimension remains unclear.

      Following the reviewer’s first suggestion, we conducted a leave-one-out prediction approach to examine the continuity of the cognitive space. We used the behavioral data from Experiment 1 for this test, due to its larger amount of data than Experiment 2. Specifically, we removed data from one of the five similarity levels (as illustrated by the θs in Fig. 1C) and used the remaining data to perform the same mixed-effect model as reported in the main text (i.e., the two-stage analysis). This yielded one pair of beta coefficients including the similarity regressor and the intercept for each subject, with which we predicted the CSE for the removed similarity level at subject level. We repeated this process for each similarity level once. The predicted results were highly correlated with the original data, with r = .87 for the RT and r = .84 for the ER, ps < .001. We have added this analysis and result to the “Conflict type similarity modulated behavioral congruency sequence effect (CSE)” section:

      “Moreover, to test the continuity and generalizability of the similarity modulation, we conducted a leave-one-out prediction analysis. We used the behavioral data from Experiment 1 for this test, due to its larger amount of data than Experiment 2. Specifically, we removed data from one of the five similarity levels (as illustrated by the θs in Fig. 1C) and used the remaining data to perform the same mixed-effect model (i.e., the two-stage analysis). This yielded one pair of beta coefficients including the similarity regressor and the intercept for each subject, with which we predicted the CSE for the removed similarity level for each subject. We repeated this process for each similarity level once. The predicted results were highly correlated with the original data, with r = .87 for the RT and r = .84 for the ER, ps < .001.”
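      The leave-one-level-out logic can be sketched as below. This is a simplified illustration under stated assumptions: it fits an ordinary least-squares line per subject rather than the two-stage mixed-effect model used in the manuscript, and the similarity levels, CSE values, and function names are all hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept (a stand-in for the mixed model)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leave_one_level_out(levels, cse_by_subject):
    """Hold out each similarity level once and predict it from the other levels."""
    predicted, observed = [], []
    for cse in cse_by_subject:
        for held_out in range(len(levels)):
            train_x = [x for i, x in enumerate(levels) if i != held_out]
            train_y = [y for i, y in enumerate(cse) if i != held_out]
            slope, intercept = fit_line(train_x, train_y)
            predicted.append(intercept + slope * levels[held_out])
            observed.append(cse[held_out])
    return predicted, observed


def pearson_r(a, b):
    """Pearson correlation between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical similarity levels and per-subject CSE values (ms).
similarity_levels = [0.0, 0.38, 0.71, 0.92, 1.0]
cse_by_subject = [
    [12.0, 21.0, 30.0, 33.0, 38.0],
    [5.0, 9.0, 18.0, 20.0, 19.0],
    [20.0, 26.0, 29.0, 37.0, 36.0],
]
predicted, observed = leave_one_level_out(similarity_levels, cse_by_subject)
print("prediction-observation correlation:", round(pearson_r(predicted, observed), 3))
```

      A high correlation between held-out and predicted values is what a continuous similarity gradient would produce; a space made of unrelated discrete states would not generalize to an unseen level in this way.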

      To estimate if the domain-specific model could explain the results we observed in right 8C, we conducted a model-comparison analysis. The domain-specific model treats each conflict type differently, so we used a diagonal matrix, with within-conflict type similarities being 1 and all cross-conflict type similarities being 0. This model showed non-significant effects (t(951989) = 0.84, p = .201) and poorer fit (BIC = 5377127) than the cognitive space model (t(951989) = 5.60, p = 1.1×10−8, BIC = 5377094). We also compared other alternative models and found the cognitive space model best fitted the data. We have included these results in the revised manuscript:

      “To examine if the right 8C specifically encodes the cognitive space rather than the domain-general or domain-specific organizations, we tested several additional models (see Methods). Model comparison showed a lower BIC in the Cognitive-Space model (BIC = 5377094) than the Domain-General (BIC = 537127) or Domain-Specific (BIC = 537127) models. Further analysis showed the dimensionality of the representation in the right 8C was 1.19, suggesting the cognitive space was close to 1D. We also tested if the observed conflict similarity effect was driven solely by spatial Stroop or Simon conflicts, and found larger BICs for the models only including the Stroop similarity (i.e., the Stroop-Only model, BIC = 5377122) or Simon similarity (i.e., the Simon-Only model, BIC = 5377096). An additional Stroop+Simon model, including both Stroop-Only and Simon-Only regressors, also showed a worse model fitting (BIC = 5377118). Moreover, we replicated the results with only incongruent trials, considering that the pattern of conflict representations is more manifested when the conflict is present (i.e., on incongruent trials) than not (i.e., on congruent trials). We found a poorer fitting in Domain-General (BIC = 1344129), Domain-Specific (BIC = 1344129), Stroop-Only (BIC = 1344128), Simon-Only (BIC = 1344120), and Stroop+Simon (BIC = 1344157) models than the Cognitive-Space model (BIC = 1344104). These results indicate that the right 8C encodes an integrated cognitive space for resolving Stroop and Simon conflicts. The more detailed model comparison results are listed in Table 2.”
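      To make the model ranking easier to read, the sketch below computes each model's BIC difference from the best model. It uses the incongruent-trials-only BIC values quoted in the passage above; the variable and function names are illustrative only.

```python
# BIC values for the incongruent-trials-only analysis, as quoted in the text.
bic_incongruent_only = {
    "Cognitive-Space": 1344104,
    "Domain-General": 1344129,
    "Domain-Specific": 1344129,
    "Stroop-Only": 1344128,
    "Simon-Only": 1344120,
    "Stroop+Simon": 1344157,
}


def delta_bic(bics):
    """Difference between each model's BIC and the lowest BIC (lower is better)."""
    best = min(bics.values())
    return {name: value - best for name, value in bics.items()}


deltas = delta_bic(bic_incongruent_only)
best_model = min(bic_incongruent_only, key=bic_incongruent_only.get)

print("best model:", best_model)
for name, delta in sorted(deltas.items(), key=lambda item: item[1]):
    print(f"{name:16s} delta BIC = {delta}")
```

      Reporting differences rather than raw BICs makes the size of each gap explicit, since the absolute values are dominated by the large number of RSM cells.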

      We also estimated the dimensionality of the right 8C with the averaged RSM and found the dimensionality of the cognitive space was ~ 1.19, very close to a 1D space. This result is consistent with our experimental design, as the only manipulated variable is the angular distance between conflict types. We have added these results and the methods to the revised manuscript.

      Results:

      “Further analysis showed the dimensionality of the representation in the right 8C was 1.19, suggesting the cognitive space was close to 1D.”

      Methods:

      “To better capture the dimensionality of the representational space, we estimated its dimensionality using the participation ratio (Ito & Murray, 2023). Since we excluded the within-subject cells from the whole RSM, the whole RSM is an incomplete matrix and could not be used. To resolve this issue, we averaged the cells corresponding to each pair of conflict types to obtain an averaged 5×5 RSM matrix, similar to the matrix shown in Fig. 1C. We then estimated the participation ratio using the formula:

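      $$\mathrm{PR} = \frac{\left(\sum_{i=1}^{m} \lambda_i\right)^{2}}{\sum_{i=1}^{m} \lambda_i^{2}}$$

      where λ_i is the i-th eigenvalue of the RSM and m is the number of eigenvalues.”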
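      As an illustration of the participation ratio, the sketch below computes it for a square similarity matrix without an explicit eigendecomposition. It relies on two standard identities: the sum of the eigenvalues equals the trace of the matrix, and the sum of the squared eigenvalues equals the trace of the squared matrix. The function name and the example matrices are hypothetical and are not the study's RSM.

```python
def participation_ratio(matrix):
    """Participation ratio (sum of eigenvalues)^2 / (sum of squared eigenvalues).

    Uses trace(A) for the sum of eigenvalues and trace(A @ A) for the sum of
    squared eigenvalues, so no eigendecomposition is needed.
    """
    m = len(matrix)
    trace = sum(matrix[i][i] for i in range(m))
    trace_of_square = sum(matrix[i][j] * matrix[j][i] for i in range(m) for j in range(m))
    return trace ** 2 / trace_of_square


# Two limiting cases for a 5x5 matrix (hypothetical examples).
identity = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
all_ones = [[1.0 for _ in range(5)] for _ in range(5)]

print("five independent dimensions:", participation_ratio(identity))
print("a single shared dimension:", participation_ratio(all_ones))
```

      The two limiting cases bracket the possible values for a 5×5 matrix: five equally weighted dimensions at one extreme and a single dimension at the other. A value such as 1.19 sits near the one-dimensional end.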

      2) Another important factor to consider is how learning within the confined task space, which always negatively correlates the two types of conflicts within each subject, may have influenced the current results. Is statistical dependence of conflict information necessary to use the organized cognitive space to represent conflicts from multiple sources? Answering this question would require a paradigm that can adjust multiple sources of conflicts parametrically and independently. Investigating such dependencies is crucial in order to better understand the adaptive utility of the observed cognitive space of conflict similarity.

      As the central goal of our design was to test the geometry of neural representations of conflict, we manipulated the conflict similarity. The anticorrelated Simon and spatial Stroop conflict aimed to make the overall magnitude of conflict similar among different conflict types. We agree that with the current design the likely cognitive space is not a full 2D space with Simon and spatial Stroop being two dimensions. Instead, the likely cognitive space is a subspace (e.g., a circle) embedded in the 2D space, due to the constraint of anticorrelated Simon and spatial Stroop conflict across conflict types. Nevertheless, the subspace can also be used to test the geometry that similar conflict types share similar neural representations.

      To test the full 2D cognitive space, a possible revision of our current design is to have multiple hybrid conditions (like Type 2-4) that cover the whole space. For instance, imagine arrow locations in the first quadrant space. We could have a 3×3 design with 9 conflict conditions, where their horizontal/vertical coordinates could be one of the combinations of 0, 0.5 and 1. This way, the spatial Stroop and Simon conditions would be independent of each other. Notably, however, one potential confounding factor would be that these conditions have different levels of difficulty (i.e., different magnitude of conflict), which may affect the CSE results and their representational similarity.

      We have added the above limitations and future designs to the revised manuscript.
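      The difference between the two designs can be checked numerically. The sketch below compares the correlation between the spatial Stroop and Simon components in the current five-type design with that in the proposed 3×3 grid. It assumes the five conflict types sit at polar angles from 90° (pure spatial Stroop) to 0° (pure Simon) in 22.5° steps, with the Stroop component given by the sine and the Simon component by the cosine; all names are illustrative.

```python
import itertools
import math


def pearson_r(a, b):
    """Pearson correlation between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Current design (assumed angles): five conflict types on a quarter circle.
angles_deg = [90.0, 67.5, 45.0, 22.5, 0.0]
arc_stroop = [math.sin(math.radians(a)) for a in angles_deg]
arc_simon = [math.cos(math.radians(a)) for a in angles_deg]

# Proposed design: 3x3 grid with coordinates drawn from 0, 0.5 and 1.
grid = list(itertools.product([0.0, 0.5, 1.0], repeat=2))
grid_simon = [x for x, _ in grid]
grid_stroop = [y for _, y in grid]

r_arc = pearson_r(arc_stroop, arc_simon)
r_grid = pearson_r(grid_stroop, grid_simon)
print("quarter-circle design:", round(r_arc, 3))
print("3x3 grid design:", round(r_grid, 3))
```

      The grid design removes the dependence between the two components because every Stroop level is paired with every Simon level, whereas on the quarter circle one component necessarily shrinks as the other grows.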

      “Another limitation is that in our design, the spatial Stroop and Simon effects are highly anticorrelated. This constraint may make the five conflict types represented in a unidimensional space (e.g., a circle) embedded in a 2D space. Future studies may test the 2D cognitive space with fully independent conditions. A possible improvement to our current design would be to include left, right, up, and down arrows presented in a grid formation across four spatially separate quadrants, with each arrow mapped to its own response button. However, one potential confounding factor would be that these conditions have different levels of difficulty (i.e., different magnitude of conflict), which may affect the CSE results and their representational similarity.”

      Major comments:

      3) The RSM result (and the absence of univariate effect) seem to be a good first step to claim the use of cognitive space of conflict. Yet, the presence of an organized (unidimensional; Fig. 6) and continuous cognitive space should be further tested and backed up.

      We thank the reviewer for recognizing the methods and results of our current work. Indeed, the utilization of a parametric design and RSA to examine organization of neural representations is a widely embraced methodology in the field of cognitive neuroscience (e.g., Freund et al., 2021; Ritz et al., 2022). Our current study aimed primarily to provide original evidence for whether similar conflicts are represented similarly in the brain, which reflects the geometry of conflict representations (i.e., the structure of differences between conflict representations). We have used multiple criteria to back up the findings by showing the representation is sensitive to the presence of conflict and has behavioral relevance.

      We agree that the cognitive space account of cognitive control requires further validation. Therefore, in the revised manuscript, we have added several additional tests to strengthen the evidence supporting the organized cognitive space representation. Firstly, we tested five alternative models (Domain-General, Domain-Specific, Stroop-Only, Simon-Only and Stroop+Simon models), and found that the Cognitive-Space model best fitted our data. Secondly, we explicitly calculated the dimensionality of the representation and observed a low dimensionality (1.19D). We have added these results to the “Multivariate patterns of the right dlPFC encodes the conflict similarity” section in the revised manuscript (see also the response to Comment 1).

      Furthermore, we utilized data from Experiment 1 to demonstrate the continuity of the cognitive space by showing its ability to predict out-of-sample data. We have included this result to the “Conflict type similarity modulated behavioral congruency sequence effect (CSE)” section in the revised manuscript:

      “Moreover, to test the continuity and generalizability of the similarity modulation, we conducted a leave-one-out prediction analysis. We used the behavioral data from Experiment 1 for this test, due to its larger amount of data than Experiment 2. Specifically, we removed data from one of the five similarity levels (as illustrated by the θs in Fig. 1C) and used the remaining data to perform the same mixed-effect model (i.e., the two-stage analysis). This yielded one pair of beta coefficients including the similarity regressor and the intercept for each subject, with which we predicted the CSE for the removed similarity level for each subject. We repeated this process for each similarity level once. The predicted results were highly correlated with the original data, with r = .87 for the RT and r = .84 for the ER, ps < .001.”

      References:

      Freund, M. C., Bugg, J. M., & Braver, T. S. (2021). A Representational Similarity Analysis of Cognitive Control during Color-Word Stroop. Journal of Neuroscience, 41(35), 7388-7402.

      Ritz, H., & Shenhav, A. (2022). Humans reconfigure target and distractor processing to address distinct task demands. bioRxiv. doi:10.1101/2021.09.08.459546

      4) Is the conflict similarity effect not driven by either coding of the weak to strong gradient of the spatial Stroop conflict or the Simon conflict? For example, could brain regions that are selectively and continuously tuned to the Simon conflict alone create the graded similarity in Fig. C?

      We recognize that our current design and analysis approach cannot fully exclude the possibility that the current results are driven solely by either Stroop or Simon conflicts, since their gradients are correlated with the conflict similarity gradient we defined. To estimate their unique contributions, we performed a model-comparison analysis. We constructed a Stroop-Only model and a Simon-Only model, with each conflict type projected onto the Stroop (vertical) axis or Simon (horizontal) axis, respectively. The similarity between any two conflict types was defined using the Jaccard similarity index (Jaccard, 1901), that is, their intersection divided by their union. By replacing the cognitive space-based conflict similarity regressor with the Stroop-Only and Simon-Only regressors, we calculated their BICs. Results showed that the BIC was larger for Stroop-Only (5377122) and Simon-Only (5377096) than for the cognitive space model (5377094). An additional Stroop+Simon model, including both Stroop-Only and Simon-Only regressors, also showed a poorer model fitting (BIC = 5377118) than the cognitive space model.
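      One way to read the Jaccard-based Stroop-Only and Simon-Only regressors is sketched below. The interpretation is an assumption: each conflict type is projected onto one axis, the projection is treated as an interval starting at zero, and the Jaccard index of two such intervals reduces to the shorter length divided by the longer one. The polar angles (90° for pure spatial Stroop down to 0° for pure Simon, in 22.5° steps) and all names are likewise assumptions for illustration.

```python
import math

# Assumed polar angle of each conflict type (Type 1 = pure Stroop, Type 5 = pure Simon).
ANGLES_DEG = [90.0, 67.5, 45.0, 22.5, 0.0]


def projection(theta_deg, axis):
    """Length of a conflict type's projection onto the Stroop or Simon axis."""
    theta = math.radians(theta_deg)
    value = math.sin(theta) if axis == "stroop" else math.cos(theta)
    return 0.0 if abs(value) < 1e-12 else value  # remove floating-point residue


def jaccard(a, b):
    """Intersection over union of the intervals [0, a] and [0, b]."""
    longer = max(a, b)
    if longer == 0.0:
        return 1.0  # two empty projections are treated as identical
    return min(a, b) / longer


def single_axis_model(axis):
    """5x5 similarity matrix built from projections onto a single axis."""
    lengths = [projection(theta, axis) for theta in ANGLES_DEG]
    return [[jaccard(a, b) for b in lengths] for a in lengths]


stroop_only = single_axis_model("stroop")
simon_only = single_axis_model("simon")
for row in stroop_only:
    print([round(value, 2) for value in row])
```

      Under this reading, a pure Simon condition has zero similarity to every other type in the Stroop-Only matrix, which is why a single-axis model predicts a different similarity structure from the angular cognitive space model.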

      Moreover, we replicated the results with only incongruent trials. We found a poorer fitting in Stroop-Only (BIC = 1344128), Simon-Only (BIC = 1344120), and Stroop+Simon (BIC = 1344157) models than the Cognitive-Space model (BIC = 1344104). These results indicate that the right 8C encodes an integrated cognitive space for resolving Stroop and Simon conflicts. Therefore, we believe the cognitive space has incorporated both dimensions. We added these additional analyses and results to the revised manuscript (see also the response to the above Comment 1).

      5) Is encoding of conflict similarity in the unidimensional organized space driven by specific requirements of the task or is this a general control strategy? Specifically, is the recruitment of organized space something specific to the task that people are trained to work with stimuli that negatively correlate the spatial Stroop conflict and the Simon conflict?

      We argue that this encoding is a general control strategy. In our task design, we asked the participants to respond to the target arrow and ignore the location that appeared randomly for them. So, they were not trained to deal with the stimuli in any certain way. We also found the conflict similarity modulation on CSE did not change with more training (We added this result in Note S3), indicating that the cognitive space did not depend on strategies that could be learned through training.

      “Note S3. Modulation of conflict similarity on behavioral CSEs does not change across time. We tested if the conflict similarity modulation on the CSE is susceptible to training. We collected the data of Experiment 1 across three sessions, thus it is possible to examine if the conflict similarity modulation effect changes across time. To this end, we added conflict similarity, session and their interaction into a mixed-effect linear model, in which the session was set as a categorical variable. With a post-hoc analysis of variance (ANOVA), we calculated the statistical significance of the interaction term.

      This approach was applied to both the RT and ER. Results showed no interaction effect in either RT, F(2,1479) = 1.025, p = .359, or ER, F(2,1479) = 0.789, p = .455. This result suggests that the modulation effect does not change across time."

      Instead, the cognitive space should be determined by the intrinsic similarity structure of the task design. A previous study (Freitas et al., 2015) has found that the CSE across different versions of spatial Stroop and flanker tasks was stronger than that across either of the two conflicts and Simon. In their designs, the stimulus similarity was controlled at the same level, so the difference in CSE was only attributable to the similar dimensional overlap between Stroop and flanker tasks, in contrast to the Simon task. Furthermore, recent studies showed that the cognitive space generally exists to represent structured latent states (e.g., Vaidya et al., 2022), mental strategy cost (Grahek et al., 2022), and social hierarchies (Park et al., 2020). Therefore, we argue that cognitive space is likely a universal strategy that can be applied to different scenarios.

      We added this argument in the discussion:

      “Although the spatial orientation information in our design could be helpful to the construction of cognitive space, the cognitive space itself was independent of the stimulus-level representation of the task. We found the conflict similarity modulation on CSE did not change with more training (see Note S3), indicating that the cognitive space did not depend on strategies that could be learned through training. Instead, the cognitive space should be determined by the intrinsic similarity structure of the task design. For example, a previous study (Freitas et al, 2015) has found that the CSE across different versions of spatial Stroop and flanker tasks was stronger than that across either of the two conflicts and Simon. In their designs, the stimulus similarity was controlled at the same level, so the difference in CSE was only attributable to the similar dimensional overlap between Stroop and flanker tasks, in contrast to the Simon task. Furthermore, recent studies showed that the cognitive space generally exists to represent structured latent states (e.g., Vaidya et al., 2022), mental strategy cost (Grahek et al., 2022), and social hierarchies (Park et al., 2020). Therefore, cognitive space is likely a universal strategy that can be applied to different scenarios."

      Reference:

      Freitas, A. L., & Clark, S. L. (2015). Generality and specificity in cognitive control: conflict adaptation within and across selective-attention tasks but not across selective-attention and Simon tasks. Psychological Research, 79(1), 143-162.

      Vaidya, A. R., Jones, H. M., Castillo, J., & Badre, D. (2021). Neural representation of abstract task structure during generalization. Elife, 10, 1-26.

      Grahek, I., Leng, X., Fahey, M. P., Yee, D., & Shenhav, A. Empirical and Computational Evidence for Reconfiguration Costs During Within-Task Adjustments in Cognitive Control. CogSci.

      Park, S. A., Miller, D. S., Nili, H., Ranganath, C., & Boorman, E. D. (2020). Map Making: Constructing, Combining, and Inferring on Abstract Cognitive Maps. Neuron, 107(6), 1226-1238 e1228. doi:10.1016/j.neuron.2020.06.030

      6) The observed pattern seems to suggest that there is conflict similarity space that is defined by the combination of the conflict similarity (i.e., the strength of conflicts) and the sources of conflict (i.e., the Simon vs the spatial Stroop). What are the rational reasons to separate conflicts of different sources (beyond detecting incongruence)? And how are they used for better conflict resolutions?

      The necessity of separating conflicts of different sources lies in that the spatial Stroop and the Simon effects are resolved with different mechanisms. The behavioral congruency effects of a combined conflict from two different sources were shown to be the summation of the two conflict sources (Liu et al., 2010), suggesting that the conflicts are resolved independently. Moreover, previous studies have shown that different sources of conflict are resolved with different brain regions (Egner, 2008; Li et al., 2017), and at different processing stages (Wang et al., 2014). Therefore, when multiple sources of conflict occur simultaneously or sequentially, it should be more efficient to resolve the conflict by identifying the sources.

      We have added this argument to the revised manuscript:

      “The rationale behind defining conflict similarity based on combinations of different conflict sources, such as spatial-Stroop and Simon, stems from the evidence that these sources undergo independent processing (Egner, 2008; Li et al., 2014; Liu et al., 2010; Wang et al., 2014). Identifying these distinct sources is critical in efficiently resolving potentially infinite conflicts."

      Reference:

      Egner, T. (2008). Multiple conflict-driven control mechanisms in the human brain. Trends in Cognitive Sciences, 12(10), 374-380.

      Li, Q., Yang, G., Li, Z., Qi, Y., Cole, M. W., & Liu, X. (2017). Conflict detection and resolution rely on a combination of common and distinct cognitive control networks. Neuroscience and Biobehavioral Reviews, 83, 123-131.

      Wang, K., Li, Q., Zheng, Y., Wang, H., & Liu, X. (2014). Temporal and spectral profiles of stimulus-stimulus and stimulus-response conflict processing. NeuroImage, 89, 280-288.

      Liu, X., Park, Y., Gu, X., & Fan, J. (2010). Dimensional overlap accounts for independence and integration of stimulus-response compatibility effects. Attention, Perception, & Psychophysics, 72(6), 1710-1720.

      7) The congruency effect is larger in conflict type 2, 3, 4 consistently compared to conflict 1 and 5. Are these expected under the hypothesis of unified cognitive space of conflict similarity? Is the pattern of similarity modeled in RSA?

      Yes, this is expected. The spatial Stroop and Simon effects have been shown to be additive and independent (Li et al., 2014). Therefore, the congruency effects of conflict type 2, 3 and 4 would be the weighted sum of the spatial Stroop and Simon effects. The weights can be defined by the sine and cosine of the polar angle.

      For instance, in Type 2, wy = sin(67.5°) and wx = cos(67.5°). The sum of the two weight values (i.e., 1.31) is larger than 1, leading to a larger congruency effect than the pure spatial Stroop (Conf 1) and Simon (Conf 5) conditions.
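      The weight arithmetic can be verified for all five conflict types with a short sketch. The angles for Types 1, 3, 4 and 5 are an assumption (equal 22.5° steps from 90° to 0°); only the 67.5° value for Type 2 is stated above, and the function name is illustrative.

```python
import math


def stroop_simon_weights(theta_deg):
    """Return (w_y, w_x): sine and cosine of the polar angle, in that order."""
    theta = math.radians(theta_deg)
    return math.sin(theta), math.cos(theta)


# Assumed polar angles; Type 1 is pure spatial Stroop and Type 5 is pure Simon.
angles_by_type = {1: 90.0, 2: 67.5, 3: 45.0, 4: 22.5, 5: 0.0}

for conflict_type, theta_deg in angles_by_type.items():
    w_y, w_x = stroop_simon_weights(theta_deg)
    print(f"Type {conflict_type}: w_y = {w_y:.2f}, w_x = {w_x:.2f}, sum = {w_y + w_x:.2f}")
```

      Under an additive account, the summed weight exceeds 1 for every hybrid type and equals 1 for the two pure types, which matches the ordering of congruency effects the reviewer noted.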

      Note that this hypothesis underlies the Stroop+Simon model, which assumes the Stroop and Simon dimensions are independently represented in the brain and drive the behavior in an additive fashion. Moreover, the observed difference of behavioral congruency effects may have reflected the variance in the Domain-General model, which treats all conflict types as equivalent, with the only difference between each two conflict types in the magnitude of their conflict. Therefore, we did not model the behavioral congruency effects as a covariance regressor in the major RSA. Instead, we conducted a model comparison analysis by comparing these models and the Cognitive-Space model. Results showed worse model fitting of both the Domain-General and Stroop+Simon models. Specifically, the regressor of congruency effect difference in the Domain-General model was not significant (p = .575), which also suggests that the higher congruency effect in conflict type 2, 3 and 4 should not influence the Cognitive-Space model results. We have added these methods and results to the revised manuscript (see also our response to Comment 1):

      Methods:

      “Model comparison and representational dimensionality

      To estimate if the right 8C specifically encodes the cognitive space, rather than the domain-general or domain-specific structures, we conducted two more RSAs. We replaced the cognitive space-based conflict similarity matrix in the RSA we reported above (hereafter referred to as the Cognitive-Space model) with one of the alternative model matrices, with all other regressors equal. The domain-general model treats each conflict type as equivalent, so each two conflict types only differ in the magnitude of their conflict. Therefore, we defined the domain-general matrix as the difference in their congruency effects indexed by the group-averaged RT in Experiment 2. Then the z scored model vector was sign-flipped to reflect similarity instead of distance. The domain-specific model treats each conflict type differently, so we used a diagonal matrix, with within-conflict type similarities being 1 and all cross-conflict type similarities being 0.

      Moreover, to examine if the cognitive space is driven solely by the Stroop or Simon conflicts, we tested a spatial Stroop-Only (hereafter referred to as “Stroop-Only”) and a Simon-Only model, with each conflict type projected onto the spatial Stroop (vertical) axis or Simon (horizontal) axis, respectively. The similarity between any two conflict types was defined using the Jaccard similarity index (Jaccard, 1901), that is, their intersection divided by their union. We also included a model assuming the Stroop and Simon dimensions are independently represented in the brain, adding up the Stroop-Only and Simon-Only regressors. We conducted similar RSAs as reported above, replacing the original conflict similarity regressor with the Stroop-Only, Simon-Only, or both regressors, and then calculated their Bayesian information criteria (BICs)."
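      The domain-general and domain-specific model matrices described in the Methods above can be sketched as follows. The congruency effects are hypothetical placeholders rather than the group-averaged RT values from Experiment 2, and the sketch assumes that "difference" means the absolute pairwise difference and that z-scoring is done across all cells of the matrix.

```python
import statistics


def domain_specific_matrix(n_types):
    """Diagonal model: similarity 1 within a conflict type, 0 across types."""
    return [[1.0 if i == j else 0.0 for j in range(n_types)] for i in range(n_types)]


def domain_general_matrix(congruency_effects):
    """Pairwise absolute differences in congruency effect, z-scored and sign-flipped.

    Flipping the sign turns a distance (large difference) into a similarity
    (small difference gives a high value).
    """
    n = len(congruency_effects)
    distances = [
        abs(congruency_effects[i] - congruency_effects[j])
        for i in range(n)
        for j in range(n)
    ]
    mean = statistics.fmean(distances)
    sd = statistics.pstdev(distances)
    similarities = [-(d - mean) / sd for d in distances]
    return [similarities[i * n:(i + 1) * n] for i in range(n)]


# Hypothetical congruency effects (ms) for conflict types 1 to 5.
hypothetical_effects = [40.0, 60.0, 65.0, 58.0, 35.0]

domain_general = domain_general_matrix(hypothetical_effects)
domain_specific = domain_specific_matrix(len(hypothetical_effects))
for row in domain_general:
    print([round(value, 2) for value in row])
```

      Each matrix would then replace the conflict similarity regressor in the RSA, with all other regressors unchanged, so that the models differ only in the similarity structure they assume.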

      Reference:

      Li, Q., Nan, W., Wang, K., & Liu, X. (2014). Independent processing of stimulus-stimulus and stimulus-response conflicts. PloS One, 9(2), e89249.

      8) Please clarify the observed patterns of CSE effects in relation to the hypothesis of common cognitive space of conflict. In particular, right 8C shows that the patterns become dissimilar in incongruent trials compared to congruent trials. How does this direction of the effect fit to the common unidimensional cognitive space account? And how does such a representation contribute to the CSE effects?

      The behavioral CSE patterns provide initial evidence for the cognitive space hypothesis. Previous studies have debated whether cognitive control relies on domain-general or domain-specific representations, with much evidence gathered from behavioral CSE patterns. A significant CSE across two conflict conditions typically suggests domain-general representations of cognitive control, while an absence of CSE suggests domain-specific representations. The cognitive space view proposes that conflict representations are neither purely domain-general nor purely domain-specific, but rather exist on a continuum. This view predicts that the CSE across two conflict conditions should depend on the representational distance between them within this cognitive space. Our finding that CSE values systematically vary with conflict similarity level supports this hypothesis. We have added this point in the discussion of the revised manuscript:

      “Previous research on this topic often adopts a binary manipulation of conflict (Braem et al., 2014) (i.e., each domain only has one conflict type) and gathers evidence for the domain-general/specific view from the presence/absence of CSE, respectively. Here, we parametrically manipulated the similarity of conflict types and found that the CSE systematically varied with conflict similarity level, demonstrating that cognitive control is neither purely domain-general nor purely domain-specific, but can be reconciled as a cognitive space (Bellmund et al., 2018) (Fig. 6, middle).”

      Fig. 4D was plotted to show the steeper slope of the conflict similarity effect for incongruent versus congruent conditions. Note that the y-axis displays z-scored Pearson correlation values, so the grand mean of each condition was 0. The values for the first two similarity levels (levels 1 and 2) were lower for incongruent than congruent conditions, seemingly indicating lower average similarity. However, this was not the case. The five similarity levels contained different numbers of data points (see Fig. 1C), so levels 4 and 5 should be weighted more heavily than levels 1 and 2. When comparing the grand mean of raw Pearson correlation values, the incongruent condition (0.0053) showed a tendency toward higher similarity than the congruent condition (0.0040), t(475998) = 1.41, p = .079. We have also plotted another version of Fig. 4D in Fig. S5, in which the raw Pearson correlation values were used.

      The greater representation of conflict type in the incongruent condition compared to the congruent condition (as evidenced by a steeper slope) suggests that the conflict representation was driven by the incongruent condition. This is probably due to the stronger involvement of cognitive control in the incongruent condition than in the congruent condition, which in turn leads to more distinct patterns across different conflict types. This is consistent with the fact that the congruent condition is typically a baseline, where any conflict-related effects should be weaker.

      The representation of cognitive space may contribute to the CSE as a mental model. This model allows our brain to evaluate the cost and benefit associated with transitioning between different conflict conditions. When two consecutive trials are characterized by more similar conflict types, their representations in the cognitive space will be closer, resulting in a less costly transition. As a consequence, stronger CSEs are observed. We revised the corresponding discussion part as:

      “Similarly, we propose that cognitive space could serve as a mental model to assist fast learning and efficient organization of cognitive control settings. Specifically, the cognitive space representation may provide a principle for how our brain evaluates the expected cost of switching and the benefit of generalization between states and selects the path with the best cost-benefit tradeoff (Abrahamse et al., 2016; Shenhav et al., 2013). The proximity between two states in cognitive space could reflect both the expected cognitive demand required to transition and the useful mechanisms to adapt from. The closer the two conditions are in cognitive space, the lower the expected switching cost and the higher the generalizability when transitioning between them. With the organization of a cognitive space, a new conflict can be quickly assigned a location in the cognitive space, which will facilitate the development of cognitive control settings for this conflict by interpolating nearby conflicts and/or projecting the location to axes representing different cognitive control processes, thus leading to a stronger CSE when following a more similar conflict condition.”

      Minor comments:

      9) Some of the labels of figure axes are unclear (e.g., Fig4C) about what they represent.

      In Fig. 4C, the x-axis label is “neural representational strength”, which refers to the beta coefficient of the conflict type effect computed from the main RSA, denoting the strength of the conflict type representation in neural patterns. The y-axis label is “behavioral representational strength”, which refers to the beta coefficient obtained from the behavioral linear model using conflict similarity to predict the CSE in Experiment 2; it reflects how strongly the conflict similarity modulates the behavioral CSE. We apologize for any confusion from the brief axis labels. We have added expanded descriptions to the figure caption of Fig. 4C.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript analyzes large-scale Neuropixels recordings from visual areas and hippocampus of mice passively viewing repeated clips of a movie and reports that neurons respond with elevated firing activities to specific, continuous sequences of movie frames. The important results support a role of rodent hippocampal neurons in general episode encoding and advance understanding of visual information processing across different brain regions. The strength of evidence for the primary conclusion is solid, but some technical limitations of the study were identified that merit further analyses.

      We thank the editors and reviewers for the assessment and reviews. We have provided clarifications and updated the manuscript to address the apparent technical limitations, which are perhaps due to some misunderstanding; please see below. We provide additional results that isolate the contributions of pupil diameter, sharp-wave ripples and theta power to show that movie tuning cannot be explained by these nonspecific effects. Nor are these mere time cells or some other internally generated patterns, owing to the many differences highlighted below.

      Reviewer #1 (Public Review):

      Taking advantage of a publicly available dataset, neuronal responses in both the visual and hippocampal areas to passive presentation of a movie are analyzed in this manuscript. Since the visual responses have been described in a number of previous studies (e.g., see Refs. 11-13), the value of this manuscript lies mostly on the hippocampal responses, especially in the context of how hippocampal neurons encode episodic memories. Previous human studies show that hippocampal neurons display selective responses to short (5 s) video clips (e.g. see Gelbard-Sagiv et al, Science 322: 96-101, 2008). The hippocampal responses in head-fixed mice to a longer (30 s) movie as studied in this manuscript could potentially offer important evidence that the rodent hippocampus encodes visual episodes.

      We have now included citations to the Gelbard-Sagiv et al. Science 2008 paper and many other references. Thank you for pointing that out. There are major differences between that study and ours.

      a. The movies used in the previous study contained very familiar, famous people and famous events, and the experiment was about the patient’s ability to recall those famous movie episodes. In our case the mice had seen this movie clip only in two habituation sessions before.

      b. They did not look at the fine structure of neural responses below half a second whereas we looked at the mega-scale representations from 30ms to 30s.

      c. The movie clips in that study were in full color with audio; we used an isoluminant, black-and-white, silent movie clip.

      d. Their movie clips contained humans and were observed by humans, whereas in our study mice observed a movie clip with humans and no mice or other animals.

      The analysis strategy is mostly well designed and executed. A number of factors and controls, including baseline firing, locomotion, frame-to-frame visual content variation, are carefully considered. The inclusion of neuronal responses to scrambled movie frames in the analysis is a powerful method to reveal the modulation of a key element in episodic events, temporal continuity, on the hippocampal activity. The properties of movie fields are comprehensively characterized in the manuscript.

      Thank you.

      Although the hippocampal movie fields appear to be weaker than the visual ones (Fig. 2g, Ext. Fig. 6b), the existence of consistent hippocampal responses to movie frames is supported by the data shown. Interestingly, in my opinion, a strong piece of evidence for this is a "negative" result presented in Ext. Fig. 13c, which shows higher than chance-level correlations in hippocampal responses to same scrambled frames between even and odd trials (and higher than correlations with neighboring scrambled frames). The conclusion that hippocampal movie fields depend on continuous movie frames, rather than a pure visual response to visual contents in individual frames, is supported to some degree by their changed properties after the frame scrambling (Fig. 4).

      Yes, hippocampal selectivity is not entirely abolished with scrambled movie, as we show in several figures (Figure 4d,g and Figure 4- figure supplement 6), but it is greatly reduced, far more than that in the afferent visual cortices. The fraction of tuned cells for scrambled movies dropped to 4.5% in hippocampus, which is close to the chance level of 3%. In contrast, in visual areas selectivity was still above 80%.

      Significant overlap between even and odd trials is to be expected for the tuned cells. Without a significant overlap, i.e. a stable representation, they will not be tuned. Despite this, the correlation between even and odd trials for the (only 4.5% of) tuned cells in the hippocampus was more than 2-fold smaller than (more than 80% of) cells in visual cortices. This strongly supports our hypothesis that unlike visual cortices, hippocampal subfields depended very strongly on the continuity of visual information. We have now clarified this in the main text.

      However, there are two potential issues that could complicate this main conclusion.

      One issue is related to the effect of behavioral variation or brain state. First, although the authors show that the movie fields are still present during low-speed stationary periods, there is a large drop in the movie tuning score (Z), especially in the hippocampal areas, as shown in Ext. Fig. 3b (compared to Ext. Fig. 2d). This result suggests a potentially significant enhancement by active behavior.

      There seems to be some misunderstanding here. There was no major reduction in movie tuning during immobility or active running. As we wrote in the manuscript, the drop in selectivity during purely immobile epochs is because of a reduction in the amount of data, not a reduction in selectivity per se. Specifically, as the amount of data decreases, the statistical strength of tuning (z-scored sparsity) decreases. For example, if we split the total of 60 trials' worth of data into two parts, the amount of data reduces to about half in each part, leading to a seeming reduction in selectivity in both halves. Figure 1-figure supplement 4c shows nearly identical tuning in all brain regions during immobility (red bars) and equivalent subsamples (yellow-orange) chosen randomly from the entire data, including mobility and immobility. We also show that the movie tuning persists in sessions with and without prolonged running behavior (Figure 1-figure supplement 7), as well as when the data are split based on pupil dilation or theta power. Please see below for more details.

      Second, a general, hard-to-tackle concern is that neuronal responses could be greatly affected by changes in arousal or brain state (including drowsy or occasional brief slow-wave sleep state) in head-fixed animals without a task. Without the analysis of pupil size or local field potentials (LFPs), the arousal states during the experiment are difficult to know.

      In the revised manuscript we show that the behavioral state effects cannot explain movie tuning. Specifically:

      a. We compared sessions in which the mouse was mostly immobile versus sessions in which the mouse was mostly running. Movie tuned cells were found in both these cases (Figure 1-figure supplement 7).

      b. We detected and removed all data around sharp-wave ripples (SWR). Movie tuning was unchanged in the remaining data. (Figure 1-figure supplement 6).

      c. As a further control, we quantified arousal by two standard metrics. First within a session, we split the data into two groups, segments with high theta power and segments with low theta power. Significant movie tuning persisted in both.

      d. Finally, pupil dilation is another common method to estimate arousal, so data within a session were split into two parts: those with pupil dilation versus constriction. Movie tuning remained significant in both parts. See the new Figure 1-figure supplement 7.

      Many example movie fields in the presented raw data (e.g., Fig. 1c, Ext. Fig. 4) are broad with low-quality tuning, which could be due to broad changes in brain states. This concern is especially important for hippocampal responses, since the hippocampus can enter an offline mode indicated by the occurrence of LFP sharp-wave ripples (SWRs) while animals simply stay immobile. It is believed that the ripple-associated hippocampal activity is driven mainly by internal processing, not a direct response to external input (e.g., Foster and Wilson, Nature 440: 680, 2006). The "actual" hippocampal movie fields during a true active hippocampal network state, after the removal of SWR time periods, could have different quantifications that impact the main conclusion in the manuscript.

      We included the broadly tuned hippocampal neurons to demonstrate the movie-field broadening compared to those in visual areas. We now include more examples with sharp movie fields in the hippocampal regions (Figure 1a-d right column, 2d and h, Figure 1-figure supplement 5 and Figure 2-figure supplement 1). Further, as stated above, we detected sharp-wave ripples and removed one second of data around SWR. Movie tuning was unchanged in the remaining data. Thus, movie tuning is not generated internally via SWR (Figure 1-figure supplement 6). See also Figure 1-figure supplement 7 and Figure 2-figure supplement 8 and the response above.

      Another issue is related to the relative contribution of direct visual response versus the response to temporal continuity in movie fields. First, the data in Ext. Fig. 8 show that rapid frame-to-frame changes in visual contents contribute largely to hippocampal movie fields (similarly to visual movie fields).

      There seems to be some misunderstanding here. That figure showed that the frame-to-frame changes in the visual content had the strongest effect on MSUA in visual areas and a much weaker effect in the hippocampus (Extended Data Fig. 8 in the previous version, now Figure 3-figure supplement 2). For example, the depth of modulation, (max – min) / (max + min), for MSUA was 21% and 24% for V1 but below 6% for hippocampal regions. Similarly, the MSUA was more strongly (negatively) correlated with F2F correlation for visual areas (r=0.48 to 0.56) than hippocampal areas (0.07 to 0.3). Similarly, comparing the number of peaks or their median widths, visual regions showed stronger correlation with F2F, and larger depth of modulation, than hippocampal regions, barring a handful of exceptions (like the CA3 correlation between F2F and median peak duration). This strongly supports our claim that visual regions generated a far greater response to the frame-to-frame changes in the movie than hippocampal regions.
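
      The depth-of-modulation measure quoted above, (max – min) / (max + min), can be illustrated with a minimal sketch. The function name and the example firing-rate values below are hypothetical and chosen only for illustration; they are not taken from the dataset.

```python
import numpy as np

def depth_of_modulation(rate):
    """Return (max - min) / (max + min) for a non-negative response profile."""
    rate = np.asarray(rate, dtype=float)
    hi, lo = rate.max(), rate.min()
    if hi + lo == 0:
        return 0.0  # silent profile: define the modulation as zero
    return (hi - lo) / (hi + lo)

# Hypothetical multi-unit rates (Hz) across movie segments, for illustration only
flat_profile = [9.0, 10.0, 10.5, 9.5]    # weakly modulated
peaked_profile = [6.0, 10.0, 14.0, 8.0]  # strongly modulated

dom_flat = depth_of_modulation(flat_profile)
dom_peaked = depth_of_modulation(peaked_profile)
print(dom_flat, dom_peaked)
```

      The measure is bounded between 0 (a flat response) and 1 (a response that falls to zero somewhere), which is why it can be compared across regions with different overall firing rates.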

      Interestingly, the data show that movie-field responses are correlated across all brain areas including the hippocampal ones.

      In Figure 3c we compared the MSUA responses with normalization between brain regions. Amongst the 21 possible brain region pairs, 5 were uncorrelated, 7 were significantly negatively correlated and 9 were significantly positively correlated.

      The changes in population overlap, number and widths of peaks are strongly correlated only between visual areas and some of the hippocampal region pairs. The correlation is much weaker for hippocampal-visual area pairs, but often significantly different from chance. This is quantified explicitly in the revised text Figure 3-figure supplement 2 with an additional correlation matrix at the right.

      This could be due to heightened behavioral arousal caused by the changing frames as mentioned above, or due to enhanced neuronal responses to visual transients, which supports a component of direct visual response in hippocampal movie fields.

      As shown in Figure 1-figure supplements 4, 5, 6 and 7 and described above, the effect of arousal as quantified by theta power or pupil diameter (or by accounting for running behavior or SWR occurrences) cannot explain the results in hippocampal areas, and the correlations in multiunit responses are unrelated across many brain areas.

      Second, the data in Ext. Fig. 13c show a significant correlation in hippocampal responses to same scrambled frames between even and odd trials, which also suggests a significant component of direct visual response.

      This is plausible. The fraction of hippocampal cells which were significantly tuned for the scrambled presentation (4.5%) was close to chance level (3%), and this small subset of cells was used to compute the population overlap between even and odd trials in Figure 4-figure supplement 6 (Ext Fig. 13 with old numbering). As described above, this significant but small amount of tuning could generate significant population overlap, which is to be expected by construction.

      Is there a significant component purely due to the temporal continuity of movie frames in hippocampal movie fields? To support that this is indeed the case, the authors have presented data that hippocampal movie fields largely disappear after movie frames are scrambled. However, this could be caused by the movie-field detection method (it is unclear whether single-frame field could be detected).

      As described in the methods section, the movie-field detection algorithm had a resolution of 3.3ms, which ensured that we could detect single-frame fields. As reported, we did find such short movie fields in several cells in the visual areas. The sparsity metric used is agnostic to the ordering of the responses, and hence single-frame fields, and the resultant significant movie tuning, if present, can be detected by our methods.

      Another concern in the analysis is that movie-fields are not analyzed on re-arranged neural responses to scrambled movie frames. The raw data in Fig. 4e seem quite convincing. Unfortunately, the quantifications of movie fields in this case are not compared to those with the original movie.

      We saw very few (3.6-4.9%) cells with significant movie tuning for scrambled presentation in the hippocampus. Hence, we did not quantify this earlier. This is now provided in new Figure 4-figure supplement 5. The amount of movie tuning for the scrambled presentation taken as-is, or after rearranging the frames is below 5% for all hippocampal brain regions and not significantly different between the two.

      Reviewer #2 (Public Review):

      Purandare and Mehta investigated the neural activities modulated by continuous and sequential visual stimuli composed of natural images, termed "movie-tuning," measured along the visuo-hippocampal network when the animals passively viewed a movie without any task demand. Neurons selectively responded to some specific parts of the movie, and their activity timescales ranged from tens of milliseconds to seconds and tiled the entire movie with their movie-fields. The movie-tuning was lost in the hippocampus but not in the visual cortices when the image frames were temporally scrambled, implying that the rodent hippocampus encoded the specific sequence of images.

      The authors have concluded that the neurons in the thalamo-cortical visual areas and the hippocampus commonly encode continuous visual stimuli with their firing fields spanning the mega-scale, but they respond to different aspects of the visual stimuli (i.e., visual contents of the image versus a sequence of the images). The conclusion of the study is fairly supported by the data, but some remaining concerns should be addressed.

      1) Care should be taken in interpreting the results since the animal's behavior was not controlled during the physiological recording.

      This was done intentionally since plenty of research shows that task demand (e.g., Aronov and Tank, Nature 2017) can not only modulate hippocampal responses but also dramatically alter them. We have now provided additional figures (Figure 1-figure supplement 6 and 7) where we quantified the effects of the behavioral states (sharp wave ripples, theta power and pupil diameter), as well as the effect of locomotion (Figure 1-figure supplement 4). Movie tuning remained unaffected with these manipulations. Thus, movie tuning cannot be attributed to behavioral effects.

      It has been reported that some hippocampal neuronal activities are modulated by locomotion, which may still contribute to some of the results in the current study. Although the authors claimed that the animal's locomotion did not influence the movie-tuning by showing the unaltered proportion of movie-tuned cells with stationary epochs only, the effects of locomotion should be tested in a more specific way (e.g., comparing changes in the strength of movie-tuning under certain locomotion conditions at the single-cell level).

      Single cell analysis of the effect of locomotion and visual stimulation is underway, and beyond the scope of the current work. As detailed in Figure 1-figure supplement 4, we have ensured that in spite of the removal of running or stationary epochs, as well as removal of sharp wave ripple events (Figure 1-figure supplement 6) movie tuning persists. Further, we now provide examples of strongly tuned cells from sessions with predominantly running or predominantly stationary behavior (Figure 1-figure supplement 7).

      2) The mega-scale spanning of movie-fields needs to be further examined with a more controlled stimulus for reasonable comparison with the traditional place fields. This is because the movie used in the current study consists of a fast-changing first half and a slow-changing second half, and such varying and ununified composition of the movie might have largely affected the formation of movie-fields. According to Fig. 3, the mega-scale spanning appears to be driven by the changes in frame-to-frame correlation within the movie. That is, visual stimuli changing quickly induced several short fields while persisting stimuli with fewer changes elongated the fields.

      Please note that the speed at which the movie scene changed across frames was strongly correlated with movie-field width in the visual areas, but that correlation was much weaker in the hippocampal areas (correlation values: LGN +0.61, V1 +0.51, AM-PM +0.55 vs. DG +0.39, CA3 +0.58, CA1 +0.42, SUB +0.24). Please see Figure 3-figure supplement 2 and the quantification of correlation between frame-to-frame changes in the movie and the properties of movie fields.

      The presentation of persisting visual input for a long time is thought to be similar to staying in one place for a long time, and the hippocampal activities have been reported to manifest in different ways between running and standing still (i.e., theta-modulated vs. sharp wave ripple-based). Therefore, it should be further examined whether the broad movie-fields are broadly tuned to the continuous visual inputs or caused by other brain states.

      As shown in Figure 1-figure supplement 6, movie field properties are largely unchanged when SWR are removed from the data, or when the effect of pupil diameter or theta power is accounted for (Figure 1-figure supplement 7).

      3) The population activities of the hippocampal movie-tuned cells in Fig. 3a-b look like those of time cells, tiling the movie playback period. It needs to be clarified whether the hippocampal cells are actively coding the visual inputs or just filling the duration.

      Tiling patterns would be observed when the maxima are sorted in any data, even for random numbers. This alone does not make them time cells. The following observations suggest that movie fields cannot be explained as being time cells.

      a. Time cells mostly cluster at the beginning of a running epoch (Pastalkova et al. Science 2008, MacDonald et al. Neuron 2011) and they taper off towards the end. Such large clustering is not visible in these tiling plots for movie tuned cells.

      b. Time fields become wider as the encoded temporal duration increases (Pastalkova et al. Science 2008, MacDonald et al. Neuron 2011). This is not evident in any movie fields.

      c. Widths of movie fields in visual areas, and to a smaller extent in the hippocampal areas, were clearly modulated by the visual content, like the change from one frame to the next (F2F correlation, Figure 3-figure supplement 2).

      d. A tiling pattern of movie fields was found in visual areas too, qualitatively similar to that in the hippocampus. Clearly, visual area responses are not time cells, as shown by the scrambled stimulus experiment. Here, neural selectivity could be recovered by rearranging the responses based on the visual content of the continuous movie, and not the passage of time.
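
      The point that sorting by maxima yields a tiling pattern even for random numbers can be checked with a short simulation. This is a minimal sketch under assumed, hypothetical dimensions (200 cells, 900 frames); it uses pure noise and is not an analysis of the recorded data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_bins = 200, 900  # hypothetical: 200 cells, 900 movie frames

# Pure noise: no cell has any genuine tuning
rates = rng.random((n_cells, n_bins))

# Sort cells by the bin at which each one happens to peak
peak_bin = rates.argmax(axis=1)
order = np.argsort(peak_bin, kind="stable")
sorted_rates = rates[order]
sorted_peaks = sorted_rates.argmax(axis=1)

# After sorting, peak locations advance monotonically down the rows,
# which is what produces the diagonal "tiling" band in a heat map.
is_tiled = bool(np.all(np.diff(sorted_peaks) >= 0))
print(is_tiled)
```

      Because the diagonal appears by construction, a tiling plot on its own carries no information about tuning; it becomes informative only alongside an independent test, such as a shuffle-based significance criterion or a comparison across separate halves of the trials.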

      The scrambled condition in which the sequence of the images was randomly permutated made the hippocampal neurons totally lose their selective responses, failing to reconstruct the neural responses to the original sequence by rearrangement of the scrambled sequence. This result indirectly addressed that the substantial portion of the hippocampal cells did not just fill the duration but represented the contents and temporal order of the images. However, it should be directly confirmed whether the tiling pattern disappeared with the population activities in the scrambled condition (as shown in Extended Data Fig. 11, but data were not shown for the hippocampus).

      As stated above for the continuous movie, a tiling pattern alone does not mean those are time cells. Further, tuning and the tiling pattern remained intact with the scrambled movie in the visual cortices but not in the hippocampus. We have now added a new supplement figure, Figure 4-figure supplement 5, where we compared the movie tuning for scrambled presentation with and without rearranging the frames. Hippocampal tuning remains at chance levels.

      Reviewer #3 (Public Review):

      In their study, Purandare & Mehta analyze large-scale single unit recordings from the visual system (LGN, V1, extrastriate regions AM and PM) and hippocampal system (DG, CA3, CA1 and subiculum) while mice monocularly viewed repeats of a 30s movie clip. The data were part of a larger release of publicly available recordings from the Allen Brain Observatory. The authors found that cells in all regions exhibited tuning to specific segments of the movie (i.e. "movie fields") ranging in duration from 20ms to 20s. The largest fractions of movie-responsive cells were in visual regions, though analyses of scrambled movie frames indicated that visual neurons were driven more strongly by visual features of the movie images themselves. Cells in the hippocampal system, on the other hand, tended to exhibit fewer "movie fields", which on average were a few seconds in duration, but could range from >50ms to as long as 20s. Unlike in the visual system, "movie fields" in the hippocampal system disappeared when the frames of the movie were scrambled, indicating that the cells encoded more complex (episodic) content, rather than merely passively reading out visual input.

      The paper is conceptually novel since it specifically aims to remove any behavioral or task engagement whatsoever in the head-fixed mice, a setup typically used as an open-loop control condition in virtual reality-based navigational or decision making tasks (e.g. Harvey et al., 2012). Because the study specifically addresses this aspect of encoding (i.e. exploring effects of pure visual content rather than something task-related), and because of the widespread use of video-based virtual reality paradigms in different sub-fields, the paper should be of interest to those studying visual processing as well as those studying visual and spatial coding in the hippocampal system. However, the task-free approach of the experiments (including closely controlling for movement-related effects) presents a Catch-22, since there is no way that the animal subjects can report actually recognizing or remembering any of the visual content we are to believe they do.

      Our claim is that these are movie scene evoked responses. We make no claims about the animal’s ability to recognize or remember the movie content. That would require an entirely different set of experiments. Meanwhile, we have shown that these results are not an artifact of brain states such as sharp wave ripples, theta power or pupil diameter (Figure 1-figure supplement 6 and 7) or running behavior (Figure 1-figure supplement 4). Please see above for a detailed response.

      We must rely on above-chance-level decoding of movie segments, and the requirement that the movie is played in order rather than scrambled, to indicate that the hippocampal system encodes episodic content of the movie. So the study represents an interesting conceptual advance, and the analyses appear solid and support the conclusion, but there are methodological limitations.

      It is important to emphasize that these responses could constitute episodic responses but do not prove episodic memory, just as place cell responses constitute spatial responses but do not prove spatial memory. The link between place cells and place memory is not entirely clear. For example, mice lacking NMDA receptors have intact place cells, but are impaired in spatial memory tasks (McHugh et al. Cell 1996), whereas spatial tuning was virtually destroyed in mice lacking GluR1 receptors, but they could still do various spatial memory tasks (Resnik et al. J. Neuro 2012).

      Testing episodic memory would require an entirely different set of experiments that involve task demand and behavioral response, which in turn would modify hippocampal responses substantially, as shown by many studies. Our hypothesis here is that, just like place cells, these episodic responses without task demand would play a role, to be determined, in episodic memory. We have emphasized this point in the main text (Ln 391-393 in the revised manuscript).

      Major concerns:

      1) A lot hinges on the cells having a z-scored sparsity >2, the cutoff for a cell to be counted as significantly modulated by the movie. What is the justification of this criterion?

      The z-scored sparsity (z>2) corresponds to p<0.03. This would mean that 3% of the results could appear by chance. A threshold of z>2 is a standard criterion used in many publications. Another advantage of z-scored sparsity is that it is relatively insensitive to the number of spikes generated by a neuron (i.e. the mean firing rate of the neuron and the duration of the experiment). In contrast, sparsity is strongly dependent on the number of spikes, which makes it difficult to compare across neurons, brain regions and conditions (see Supplement S5 of Acharya et al. Cell 2016).
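
      To make the criterion concrete, the following is a minimal sketch of how a z-scored sparsity could be computed. Several choices here are assumptions made for illustration and are not taken from the manuscript: the sparsity formula 1 - <r>^2 / <r^2>, the null distribution built by circularly shifting each trial independently, the number of shuffles, and the simulated Poisson cells. Under a normal approximation, z>2 corresponds to a one-tailed probability of about 0.023, which is consistent with the p<0.03 quoted above.

```python
import numpy as np

def sparsity(rate):
    """Sparsity of a trial-averaged profile, assumed form: 1 - <r>^2 / <r^2>."""
    rate = np.asarray(rate, dtype=float)
    mean_sq = np.mean(rate ** 2)
    if mean_sq == 0:
        return 0.0
    return 1.0 - np.mean(rate) ** 2 / mean_sq

def z_scored_sparsity(counts, n_shuffles=200, rng=None):
    """counts: (n_trials, n_bins) spike counts.

    Null distribution (an assumption of this sketch): each trial is
    circularly shifted by an independent random amount, which preserves
    the spike count of every trial but destroys alignment to the movie.
    """
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts, dtype=float)
    n_trials, n_bins = counts.shape
    observed = sparsity(counts.mean(axis=0))
    null = np.empty(n_shuffles)
    for i in range(n_shuffles):
        shifts = rng.integers(0, n_bins, size=n_trials)
        shuffled = np.stack([np.roll(trial, s) for trial, s in zip(counts, shifts)])
        null[i] = sparsity(shuffled.mean(axis=0))
    return (observed - null.mean()) / null.std()

# Hypothetical simulated cells: 60 trials, 100 time bins
rng = np.random.default_rng(1)
n_trials, n_bins = 60, 100
tuned_rate = np.full(n_bins, 0.2)
tuned_rate[40:50] = 2.0  # a hypothetical "movie field"
tuned = rng.poisson(tuned_rate, size=(n_trials, n_bins))
untuned = rng.poisson(0.38, size=(n_trials, n_bins))  # same mean rate, no field

z_tuned = z_scored_sparsity(tuned, rng=rng)
z_untuned = z_scored_sparsity(untuned, rng=rng)
print(z_tuned > 2, z_untuned)
```

      Because the observed value is expressed in units of the spread of its own shuffled distribution, cells with very different spike counts end up on a common scale, which is the property that motivates z-scoring.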

      To further address this point, we compared our z-scored sparsity measure with 2 other commonly used metrics to quantify neural selectivity, depth of modulation and mutual information (Figure 1-figure supplement 3). Comparable movie tuning was obtained from all 3 metrics, upon z-scoring in an identical fashion.

      It should be stated in the Results. Relatedly, it appears the formula used for calculating sparseness in the present study is not the same as that used to calculate lifetime sparseness in de Vries et al. 2020 quoted in the results (see the formula in the Methods of the de Vries 2020 paper immediately under the sentence: "Lifetime sparseness was computed using the definition in Vinje and Gallant").

      The definition of sparsity we used is commonly used by hippocampal scientists (Treves and Rolls 1991, Skaggs et al. 1996, Ravassard et al. 2013). The lifetime sparseness equation used by de Vries et al. 2020 differs from ours by just one constant factor, (1-1/N), where N=900 is the number of frames in the movie. This constant factor equals (1-1/900)=0.999. Hence, there is virtually no difference between the sparsity obtained by these two methods. Further, z-scored sparsity is entirely unaffected by such constant factors. We have clarified this in the methods of the revised manuscript.
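
      The two statements above (a constant factor of 1-1/N between the definitions, and invariance of the z-score to that factor) can be verified numerically. This is a minimal sketch: the sparsity form 1 - <r>^2 / <r^2> and the Vinje-and-Gallant-style normalization by (1 - 1/N) are assumed here, and the response profile and null values are randomly generated placeholders.

```python
import numpy as np

def sparsity(rate):
    """Assumed hippocampal-style sparsity: 1 - <r>^2 / <r^2>."""
    rate = np.asarray(rate, dtype=float)
    return 1.0 - np.mean(rate) ** 2 / np.mean(rate ** 2)

def lifetime_sparseness(rate):
    """Assumed Vinje-and-Gallant-style form: sparsity divided by (1 - 1/N)."""
    n = len(rate)
    return sparsity(rate) / (1.0 - 1.0 / n)

rng = np.random.default_rng(0)
n_frames = 900
rate = rng.gamma(shape=2.0, scale=1.5, size=n_frames)  # hypothetical profile

# The two measures differ only by the constant (1 - 1/N)
ratio = sparsity(rate) / lifetime_sparseness(rate)
print(ratio)

# Z-scoring is invariant to multiplication by a positive constant,
# because the constant cancels between numerator and denominator.
c = 1.0 / (1.0 - 1.0 / n_frames)
null = rng.random(500)  # placeholder shuffled values
observed = 0.8          # placeholder observed value
z_original = (observed - null.mean()) / null.std()
z_rescaled = (c * observed - (c * null).mean()) / (c * null).std()
print(z_original, z_rescaled)
```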

      To rule out systematic differences between studies beyond differences in neural sampling (single units vs. calcium imaging), it would be nice to see whether calculating lifetime sparseness per de Vries et al. changed the fraction "movie" cells in the visual and hippocampal systems.

      As stated above, the two definitions of sparsity are virtually identical and we obtained similar results using two other commonly used metrics, which are detailed in Figure 1-figure supplement 3.

      2) In Figures 1, 2 and the supplementary figures, the sparseness scores should be reported along with the raw data for each cell, so the readers can be apprised of what types of firing selectivity are associated with which sparseness scores, as would be shown for metrics like gridness or Rayleigh vector lengths for head direction cells. It would be helpful to include this wherever there are plots showing spike rasters arranged by frame number & the trial-averaged mean rate.

      As shown in several papers (Aghajan et al., Nature Neuroscience 2015; Acharya et al., Cell 2016), raw sparsity (or information content) is strongly dependent on the number of spikes of a neuron. This makes the raw values of these metrics impossible to compare across cells, brain regions and conditions (please see Supplement S5 from Acharya et al., Cell 2016 for details). Including the raw sparsity values would thus cause undue confusion. Hence, we provide z-scored sparsity. This metric is comparable across cells and brain regions, and is now provided above each example cell in Figure 1 and Figure 1-figure supplement 2.

      3) The examples shown on the right in Figures 1b and c are not especially compelling examples of movie-specific tuning; it would be helpful in making the case for "movie" cells if cleaner / more robust cells are shown (like the examples on the left in 1b and c).

      We did not put the most strongly tuned hippocampal neurons in the main figures so that these cells are representative of the ensemble and not the best possible ones, so as to include examples with broad tuning responses. We have clarified in the legend that these cells are some of the best tuned cells. Although not the cleanest looking, the z-scored sparsity mentioned above the panels now indicates how strongly they are modulated compared to chance levels. Additional examples, including those with sharply tuned responses are shown in Figure 1-figure supplement 5 and Figure 2-figure supplement 1.

      4) The scrambled movie condition is an essential control which, along with the stability checks in Supplementary Figure 7, provides the most persuasive evidence that the movie fields reflect more than a passive readout of visual images on a screen. However, in reference to Figure 4c, can the authors offer an explanation as to why V1 is substantially less affected by the movie scrambling than its main input (LGN) and the cortical areas immediately downstream of it? This seems to defy the interpretation that "movie coding" follows the visual processing hierarchy.

      This is an important point, one that we find very surprising as well. Perhaps this is related to other surprising observations in our manuscript, such as more neurons appearing to be tuned to the movie than to the classic stimuli. A direct comparison between movie responses and responses to fixed images is not possible at this point due to several additional differences, such as the duration of image presentations and their temporal history.

      The latency required to rearrange the scrambled responses (60ms for LGN, 74ms for V1, 91ms for AM/PM) supports the anatomical hierarchy. The pattern of movie tuning properties was also broadly consistent between V1 and AM/PM (Figure 2).

      However, all metrics of movie selectivity (Figure 2) to the continuous movie showed a consistent pattern that was the exact opposite of the simple anatomical hierarchy: V1 had stronger movie tuning, a higher number of movie fields per cell, narrower movie-field widths, larger mega-scale structure, and better decoding than LGN. V1 was also more robust to the scrambled sequence than LGN. One possible explanation is that there are other sources of inputs to V1, beyond LGN, that contribute significantly to movie tuning. This is an important insight and we have modified the discussion (Ln 315-325) to highlight this.

      Relatedly, the hippocampal data do not quite fit with visual hierarchical ordering either, with CA3 being less sensitive to scrambling than DG. Since the data (especially in V1) seem to defy hierarchical visual processing, why not drop that interpretation? It is not particularly convincing as is.

      The anatomical organization is well established and an important factor. Even when observations do not fit the anatomical hierarchy, it provides important insights about the mechanisms. All properties of movie tuning (Figure 2), namely the strength of tuning, number of movie peaks, their width and decoding accuracy, firmly put visual areas upstream of hippocampal regions. But, just as in visual cortex, there are consistent patterns that do not support a simple feed-forward anatomical hierarchy. We have pointed out these patterns so that future work can build upon them.

      5) In the Discussion, the authors argue that the mice encode episodic content from the movie clip as a human or monkey would. This is supported by the (crucial) data from the scrambled movie condition, but is nevertheless difficult to prove empirically since the animals cannot give a behavioral report of recognition and, without some kind of reinforcement, why should a segment from a movie mean anything to a head-fixed, passively viewing mouse?

      We emphasize once again that our claim is about the nature of encoding of the movie across these neurons. We make no claims about whether this forms a memory or whether the mouse is able to recognize the content or remember it. Despite decades of research, similar claims are difficult to prove for place cells, with plenty of counterexamples (see the points above). The important point here is that, regardless of any cognitive component, we see remarkably tuned responses in these brain areas. Establishing their role in cognition would take a lot more effort and is beyond the scope of the current work.

      Would the authors also argue that hippocampal cells would exhibit "song" fields if segments of a radio song-equally arbitrary for a mouse-were presented repeatedly? (reminiscent of the study by Aronov et al. 2017, but if sound were presented outside the context of a task). How can one distinguish between mere sequence coding vs. encoding of episodically meaningful content? One or a few sentences on this should be added in the Discussion.

      Aronov et al. 2017 found the encoding of an audio sweep in hippocampus when the animals were doing a task (release the lever at a specific frequency to obtain a reward). However, without a task demand they found that hippocampal neurons did not encode the audio sequence beyond chance levels. This is at odds with our findings with the movie, where we see strong tuning despite the absence of any task demand or reward. Our results are consistent with, but go far beyond, our recent findings that hippocampal (CA1) neurons can encode the position and direction of motion of a revolving bar of light (Purandare et al. Nature 2022). Please see Ln 373-382 for related discussion.

      These responses are unlikely to be mere sequence responses, since the scrambled sequence was also a fixed sequence that was presented many times, and it elicited reliable responses in visual areas but not in hippocampus. Hence, we hypothesize that hippocampal areas encode temporally related information, i.e. episodic content. We have modified the discussion to address these points.

      Reviewer #1 (Recommendations For The Authors):

      1) Are LFP data available in the data set? If so, can SWRs be identified and removed to refine the quantification of movie fields?

      Done, see Figure 1-figure supplement 6.

      2) Can movie fields be analyzed in re-arranged neural responses (Fig. 4e) and compared to those in other cases already shown (Fig. 4b, c)?

      Done, even after rearrangement the strength of movie tuning for the scrambled presentation was low, and below 5% in all hippocampal regions. See Figure 4-figure supplement 5 for details.

      3) It seems the authors are not fully committed to a main conclusion in the present manuscript. The title and abstract seem to emphasize the similar movie responses across visual and hippocampal areas, but the introduction and discussion emphasize the episode encoding of hippocampal neurons. The writing could be more consistent and the main message could be clearer.

      Selective responses to the continuous movie showed similar patterns (prevalence of tuning, multi-peaked nature, relation with frame to frame changes in visual images) between visual and hippocampal regions. But the visual responses to scrambled presentation could be rearranged, and the latency for rearrangement increased from LGN to V1 to AM-PM. On the other hand, selectivity to the scrambled presentation was virtually abolished in hippocampus, and responses could not be rearranged to resemble the continuous movie sequences. To reconcile these differences, we have hypothesized here that the hippocampal responses are episodic in nature, and rely on temporal continuity, whereas the visual regions rely directly on the visual content in the images.

      4) Line #158: "Net movie-field discharges was also comparable across brain areas...". This statement is not supported by Fig. 2g, which shows a wide range of median values across brain areas.

      Thank you for pointing this out. The normalized firing in movie-fields used in that figure is within 3x between V1 and subiculum. We have modified the text to contrast this with the 10x difference between movie-field durations.

      5) Line #253: What the two numbers (87.8%, 10.6%) mean is unclear (mean or median values). These numbers also appear inconsistent with the mean+-se values in Fig. 4 legend.

      The numbers mentioned on Ln 253 in the main text reflect the median visual continuity index, combining across cells from hippocampal or visual regions. On the other hand, the values reported in the Fig 4 legend are for V1 and subiculum, which are the regions with the smallest and largest visual continuity index, respectively. We have rewritten the main text and legends for better clarity.

      6) The Gelbard-Sagiv et al paper (Science 322: 96-101, 2008) could be cited and its relevance to the present study could be discussed.

      Done

      7) Are there neurons recorded from a non-visual sensory or motor cortical area in the same experiment? This may provide a key negative control for the non-specific modulation caused by behavioral states or visual transients.

      Owing to the nature of the experiments where the Allen Institute intended to study visual processing, we could not find any of the recorded brain regions without movie selectivity.

      8) The differences in hippocampal and visual movie fields between active and stationary time periods could be explicitly quantified.

      We have shown several raster plots where the responses are quite similar during immobile and moving epochs. Our goal is to show that there is indeed comparable movie tuning when the animal is immobile versus any random state. Doing a specific analysis of behavioral dependency is difficult because in many sessions the amount of time the mice ran was very little. A thorough analysis overcoming these and other challenges is beyond the scope of this paper.

      Reviewer #2 (Recommendations For The Authors):

      1) The methods to determine the boundaries of the movie-fields should be clarified, and the detected peaks and boundaries should be indicated in the relevant figures (e.g., Fig. 2c, 2d, and 2h) to help readers clearly understand how the movie-fields were defined and what the shapes of the movie-fields look like.

      Done.

      2) When testing the influence of locomotion on movie-tuning in Extended Data Fig. 3, a single cell-based analysis is further needed. For example, you need to check whether the z-scored sparsity within one cell varies or not depending on locomotion conditions (as in Extended Data Fig. 10a-c). In addition, it is recommended to exclude the cells significantly modulated by locomotion (e.g., running velocity) before defining the movie-tuned cells.

      We now show example cells from sessions with or without prolonged running bouts in Figure 1-figure supplement 7 that have strong movie selectivity. We have also assessed the effects of theta power and pupil dilation on movie tuning in that figure. A more thorough analysis of the combined effects of locomotion and movie tuning is underway, but beyond the scope of the current work.

      3) Regarding the time-cell-related issue raised in the public review, it would be nice if the authors confirm whether the tiling patterns of hippocampal subregions have been weakened by presenting the population activities for the scrambled condition as in the visual cortices in Extended Data Fig. 11a.

      We have clarified in the earlier responses, please see above.

      4) In Fig. 4 and Extended Data Fig. 3, the proportion of movie-tuned cells in the hippocampus seems to drop significantly after only a portion of trials under specific conditions were extracted. Although the authors addressed the stability issue by comparing the neural responses between even and odd trials, the concern about whether the movie-tuning is driven by a certain portion of trials still remains. To avoid such misunderstanding, as mentioned in comment no.2, tracking the changes in the z-scored sparsity of one cell between continuous and scrambled conditions should be provided. In addition, according to the methods, the scrambled condition was divided into two blocks of 10 trials each, possibly causing premature movie-tuned activities. Thus, it should be more appropriate to compare with the first 10 trials of each block in the continuous condition.

      Done.

      5) Explanations related to statistical analysis should be added to the methods sections. In Fig. 2a (and related figures with similar analysis), when comparing three or more groups, the Kruskal-Wallis test should be performed first to check whether there is a difference between the groups, and then pairwise comparisons should follow with adjusted p-values for multiple comparisons. Also, in Fig. 4b (and related figures), it seems that the K-S test was performed to test the changes in cell proportion by combining all brain regions, as far as I understand. However, it would be more appropriate to test the proportional changes by a Chi-square test within each region since the total numbers of cells should differ across the regions.

      Yes, we have used the KS test throughout the analyses, unless otherwise mentioned or appropriate.
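      For readers who want to see what the reviewer's suggestion would involve, the following is a minimal sketch of a within-region chi-square test on the proportion of tuned cells, followed by a Holm adjustment across regions. It is not the analysis used in the manuscript. The helper names and the cell counts are hypothetical placeholders, and the chi-square p-value uses the closed form for one degree of freedom without continuity correction.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]] (no continuity correction).

    Returns (statistic, p_value) for one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("degenerate table: a row or column sums to zero")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[i] = running_max
    return adjusted


if __name__ == "__main__":
    # Made-up counts for illustration only:
    # (tuned, untuned) for the continuous movie, then for the scrambled movie.
    counts = {
        "region_A": (60, 40, 45, 55),
        "region_B": (30, 70, 8, 92),
        "region_C": (20, 80, 5, 95),
    }
    raw = {name: chi_square_2x2(*table)[1] for name, table in counts.items()}
    adjusted = holm_adjust(list(raw.values()))
    for (name, p_raw), p_adj in zip(raw.items(), adjusted):
        print(f"{name}: raw p = {p_raw:.4g}, Holm-adjusted p = {p_adj:.4g}")
```

      Testing each region separately keeps unequal cell counts across regions from dominating a pooled comparison, and the Holm step controls the family-wise error rate across the resulting set of tests.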

      6) The labeling for firing rate is 'FR (sp/sec)' in Fig. 1, 2, and 4, but it is 'Firing rate (Hz)' in Fig. 3.

      This has been fixed now, and only Firing rate (Hz) is used throughout. Thank you for pointing this out.

      7) There is a typo in Extended Data Fig. 11b. "... across all tuned responses from (b)." It should be (a) instead of (b).

      Done

      Reviewer #3 (Recommendations For The Authors):

      While the study presents an interesting dataset and conceptual approach, there are ways in which the manuscript should be strengthened.

      Minor concerns:

      1) Related to point (5) above, what content did the hippocampal "movie fields" encode? It would add a substantive dimension to the paper if the authors included examples of what segments of the movie the cells responded to. Are there "pan left" cells, or "man gets in the car" cells? Or was it more arbitrary than that? What is an example of a movie feature lasting 50ms that is stably encoded by a mouse hippocampal neuron?

      We show example cells with very sharply tuned neural responses (Figure 2h). A thorough analysis of the visual content is in progress but beyond the scope of this paper.

      2) Line 24-seems like it should read "Consistent presentation of the movie..." , with "ly" dropped from "consistent".

      Done

      3) Line 43-seems to be missing the article "a", and should read "...despite strong evidence for A hippocampal role in...".

      We rewrote this sentence for better clarity

      4) Line 54-to clarify, the higher visual areas recorded were the anteromedial (AM) and posterior-medial (PM) areas? The text additionally indicates a "medio-lateral" extrastriate area, but there is no such area. Can the text be revised to clear this up?

      Sorry about this confusion, indeed we meant posterior-medial (PM). Thank you for pointing this out.

      5) Line 84, "rate" should be pluralized to "rates".

      Done

      6) Line 108- the extra "But" at the start of the sentence should be removed.

      Done

      7) Figure 2h-was there any particular arrangement for the cells in this sub-panel? If not, could they be grouped by sub-region (or proximity between sub-regions) so it appears less arbitrary?

      Done

      8) Extended data 2 figure legend for (b) is missing a "that": "Fraction of selective neurons that was significantly above chance.... Ranging from 7.1% in CA

      Done

      9) Line 144-145, there is an extra "and" in the sentence: ".... were typically neither as narrow AND nor as prominent...."

      Done

      10) Line 203-the first word in the line should be "frames" (plural).

      Done, thank you for pointing this out

      11) Line 281-in "...scrambled sequence"-"sequence" should be plural. It looks like the same is true in line 882, in the legend title for Extended Data Fig. 11.

      Since we only showed one scrambled sequence (which was repeated 20 times), we rewrote the relevant lines to be “the scrambled sequence”

      12) Line 923-the first sentence of the legend for Extended Data Fig. 14-to what data or study are the authors referring in saying that "More than 50% of hippocampal place cells shut down during maze exploration."? This was confusing, please clarify.

      This reference has now been added.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      1.1 Fig. 1: A good control for these studies would be a TDP-43 variant with an RRM1 mutation that impairs RNA binding, but not an acetylation mimic (i.e. mutations affecting W113, R151, F147, or F149)

      In our original paper (Cohen et al, Nat Commun. 2015 Jan 5; 6:5845), we already characterized TDP-43 acetylation and employed a complete RNA-deficient mutant (F147/149L), as the reviewer suggested. In that original study, this mutant showed maximal RNA binding-deficiency, and therefore we proposed that acetylation mimic mutations represent a comparable RNA-binding deficient variant.

      1.2 Fig. 1: time and expression level can influence nuclear TDP-43 puncta formation. It is important that the authors take these into account when measuring puncta number/frequency.

      All expression levels and transfection/transduction times were identical across samples. We chose the optimal times to express TDP-43 constructs yet minimize toxicity and found that neuronal transduction at DIV10 and arsenite exposure on DIV14 in mature neurons was optimal.

      1.3 Fig. 2: to accurately refer to the nuclear foci as anisosomes, the authors will need to conduct higher-resolution imaging.

      We agree with the reviewer and since anisosomes are not well characterized in terms of their relationship to TDP-43 nuclear foci (and may represent only a subset of foci), we have now omitted any mention of anisosomes but instead refer to them in the discussion, where we suggest that TDP-43 K145Q foci may partially represent anisosomes.

      1.4 Fig. 2D: it seems as though the splicing reporter should have a fluorescence-based readout (red/green ratio, for instance). Is this the case, and is the ratio informative?

      We have now removed the splicing reporter data and replaced this with much more robust data showing RT-qPCR of downstream TDP-43 targets including Sortilin-1 (see the new revised Figure 2E and 3B-I).

      1.5 Line 145: "Overall, these results indicate that a single endogenously expressed acetylation-mimic TDP-43(K145Q) mutation is sufficient to alter TDP-43 localization, induce TDP-43 phase separation, and impair splicing in a murine primary neuron culture model." The authors did not assess phase separation in this study. Moreover, it would be more convincing to assess native splice targets of TDP-43 in K145Q primary neurons, rather than an exogenous splicing reporter.

      See comment 1.1 above. We have now avoided mentioning phase separation in the main text but mention this as a potential mechanism in the discussion. In addition, we have now evaluated native TDP-43 splice targets in primary neurons.

      1.6 Fig. 4A: is the loss of neurons selective for a specific layer or region of the cortex?

      Since we did not observe any gliosis, we have gone back and completely re-evaluated neuronal loss since the concept of neurodegeneration is a critical question in the TDP-43KQ/KQ mice. We do not find any significant neuronal loss in the homozygous TDP-43KQ/KQ mice (see Figure 5).

      1.7 Fig. 6: The authors suggest that the large majority of splicing changes are direct results of the TDP43(K145Q) mutation and impaired RNA binding by TDP-43. However, without a direct assessment of TDP43(K145Q) target RNAs in comparison to those of TDP-43(WT), this is only an assumption. Moreover, given the fact that RNA-seq was performed in aged animals, the potential for indirect gene expression changes is very high.

      In our original study (Cohen et al, Nat Commun. 2015 Jan 5;6:5845), we showed that the K145Q mutant is severely deficient in RNA binding. In this study, we now show strong evidence that many known targets of direct TDP-43 binding are dysregulated, supporting the loss of function expected if the TDP-43 K145Q mutation abrogates RNA binding. Although we have not performed direct RNA binding studies on the Sort1 transcript, for example, other studies have clearly indicated that wild-type TDP-43 binds these targets. We infer that loss of function mutations (i.e., K145Q) impact direct targets of TDP-43. Future studies employing RNA-immunoprecipitation followed by RNA sequencing (RIP-seq) could be useful in this regard and will be required to mechanistically address this point.

      1.8 Sup Fig. 8 is very interesting and suggests that any TDP-43 variant that is unable to bind RNA may lead to upregulation of TDP43 RNA and phenotypes similar to those observed in K145Q animals. This is alluded to in the discussion but never specifically tested.

      Yes, we agree with this reviewer’s comment. Loss of RNA binding, whether due to acetylation (e.g., K145Q) or otherwise is expected to cause autoregulatory up-regulation of the TARDBP transcript and impact other targets, potentially yielding phenotypes similar to the TDP-43KQ/KQ mice. However, new in vivo models would be needed to prove this point. For example, in the future, we will consider this possibility by characterizing recently identified RNA-binding deficient familial TARDBP mutants (e.g., P112H or K181E).

      1.9 The authors should also provide some comment or potential explanation for why TDP43(K145Q) animals show no signs of motor neuron disease.

      We now show a moderate level of TDP-43 aggregation and hyper-phosphorylation in spinal cord of mutant mice in Figure 6 – Figure Supplement 3. We also speculate in the discussion why we observe aspects of TDP-43 dysfunction in spinal cord without overt motor phenotypes up until 18 months old.

      1.10 Line 79: "However, TARDBP mutations that disrupt RNA binding, and thereby may act in a similar manner to TDP-43 acetylation, have been identified in FTLD-TDP patients." Evidence suggests that the D169G mutation does not interfere with RNA binding. See Furukawa et al., 2016.

      We thank the reviewer for pointing this out. We have now removed the D169G mutation from the discussion.

      1.11 It is unclear why the authors focused solely on homozygous K145Q animals, rather than heterozygous mice.

      We focused initially on homozygous mutant mice to provide better statistical power to detect small effect sizes. However, we have now included a thorough analysis of heterozygous mice including molecular analysis of brain tissue and mouse behavior, as shown in Figure 4 – Figure Supplements 1-2 and Figure 6 – Figure Supplements 1-3.

      Reviewer #2

      2.1 A strength of this paper is the generation of a new mouse model to study the effects of K145 acetylation in TDP-43 proteinopathy. While the authors note an absence of a behavioral phenotype on neuromuscular testing in aged animals, it would be appropriate to include some analysis of spinal cord and skeletal muscle in this initial description of their model. At a minimum, I wonder if there is pathology in the cord (neuron loss, gliosis) or muscle (fiber atrophy) if insoluble p-TDP-43 is detectable in these tissues, and whether dysregulated splicing of TDP-43 target genes (such as shown in Fig 7) occurs at these sites.

      See comment 1.9 above. We analyzed TDP-43 aggregation, localization, and splicing in the spinal cord of TDP-43KQ/KQ mice and found mild loss of TDP-43 function that was comparable to, though not to the same extent as, that seen in hippocampus and cortex. We discuss these findings in the discussion and provide several possibilities for why there are no overt motor phenotypes in these mice. We note that TDP-43 Q331K knock-in mice also have cognitive but no motor deficits, suggesting TDP-43 dysfunction may preferentially (or at least initially) impact cognitive function (White et al, Nat Neurosci. 2018 Apr;21(4):552-563).

      2.2 Fig 2: Differences in the splicing reporter are hard to appreciate from the images shown in panel E. Is the quantification shown in panel F corroborated by an analysis of green vs yellow fluorescence or by another method? Quantification of results shown in panel 2G (from 3 biological replicates) should be included.

      We have now removed the splicing reporter data in favor of the more robust RT-qPCR data shown in Figure 2E and 3B-I. We have also now included more biological replicates from our iPSC neuron imaging, as shown in Figure 3A. Due to time and resource constraints, we were not able to quantify the images shown in Figure 3A, and we reinforce in the text that our statements are qualitative. However, we were able to add quantitative analysis of TDP-43 dysfunction by detecting genotype-dependent splicing changes in hiPSC neurons, as mentioned above, which strengthens our claim that TDP-43 dysfunction is prominent in this culture model.

      2.3 Fig 4: Differences in NeuN quantification without changes in cresyl violet staining or gliosis are surprising and a bit difficult to understand. Is there confirmation of neuron loss through another metric? Is it possible that NeuN expression is lower in mutants without frank neuron loss? Also, although no significant differences were seen by IF for TDP-43 staining, did IF for phospho TDP-43 show differences? One might expect this to be the case given the biochemical findings in Fig 5.

      See comment 1.6 above. After a much more in-depth and rigorous assessment, we find little evidence for neurodegeneration. Given the transcriptome data showing that TDP-43 regulates a subset of synaptic genes, we suggest that synaptic deficits underlie the behavioral phenotype rather than neuronal loss.

      Regarding phospho-TDP-43 pathology by immunofluorescence (IF) staining, after much effort, we have not been able to detect phospho-TDP-43 pathology by IF in TDP-43KQ/KQ mice. Currently available phospho-TDP-43 antibodies (including those acquired from collaborators) do not work well to detect endogenous mouse TDP-43 by histology or IF staining, and therefore we are somewhat limited technically. Nonetheless, given the increase in phospho-TDP-43 in the insoluble fractions by western blotting combined with the increase in cytoplasmic TDP-43 via biochemical fractionation, our data suggest that phospho-TDP-43 is the relevant species accumulating in the cytoplasm of TDP-43KQ/KQ mice.

      2.4 Fig 5: Probing the NC fractions for phospho TDP-43 would be an interesting addition to support the conclusion that increased cytoplasmic localization of the KQ mutant occurs prior to its phosphorylation.

      We agree that this would be an excellent addition to our data. Unfortunately, after rigorous antibody validation experiments, we were not able to find a phospho-TDP-43 antibody that specifically detected phosphorylated TDP-43 and did not cross-react with unphosphorylated TDP-43 in the buffers used for N-C fractionations. We tested phospho-TDP antibodies in RIPA (soluble), Urea (detergent-insoluble), and the N-C fractionation buffers, using samples treated or untreated with lambda phosphatase (to de-phosphorylate TDP-43). Only one antibody reliably detected the phosphorylated TDP-43 and not the lambda phosphatase-treated TDP-43 samples, and only did so in the Urea buffer, which is shown by straight westerns in our manuscript. Because of these technical difficulties with the phospho-TDP-43 antibodies, this was a challenging point to address at the moment. As better phospho-TDP antibodies become available, we hope to be able to address this. We therefore cannot definitively conclude that cytoplasmic phospho-TDP-43 pathology is present in these mice, but nonetheless the total phospho-TDP-43 levels are significantly elevated in urea (insoluble) fractions.

      2.5 Fig 1: What quantitative criteria were used to distinguish between puncta and foci, as highlighted in panel A? What is the biological significance of this distinction? From the images in panel A, it is difficult to see the TDP-43 foci in wt and K145R expressing cells.

      Although the size of nuclear TDP-43 foci can be quite variable, and we are certainly interested in the biological significance of this parameter, we did not focus this study on size profiles of K145Q-induced foci, only their accelerated formation and abundance. Therefore, in the revised manuscript we chose not to explicitly state any differences in “foci” vs. “puncta” and now refer to all nuclear TDP-43 structures as “foci” (removed the word “puncta” throughout).

      2.6 Fig 3: In describing the results of context-dependent fear testing, it is more appropriate to state that significant deficits appeared at 18 months, deleting the word "more" on line 186.

      We have deleted the word “more”.

      Reviewer #3

      3.1 Multiple figures (1b, 1c, 2b, 2c, 4b, 4d, 4f, 4g, 4i, 4j) include data with multiple measurements per field of view and multiple fields of view per condition. It appears that each measurement was considered an "n" for ANOVA or t-tests, but the data structure violates the requirement that data points are independent. More rigorous statistical methods such as mixed effect models should be considered (see DOI: 10.1016/j.neuron.2021.10.030) which in many cases provide more statistical power. Mixed effects models are the more appropriate statistical method for much of their data. Should the authors want to reanalyze their data with this method, they can reach out to me for an introduction to this statistical model.

      We have now re-evaluated the figures mentioned using linear mixed effects models, similar to what the reviewer has mentioned. The new statistical measurements have been incorporated into the revised Figures 1, 2, and 5 (formerly Figure 4). A description of the statistical methods used is now provided in the revised methods section.

      3.2 In the introduction, the authors write "we avoid both TDP-43 overexpression and disruption of autoregulatory genomic elements of the endogenous Tardbp transcript" but they show that autoregulation is altered. So shouldn't the acetylation sites be considered a genomic element that regulates autoregulation?

      We agree and have now stated that our knock-in approach avoids disrupting surrounding genomic elements (as could occur with transgenic or gene replacement strategies, for example) in order to retain the native Tardbp gene in its unaltered form.

      3.3 Suggest editing the language regarding potential neurodegeneration/neuron loss as the same results could be obtained with tissue volume and/or developmental effects independent of progressive neurodegeneration.

      See comments 1.6 and 2.3 above. The language has been edited to reflect no apparent neurodegeneration.

      3.4 Sequencing the top predicted off-target loci in CRISPR'd mice and iPSC cell lines would help show the absence of off-target mutations.

      We described in the methods how potential off-target effects were avoided. We assessed the likelihood of off-target mutations using prediction algorithms to ensure low likelihood. All of the predicted exonic off-target sites have 4 mismatches, making them extremely unlikely to be mutated.

      3.5 The authors describe a subtle shift in electrophoretic mobility of the SORT1 protein band in figure 7d/e, but it is unclear why the entire SORT1 band should be shifted up in mutant mice given that the RNA analysis suggests that WT species (not the cryptically spliced +ex17b) is still the major RNA that is expressed. In addition, others have shown that the WT versus +ex17b bands can be resolved (see DOI: 10.1073/pnas.1211577110). Perhaps knockout/knockdown cells can facilitate by providing a positive control for sizing/separation of Sort1 by immunoblotting.

      Please refer to our RNA-seq data shown in Figure 8A. In WT mice, nearly 80% of Sort1 transcripts lack exon17b, while this number drops to 23% in the TDP-43KQ/KQ mice. Therefore, the abnormally spliced +ex17b becomes the dominant transcript in TDP-43KQ/KQ mice. Given the prominent +ex17b inclusion that we are observing at the transcript level, it is not surprising that we mostly observe the up-shifted ex17b-containing Sort1 protein band. We have been unable to resolve two distinct bands by immunoblotting in mouse tissues using multiple Sort1 antibodies, including those used in Prudencio et al Proc Natl Acad Sci U S A. 2012 Dec 26;109(52):21510-5. Nonetheless, the up-shifted Sort1 protein is clearly the abnormal variant, as it becomes destabilized in our mice. Another possibility is that partial loss of TDP-43 function, as we suspect occurs in the TDP-43KQ/KQ mice, may magnify (or enhance) the effects on Sort1 such that the dominant Sort1 variant observed is the +ex17b containing variant. We suspect this to be true since this phenomenon was also observed in the Prudencio et al study (see Figures 1-2 in that study).

      3.6 The authors may try to corroborate their CFTR splicing results by examining fluorescence as it appears that the construct allows for analysis of splicing differences using GFP vs mCherry expression. This is a minor point as RNA-seq analysis demonstrates abundant splicing changes in acetylation-mimic expression models.

      We have now removed the CFTR splicing data entirely and replaced it with more robust readouts of endogenous TDP-43 splicing targets both in vitro (Figure 2E, 3B-I) and in vivo (Figure 8B-C).

      3.7 Should the bars in figure 3d for 1 and 2 min be colored in grey/pink? It is unclear why they are clear and only outlined in color.

      This point is clarified in the revised Figure 4D legend. In our cue-dependent conditioned fear testing, the filled bars beyond 2 min represent the presence of the auditory cue (tone) and the period of statistical analysis.

      3.8 The statistical test used (Fisher's exact test?) for determining overlap between transcriptome datasets should be stated.

      We clarified our comment in the results section to reflect the use of over-enrichment analysis. In the methods section, it reads “Previously published differentially expressed genes from Hasan et al95 and Polymenidou et al96 were retrieved from the respective publications; significant over-enrichments as well as human gene symbol mappings to mouse orthologs were performed using gprofiler2 (g:Orth).”
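      For readers unfamiliar with over-enrichment analysis, the overlap between two gene lists is commonly assessed with a hypergeometric upper-tail probability, which is equivalent to a one-sided Fisher's exact test. The sketch below assumes that formulation; it does not reproduce the background set or the multiple-testing correction applied by gprofiler2, and all counts are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) when n genes are drawn from N, of which K belong to the reference set."""
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Hypothetical counts: 20,000 background genes, 300 in the published set,
# 500 differentially expressed genes, 20 of them shared (illustrative only).
p_overlap = hypergeom_upper_tail(20000, 300, 500, 20)
print(p_overlap)
```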

    1. Author Response

      We thank the reviewers for their work, their careful reading of our manuscript, their appreciative evaluation and their comments and suggestions, which we will consider to improve the paper.

      For now, we offer two brief considerations.

      We agree that the PCR step in the ADSE evolutionary process might introduce a bias in the population and that this effect should be better examined. We have in fact started performing new experiments, including ADSE evolution cycles without resources. From the data we currently have, we see the PCR bias effect as minor, not making a significant difference in the emergence and interaction of the species we have reported.

      The ADSE protocol is markedly simpler than any other evolution protocol based on even the most basic cellular processes. However, many experimental parameters can be changed in ADSE: initial DNAi population (level of randomness vs. combination of designed sequences), resource structure (resource sequence and length, bead-resource linker length and type), capture condition (length and concentration of DNAi, pH, temperature, bead density), amplification step (choice of polymerase and rate of mutation, length of primers, thermal protocol). The availability of these parameters is a strength of ADSE, making it possible to explore a large variety of evolution conditions and to introduce kinetic drifts (e.g. in the resources). At the same time, the variety of parameters prompted us to make choices, as discussed in the article, and to stick to them in all our experiments. Exploring the many variants that can be considered, some of them very interesting and some proposed by the reviewers, would require substantial experimental work, which we are planning to conduct for a few of these possibilities as part of future publications.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you for the helpful comments regarding our manuscript, "Association between APOL1 risk variants and the occurrence of sepsis among patients hospitalized with infections.” We have revised the title of the manuscript in response to reviewer comments. Additionally, we have updated the manuscript with analyses among patients with pre-existing renal disease alone as well as other items suggested by the reviewers. The Tables have been renumbered to accommodate these revisions.

      Public review:

      The study has main limitations which need to be addressed and there is a lack of functional explanation of carriage. These limitations are: a) the lack of inclusion of non-Black patients; and b) the lack of appropriate explanation if results are false-positive since APOL1 provides risk for chronic renal disease (CRD) and patients with CRD are predisposed to sepsis. Sepsis occurred in 565 Black subjects, of whom 105 (29%) had APOL1 high-risk genotype and 460 had low-risk genotype. Importantly, the risk for sepsis associated with APOL1 HR variants was no longer significant after adjusting for subjects' pre-existing severe renal disease or after excluding these subjects. Thus, the susceptibility pathway seems to be: APOL1 variants > CKD > sepsis diathesis.

      Suggestions to the authors:

      • The authors need to provide analysis of patients of non-Black origin.

      We apologize for not fully clarifying that the APOL1 high-risk genotypes are virtually exclusive to populations of recent African ancestries,1–4 the majority of whom are identified as having Black race in our dataset.5 To illustrate the rarity of APOL1 high-risk genotypes in other reported races, we examined the frequency of these genotypes in White patients who had been hospitalized with infections at VUMC (comparable to the cohort of Black patients used in the study). Compared to the 361 out of 2242 (16.1%) Black patients hospitalized with infections carrying APOL1 high-risk genotypes, there were only 8 carriers of APOL1 high-risk genotypes out of 12,990 White patients (0.06%); of these 8, 2 patients developed sepsis during hospitalization. Due to a low number of carriers (n=8) and limited number of events (n=2), we could not proceed with further analysis. Patients reported as other races (e.g., Asian and American Indian) are less frequent than White or Black patients in the VUMC de-identified EHR; as such, we would anticipate similarly small, if any, numbers of high-risk genotypes among these groups, with insufficient power for meaningful analysis. Comparisons between racial groups that did not have carriage of the APOL1 high-risk genotypes would increase the possibility of confounding by factors associated with racial identity (e.g., social determinants of health), rather than genotype; as such, detected differences would likely reflect those factors, rather than the impact of APOL1.

      We have now added clarifying language in the Methods section.

      • The Table of demographics needs to include the type of infections and the underlying pathogen.

      Microbiological evidence of specific infection types is not available for the majority of records for patients hospitalized with infections (as well as sepsis); indeed, for many patients with common infections (e.g., pneumonia) the pathogen is often not identified.6 While we do not have details regarding the underlying pathogens, we were able to determine infection categories at admission. We now include details regarding the categories of infection based on ICD codes in Supplementary Table 1, and the updated Table 1 now includes that information for the APOL1 high-risk and low-risk groups. Given that individuals could have more than one type of infection, we also tested the number of types of infection and found no significant difference between the high-risk and low-risk genotypes (p=0.77).

      • The authors need to provide convincing analysis if results are false-positive since APOL1 provides risk for chronic renal disease (CRD) and patients with CRD are predisposed to sepsis. For this purpose, they have to provide evidence if the sepsis causes (both type of infection and implicated pathogens) in patients with CRD who are carriers of APOL1 variants are different than in patients with CRD who are not carriers of APOL1 variants.

      Indeed, we believe the presented findings suggest that the apparent association between APOL1 high-risk genotypes and sepsis is driven by associated pre-existing severe renal disease rather than APOL1 itself; we appreciate the suggestion to conduct additional analyses to assess whether APOL1 high-risk genotypes impact the occurrence of sepsis among those patients with pre-existing severe renal disease. We note that this analysis could also be biased towards detecting a spurious association between APOL1 high-risk genotypes and sepsis if, within the subgroup with pre-existing severe renal disease, patients with high-risk genotypes also have more severe pre-existing renal disease.

      Among the patients with pre-existing severe renal disease (n=458), 121 (26.4%) were carriers of the APOL1 high-risk genotypes. First, we assessed the severity of renal disease among these patients, detecting an association between APOL1 high-risk genotypes and greater severity (i.e., CKD stage 5/ESRD) when adjusted for age, sex, and 3 PCs: OR=2.29 (95% CI, 1.42-3.67, p=6.25x10-4). Then, we compared the primary outcome of sepsis in patients with APOL1 high-risk and low-risk genotypes for this subgroup. Despite the potential bias toward detecting an association between sepsis and the high-risk genotype based on the severity of pre-existing renal disease, there was no significant association between the high-risk genotypes and sepsis (OR=1.29, [95% CI, 0.84-1.98, p=0.25]). Finally, we assessed infection categories (as described in the above response) in this subgroup. We found no significant differences between the high-risk and low-risk genotypes in the frequency of any infection category.

      These results suggest that the APOL1 high-risk genotypes are not associated with an increased risk of sepsis among patients who have pre-existing severe renal disease. Taken with our other findings, the high-risk genotypes appear to have little or no association with sepsis beyond their association with renal disease. As such, drugs targeting those genotypes would likely have little effect in the acute setting of hospitalization with infection; rather, their primary contribution to the prevention of sepsis would need to target the prevention of underlying renal disease. We have revised our Methods, Results, and Discussion to include these findings.

      • Why concentrations of APOL1 were not measured in the plasma of patients?

      Although APOL1 high-risk genetic variants have been repeatedly associated with renal-related clinical phenotypes, and many candidate mechanisms have been proposed,4 there has been contradictory evidence regarding whether the genetic variants could be linked to altered plasma APOL1 levels or whether APOL1 levels are related to elevated risk of renal disease. This is not surprising since it is the altered biological function of the APOL1 structural variant that is important, rather than the concentration of APOL1 protein. While some studies have detected an association between APOL1 high-risk genotypes and plasma levels among patients with renal dysfunction and sepsis,7 other population studies have suggested no association between APOL1 plasma levels and renal function.8 Plasma APOL1 levels are seldom measured in clinical practice and thus were not available in this retrospective cohort. However, given the inconsistency of findings and the underlying biology of APOL1, we believe measurement of levels (rather than function) is unlikely to be illuminating.

      • Why analysis towards risk for death is not done?

      In the current study, we focused on the risk of in-hospital death. We did not include the risk of out-of-hospital death due to potential data fragmentation. Specifically, we only have access to the patient’s EHRs at VUMC, and death after hospital discharge is not always included in a patient’s EHR unless relatives contact the hospital. As such, we focused on in-hospital death, which we validated previously with manual chart review.9 Paralleling the design from a previous publication assessing sepsis outcomes, we included discharge to hospice as part of our in-hospital death algorithm,10 as patients with terminal illnesses are often discharged to hospice. However, to clarify this outcome component, we now refer to in-hospital deaths and discharge to hospice collectively as “short-term mortality.” In this study, of the 84 total patients with the “short-term mortality” outcome, 47 patients were in-hospital deaths and 37 patients were discharged to hospice. Parallel to the short-term mortality, we found no association with in-hospital death alone.

      • Ln 190: discharge to hospice. I am not sure this can be translated into in-hospital mortality.

      As noted in the above response, we have rephrased this outcome component as “short-term mortality,” following the design of a previous publication assessing sepsis outcomes.10

      • The authors need to explain why functional information is not provided.

      Functional studies were not performed for several reasons. Animal models are problematic because mice do not have an ortholog to the human APOL1 gene, and the various models developed all have limitations, particularly when second and third perturbations (sepsis and renal impairment) would need to be introduced.11 Also, since we did not observe an association between the genotypes and sepsis independent of pre-existing severe renal disease, we did not pursue additional functional studies. We do describe existing functional analysis in the introduction and briefly in our discussion; we now note this limitation.

      • Ln 162-172: too many assumptions have been used for the trial; thus, progression to sepsis is difficult to define. According to Sepsis-3 sepsis is no more a continuum from infection to sepsis and septic shock. Some patients presented with sepsis (-1, 0, 1 days considered by the authors) and when electronic health records are used, we are not able to detect the exact timepoint of SOFA score turning to a 2-point increase. This is a major limitation of the methodology presented.

      Same applies for all comorbidities and data extracted from electronic health records.

      Thank you for highlighting this issue. We acknowledge that our choice of wording was unclear. The choice of ICD infection codes during the initial hospitalization window (i.e., -1, 0, 1 days) was aimed to generate a clean cohort of patients hospitalized with infections (i.e., not secondary infections or development of sepsis after an in-hospital procedure), rather than to establish a timeline of progression from infection to sepsis. As you correctly note, our algorithm would capture patients presenting with infection and concurrent sepsis at admission rather than progression to sepsis, and the exact timepoint of the SOFA score meeting the 2-point criterion is difficult to capture through the EHR. Accordingly, we conducted no time-dependent analysis in the current study. To more accurately convey the methodology of the current study (i.e., testing the association between APOL1 high-risk genotypes—which the patients were born with—and the risk of sepsis for patients hospitalized with infections), we revised the manuscript thoroughly, replacing “progression to sepsis” with “occurrence of sepsis” in the title, abstract as well as on pages 7, 8, and 19. We also acknowledge the limitations of using EHR in the Discussion.

      • P value significance thresholds were set at 0.05, except for the PWAS where the threshold was set at 0.05/5 (p13). It would be helpful to list at this point what the 5 outcomes were that led to this adjusted threshold.

      We have revised the manuscript accordingly.
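      The adjusted threshold of 0.05/5 mentioned in the comment above corresponds to a Bonferroni-style correction for five tests. A minimal sketch follows; the p-values are hypothetical and do not come from the manuscript.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests

def significant(p_values, alpha=0.05):
    """Flag each p-value that falls below the corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]

# Hypothetical p-values for five outcomes (illustrative only).
flags = significant([0.004, 0.02, 0.03, 0.2, 0.6])
print(bonferroni_threshold(0.05, 5), flags)
```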

      • "Risk of sepsis was significantly increased among patients with high-risk genotypes (OR 1.29, CI 1.00-1.67, P=0.047)." Some would argue that a confidence interval that includes 1.0 indicates non-significance.

      While the lower bound of the confidence interval appears to meet the 1.0 threshold with only 2 decimal places (which would preclude significance), when taken to the 4th decimal place, the value is 1.0037, demonstrating that the 95% CI did not meet or cross under the 1.0 threshold, and thus the odds ratio should be considered significant (as evidenced by the p=0.047). This result is consistent with other studies that have detected an association between the high-risk genotypes and sepsis,7 but you correctly note that readers can discern from the confidence intervals that the finding is not strong.
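      The link between the confidence bound and the p-value can be checked with a short calculation. The sketch below assumes a Wald-type interval that is symmetric on the log-odds scale (an assumption about how the interval was computed, not something stated in the response) and recovers an approximate two-sided p-value from the odds ratio and the unrounded lower bound.

```python
from math import erfc, log, sqrt

def p_from_ci(odds_ratio, ci_bound, z_crit=1.959964):
    """Two-sided Wald p-value recovered from an odds ratio and one 95% CI bound."""
    # Standard error on the log-odds scale, implied by the distance to the bound.
    se = abs(log(odds_ratio) - log(ci_bound)) / z_crit
    z = abs(log(odds_ratio)) / se
    # erfc(z / sqrt(2)) equals 2 * (1 - Phi(z)) for the standard normal CDF Phi.
    return erfc(z / sqrt(2))

p = p_from_ci(1.29, 1.0037)
print(round(p, 3))
```

      Under that assumption, a lower bound just above 1 yields a p-value just below 0.05, in line with the reported p=0.047. A bound of exactly 1.0 would give p=0.05.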

      • The Discussion is too long and should be shortened.

      We have revised the Discussion. 

      References:

      1. Limou S, Nelson GW, Kopp JB, Winkler CA. APOL1 Kidney Risk Alleles: Population Genetics and Disease Associations. Adv Chronic Kidney Dis. 2014;21(5):426-433. doi:10.1053/j.ackd.2014.06.005

      2. Kopp JB, Nelson GW, Sampath K, et al. APOL1 genetic variants in focal segmental glomerulosclerosis and HIV-associated nephropathy. J Am Soc Nephrol. 2011;22(11):2129-2137. doi:10.1681/ASN.2011040388

      3. Zhang J, Fedick A, Wasserman S, et al. Analytical Validation of a Personalized Medicine APOL1 Genotyping Assay for Nondiabetic Chronic Kidney Disease Risk Assessment. The Journal of Molecular Diagnostics. 2016;18(2):260-266. doi:10.1016/j.jmoldx.2015.11.003

      4. Daneshpajouhnejad P, Kopp JB, Winkler CA, Rosenberg AZ. The evolving story of apolipoprotein L1 nephropathy: the end of the beginning. Nat Rev Nephrol. 2022;18(5):307-320. doi:10.1038/s41581-022-00538-3

      5. Dumitrescu L, Ritchie MD, Brown-Gentry K, et al. Assessing the accuracy of observer-reported ancestry in a biorepository linked to electronic medical records. Genet Med. 2010;12(10):648-650. doi:10.1097/GIM.0b013e3181efe2df

      6. Wiese AD, Griffin MR, Stein CM, et al. Validation of discharge diagnosis codes to identify serious infections among middle age and older adults. BMJ Open. 2018;8(6):e020857. doi:10.1136/bmjopen-2017-020857

      7. Wu J, Ma Z, Raman A, et al. APOL1 risk variants in individuals of African genetic ancestry drive endothelial cell defects that exacerbate sepsis. Immunity. 2021;54(11):2632-2649.e6. doi:10.1016/j.immuni.2021.10.004

      8. Kozlitina J, Zhou H, Brown PN, et al. Plasma Levels of Risk-Variant APOL1 Do Not Associate with Renal Disease in a Population-Based Cohort. J Am Soc Nephrol. 2016;27(10):3204-3219. doi:10.1681/ASN.2015101121

      9. Liu G, Jiang L, Kerchberger VE, et al. The relationship between high density lipoprotein cholesterol and sepsis: A clinical and genetic approach. Clin Transl Sci. 2023;16(3):489-501. doi:10.1111/cts.13462

      10. Alrawashdeh M, Klompas M, Simpson SQ, et al. Prevalence and Outcomes of Previously Healthy Adults Among Patients Hospitalized With Community-Onset Sepsis. Chest. 2022;162(1):101-110. doi:10.1016/j.chest.2022.01.016

      11. Yoshida T, Latt KZ, Heymann J, Kopp JB. Lessons From APOL1 Animal Models. Front Med (Lausanne). 2021;8:762901. doi:10.3389/fmed.2021.762901

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      In this study, Fang H et al. describe a potential pathway, ITGB4-TNFAIP2-IQGAP1-Rac1, that may involve in the drug resistance in triple negative breast cancer (TNBC). Mechanistically, it was demonstrated that TNFAIP2 bind with IQGAP1 and ITGB4 activating Rac1 and the following drug resistance. The present study focused on breast cancer cell lines with supporting data from mouse model and patient breast cancer tissues. The study is interesting. The experiments were well controlled and carefully carried out. The conclusion is supported by strong evidence provided in the manuscript. The authors may want to discuss the link between ITGB4 and Rac1, between IQGAP1 and Rac1, and between TNFAIP2 and Rac1 as compared with the current results obtained. This is important considering some recent publications in this area (Cancer Sci 2021, J Biol Chem 2008, Cancer Res 2023). In addition, some key points need to be addressed in order to support their conclusion in full.

      Thanks for your positive comments.

      1) It is rarely found studies using the term of "DNA damage drug resistance". Do the authors mean "DNA damage and drug resistance" or "DNA damage-related drug resistance" or "DNA damage-induced drug resistance"? It is better to define "DNA damage drug resistance" in the manuscript if it is not a common term in the field.

      We agree with you that the description "DNA damage-related drug resistance" is better so that we revised it uniformly in the manuscript.

      2) For Figure 4A, it is stated the IQGAP1 is identified via IP-MS. However, the MS results are not presented in the Figure or in the supplementary. In Figure 4A, only the IP results with silver staining was presented. Moreover, based on the silver staining here, a bunch of proteins were increased in TNFAIP2 overexpression group compared to the vector group. Especially, there is a much clearer band at 52kDa. The authors didn't explain why they chose IQGAP1 and ITGB4 which are less clear than the protein(s) at 52kDa.

      Supplementary Table 1 contains our mass spectrometry results. There are two reasons for choosing ITGB4 and IQGAP1. Firstly, we selected the proteins that indeed interact with TNFAIP2 according to our verification experiments. Secondly, we were interested in the mechanism by which TNFAIP2 promotes DNA damage-related drug resistance, and we found that ITGB4 promoted drug resistance, while IQGAP1 activated Rac1.

      3) According to the images in Figure 4C, the efficiency of si-IQGAP1 is limited. The authors could analyze the WB image to confirm the inhibition efficiency of si-IQGAP1.

      We analyzed the WB images and the quantitative results are as follows in Author response image 1. The knockdown efficiency is acceptable.

      Author response image 1.

      4) In Figure 5B, I wonder whether the authors can explain why the IgG could immunoprecipitate similar amount of ITGB4 protein as input group.

      In this experiment, the Input lane was loaded with a relatively small amount (5%), while the IgG lane showed nonspecific binding.

      5) According to the results from Figure 6B, the inhibition efficiency of shITGB4#1 is much higher than shITGB4#2. However, the effects of shITGB4#1 on GTP-Rac1 are similar to or even weaker than those of shITGB4#2 in both HCC1806 and HCC1937. Can this be explained?

      The possible reason is that downregulation of ITGB4 expression to a certain level is sufficient to inhibit the activation of Rac1.

      6) In Figure 6F, there are double bands for ITGB4 while only one band shows in other Figures. Please find a better representative image here.

      ITGB4 has a cleaved band in addition to the main band. These two bands could be separated when we used a low concentration SDS-PAGE gel.

      7) In the manuscript, GAPDH, b-Actin and Tubulin are used in different experiments as internal controls. Is there any specific reason to using different internal controls for different experiments here?

      There is no specific reason for using different internal controls. These experiments were conducted by different people, and each individual chose an internal control based on the sizes of the proteins being examined.

      8) I cannot find Table 1 for the correlation results for TNFAIP2 and ITGB4. I wonder whether Figure 8E is the Table 1 as is mentioned, since it is stated in line 561 that Figure 8E is "the work model of this paper" but actually Figure 8F is. If Figure 8E is the correlation results, I highly recommended the scatter plots graph is used here to present more clear and visualized correlation between TNFAIP2 and ITGB4.

      Figure 8E is indeed the correlation result. Figure 8E cannot be presented as a scatter plot because TNFAIP2 and ITGB4 expression was scored as negative or positive by professional pathologists based on the IHC results.

      9) Throughout the whole manuscript, no description of N number was found in figure legends or in Methods for in vitro experiments. N number is important for statistical analysis.

      All our experiments were performed with three replicates. We now provide this information in the figure legends.

      Reviewer #2:

      Breast cancer is the most common malignant tumor in women. One of subtypes in breast cancer is so called triple-negative breast cancer (TNBC), which represents the most difficult subtype to treat and cure in the clinic. Chemotherapy drugs including epirubicin and cisplatin are widely used for TNBC treatment. However, drug resistance remains as a challenge in the clinic. The authors uncovered a molecular pathway involved in chemotherapy drug resistance, and molecular players in this pathway represent as potential drug targets to overcome drug resistance. The experiments are well designed and the conclusions drawn mostly were supported by the data. The findings have potential to be translated into the clinic.

      Thanks for your positive comments.

      1) In Introduction, the statement of "Breast cancer is the most common malignant tumor in women, and the morbidity and mortality rates of female malignant tumors are ranked first in the world" is inaccurate.

      We have revised the description as “Breast cancer is the most commonly diagnosed cancer and the leading cause of cancer death in women”.

      2) In Materials and Methods, "Immunopurification and silver staining" is not correct, which should be replaced with "Immunoprecipitation and silver staining".

      We replaced the description in the manuscript according to your suggestion.

      3) It is unclear Why the authors chose the two TNBC cell lines, HCC1806 and HCC1937, for cell models in this work.

      We chose these two cell lines according to our previous work “KLF5 promotes breast cancer proliferation, migration and invasion in part by upregulating the transcription of TNFAIP2” (doi: 10.1038/onc.2015.263. Epub 2015 Jul 20).

      4) To demonstrate TNFAIP2 and ITGB4 confer TNBC drug resistance in vivo, the knockdown efficiency of animal experiments was not shown.

      The knockdown efficiency of animal experiments was shown below. We added this result into Figure 2-figure supplement 2G and Figure 5-figure supplement 2N.

      5) I would strongly suggest the authors seek help from a language editing service to improve the manuscript.

      We improved the manuscript by using a professional English language editing service and we have carefully revised the manuscript.

      Reviewer #3:

      In this manuscript, Fang and colleagues found that IQGAP1 interacts with TNFAIP2, which activates Rac1 to promote drug resistance in TNBC. Furthermore, they found that ITGB4 could interact with TNFAIP2 to promote TNBC drug resistance via the TNFAIP2/IQGAP1/Rac1 axis by promoting DNA damage repair.

      This work has good innovation and high potential clinical significance. However, there are several unsolved concerns that have to be addressed.

      Thanks for your positive comments.

      1) In the manuscript, there are four drugs used for in vitro cell experiments, why is olaparib (AZD) not used for in vivo animal experiments?

      There are two reasons why we did not choose AZD. First, the killing effect of AZD is not as strong as that of BMN. Second, AZD is more expensive than BMN. We therefore chose BMN for animal experiments.

      2) In Figure 4B, why the immunoprecipitation experiments is done in HCC1806 cell line?

      In our previous study “KLF5 promotes breast cancer proliferation, migration and invasion in part by upregulating the transcription of TNFAIP2” (doi: 10.1038/onc.2015.263. Epub 2015 Jul 20), we found that TNFAIP2 knockdown could obviously inhibit the activation of Rac1 in HCC1806 when compared to the result in HCC1937. So, we used HCC1806 cell line to perform the IP-Mass assay.

      3) There should be data showing the knockdown effect of TNFAIP2 and ITGB4 in animal experiments.

      We addressed the same question above (Reviewer #2, Question#4).

      4) When screening the interaction regions between ITGB4 and TNFAIP2, why the TNFAIP2 protein truncation strategy is to delete the N-terminus?

      In fact, we also deleted the C-terminus, but the deletion of C-terminus of TNFAIP2 did not affect the interaction.

      5) In the manuscript, "input" should be changed to "Input".

      We corrected it.

      6) There should be a space between "Figure" and numbers.

      We added a space between "Figure" and numbers.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      The authors of this manuscript characterize a new anion-conducting channelrhodopsin, called MsACR1, that is more red-shifted in its spectrum than prior variants. An additional mutant variant of MsACR1, renamed raACR, has a 20 nm red-shifted spectral response with faster kinetics. Due to the spectral shift of these variants, the authors proposed that it is possible to inhibit the expression of MsACR1 and raACR with lights at 635 nm in vivo and in vitro. The authors were able to demonstrate some inhibition in vitro and in vivo with 635 nm light. Overall the new variants with unique properties should be able to suppress neuronal activities with red-shifted light stimulation.

      Strengths:

      The authors were able to identify a new class of anion conducting channelrhodopsin and have variants that respond strongly to lights with wavelength >550 nm. The authors were able to demonstrate this variant, MsACR1, can alter behavior in vivo with 635 nm light. The second major strength of the study is the development of a red-shifted mutant of MsACR1 that has faster kinetics and 20 nm red-shifted from a single mutation.

      Weaknesses:

      The red-shifted raACR appears to work much less efficiently than MsACR1 even with 635 nm light illumination both in vivo (Figure 4) and in vitro (Figure 3E) despite the 20 nm red-shift. This is inconsistent with the benefits and effects of red-shifting the spectrum in raACR. This usually would suggest raACR either has a lower conductance than MsACR1 or that the membrane/overall expression of raACR is much weaker than MsACR1. Neither of these is measured in the current manuscript.

      Thank you for addressing this crucial issue. We posit that the diminished efficiency of raACR in comparison to MsACR1 WT can be attributed to the tenfold acceleration of its photocycle. As noted by Reviewer 1, the anticipated advantages associated with a red-shifted opsin, particularly in in vivo preparations, are offset by its accelerated off-kinetics. Consequently, the shorter dwell time of the open state leads to a reduced number of conducted ions per photon. Nevertheless, the operational light sensitivity is not drastically altered compared to MsACR WT (Fig. 3C). We believe that the rapid kinetics offer interesting applications, such as the precise inhibition of single action potentials through holography.
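      The kinetic argument above can be illustrated with a back-of-the-envelope sketch. It assumes a single open state whose current decays mono-exponentially after channel opening, so that the charge carried per opening scales with the closing time constant. The current amplitude and time constants are hypothetical placeholders chosen only to show the tenfold ratio; they are not measured values from the manuscript.

```python
def charge_per_opening_fc(peak_current_pa: float, tau_off_ms: float) -> float:
    """Charge (fC) carried during one channel opening.

    Assumes a mono-exponential decay I(t) = I0 * exp(-t / tau), whose
    integral from 0 to infinity is I0 * tau. Units: pA * ms = fC.
    """
    return peak_current_pa * tau_off_ms


# Hypothetical values for illustration only (not taken from the manuscript).
unit_current_pa = 1.0            # same unitary current assumed for both variants
slow_tau_ms = 100.0              # assumed closing time constant of the slower variant
fast_tau_ms = slow_tau_ms / 10   # tenfold accelerated closing

q_slow = charge_per_opening_fc(unit_current_pa, slow_tau_ms)
q_fast = charge_per_opening_fc(unit_current_pa, fast_tau_ms)
ratio = q_fast / q_slow          # fraction of charge retained by the faster variant

print(f"charge per opening, fast relative to slow: {ratio:.2f}")
```

      Under these assumptions, a tenfold faster closing rate leaves one tenth of the charge per opening, which is the sense in which fewer ions are conducted per absorbed photon.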

      There are limited comparisons to existing variants of ACRs under the same conditions in the manuscript overall. There should be more parallel comparison with gtACR1, ZipACR, and RubyACR in identical conditions in cultured cell lines, cultured neurons, and in vivo. This should be in terms of overall performance, efficiency, and expression in identical conditions. Without this information, it is unclear whether the effects at 635 nm are due to the expression level which can compensate for the spectral shift.

      We compared MsACR1 and raACR with GtACR1 in ND cells in supplemental figure 4. We concur that further comparisons could be useful to emphasise both the strengths of MsACRs and applications where they may not be as suitable. We are currently in the process of outlining a separate article. We firmly believe that each ACR variant occupies a distinct application niche, which necessitates a more comprehensive electrophysiological comparison to provide valuable insights to the scientific community.

      There should be more raw traces from the recordings of the different variants in response to short pulse stimulation and long pulse stimulation to different wavelengths. It is difficult to judge what the response would be like when these types of information are missing.

      We appreciate Reviewer 1's feedback and have compiled a collection of raw photoresponses, encompassing various pulse widths and wavelengths, which can be found in the Supplementary materials (Supplementary Figures 4 and 5).

      Despite being able to activate the channelrhodopsin with 635 nm light, the main utility of the variant should be transcranial stimulation which was not demonstrated here.

      We concur with Reviewer 1's assessment that the prime application of MsACR is indeed transcranial stimulation. However, it is worth emphasising that the full advantages of transcranial optical stimulation become most apparent when animals are truly freely moving without any tethered patch cords. Our ongoing research in the laboratory is dedicated to the development of a wireless LED system that can be securely affixed to the animal's skull. We aim to demonstrate the potential of these novel optogenetic approaches in the field of behavioural neuroscience in the coming year.

      Figure 3B is not clearly annotated and is difficult to match the explanation in the figure legend to the figure. The action potential spikings of neurons expressing raACR in this panel are inhibited as strongly as MsACR1.

      We have enhanced the figure caption and annotations for clarity. The traces presented in Figure 3B are intended to demonstrate the overall effectiveness of each variant. However, it is in the population data analysis, as depicted in Figure 3E, where the meaningful insights are revealed.

      For many characterizations, the number of 'n's are quite low (3-7).

      We acknowledge Reviewer 1's suggestion regarding the in vivo data and agree with the importance of including more animals, as well as control animals. However, we are committed to adhering to the principles of the 3Rs (Replacement, Reduction, Refinement) in animal research, and given the robustness of our observed effects, we will add animals to reach the minimal number of animals per condition (n = 2) to minimise unnecessary animal usage while ensuring statistical power. We will continue to adhere to the established standards in the field, aiming for a range of 3 to 7 cells per condition, sourced from at least two independent preparations, to ensure the robustness and reliability of our in vitro data.

      Reviewer #2 (Public Review):

      Summary:

      The authors identified a new chloride-conducting Channelrhodopsin (MsACR1) that can be activated at low light intensities and within the red part of the visible spectrum. Additional engineering of MsACR1 yielded a variant (raACR1) with increased current amplitudes, accelerated kinetics, and a 20nm red-shifted peak excitation wavelength. Stimulation of MsACR1 and raACR1 expressing neurons with 635nm in mice's primary motor cortices inhibited the animals' locomotion.

      Strengths:

      The in vitro characterization of the newly identified ACRs is very detailed and confirms the biophysical properties as described by the authors. Notably, the ACRs are very light sensitive and allow for efficient in vitro inhibition of neurons in the nano Watt/mm^2 range. These new ACRs give neuroscientists and cell biologists a new tool to control chloride flux over biological membranes with high temporal and spatial precision. The red-shifted excitation peaks of these ACRs could allow for multiplexed application with blue-light excited optogenetic tools such as cation-conducting channelrhodopsins or green-fluorescent calcium indicators such as GCaMP.

      Weaknesses:

      The in-vivo characterization of MsACR1 and raACR1 lacks critical control experiments and is, therefore, too preliminary. The experimental conditions differ fundamentally between in vitro and in vivo characterizations. For example, chloride gradients differ within neurons which can weaken inhibition or even cause excitation at synapses, as pointed out by the authors. Notably, the patch pipettes for the in vitro characterization contained low chloride concentrations that might not reflect possible conditions found in the in vivo preparations, i.e., increasing chloride gradients from dendrites to synapses.

      We appreciate Reviewer 2’s feedback regarding missing control experiments. We will respond to these concerns in another section of our manuscript, as suggested. Regarding the chloride gradient, we understand the concerns of Reviewer 2, yet we chose these ionic conditions, particularly as they were used in the initial electrical characterization of GtACR1 in a neuronal context (Mahn et al., 2016). We will make sure to provide this context in our manuscript to justify our choice of ionic conditions.

      Interestingly, the authors used soma-targeted (st) MsACR1 and raACR1 for some of their in vitro characterization yielding more efficient inhibition and reduction of co-incidental "on-set" spiking. Still, the authors do not seem to have utilized st-variants in vivo.

      At the time of submission, due to the long-term absence of our lab technician, we were not able to produce purified viruses. Therefore, we decided to move on with the submission. We have now produced the virus externally and will provide the experiments.

      Most importantly, critical in vivo control experiments, such as negative controls like GFP or positive controls like NpHR, are missing. These controls would exclude potential behavioral effects due to experimental artifacts. Moreover, in vivo electrophysiology could have confirmed whether targeted neurons were inhibited under optogenetic stimulations.

      We have several non-injected control animals that we used to calibrate this particular paradigm and never saw similar responses. However, we acknowledge the suggestion of Reviewer 2 and will include the GFP-injected control as recommended.

      Some of these concerns stem from the fact that the pulsed raACR stimulation at 635 nm at 10Hz (Fig. 3E) was far less efficient compared to MsACR1, yet the in vivo comparison yielded very similar results (Fig. 4D).

      As outlined previously, the accelerated photocycle of raACR results in a reduction in photocurrent amplitude, consequently diminishing the potency of inhibition per photon. In the context of in vitro stimulation, where single action potentials are recorded, this reduction in inhibition efficiency is resolved. However, in the realm of in vivo behavioural analysis, the observed effect is not contingent on single action potentials but rather stems from the disruption of the entire M1 motor network. In this context, despite the reduced efficiency of the fast-cycling raACR, it still manages to interrupt the M1 network, leading to similar behavioural outcomes.

      Also, the cortex is highly heterogeneous and comprises excitatory and inhibitory neurons. Using the synapsin promoter, the viral expression paradigm could target both types and cause differential effects, which has not been investigated further, for example, by immunohistochemistry. An alternative expression system, for example, under VGLUT1 control, could have mitigated some of these concerns.

      Indeed, we acknowledge the limitations of our current experimental approach. We are in the process of planning and conducting additional experiments involving Cre-dependent expression of st-MsACR and st-raACR in PV-Cre mice.

      Furthermore, the authors applied different light intensities, wavelengths, and stimulation frequencies during the in vitro characterization, causing varying spike inhibition efficiencies. The in vivo characterization is notably lacking this type of control. Thus, it is unclear why the 635nm, 2s at 20Hz every 5s stimulation protocol, which has no equivalent in the in vitro characterization, was chosen.

      We appreciate the valuable comment from the reviewer. The objective of our in vitro characterization is to elucidate the general effects of specific stimulation parameters on the efficiency of neuronal inhibition. For instance, we aim to demonstrate that lower light intensities result in less efficient inhibition, or that pulse stimulation may lead to a less complete inhibition, albeit significantly reducing the energy input into the system.

      In the in vivo characterization, we face constraints such as animal welfare considerations and limitations in available laser lines, which prevent us from exploring the entire parameter space as comprehensively as in the in vitro preparation. Additionally, it is important to note that membrane capacitance tends to be higher in vivo compared to dissociated hippocampal neurons. Consequently, we doubled the stimulation frequency from 10 Hz to 20 Hz and used a stimulation pattern of 2 seconds “on” and 5 seconds “off”. This approach allows the animals to spend less time in an arrested state while still demonstrating the effect of MsACR and its variants.
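      For readers who want to reproduce the timing of such a protocol, the following minimal sketch generates pulse onset times for 2-second trains at 20 Hz separated by 5-second pauses. It assumes that one full cycle lasts 7 seconds (2 s on plus 5 s off); the function name, the number of cycles, and the absence of a pulse width (which is not stated above) are choices made for illustration only.

```python
def pulse_onsets(train_s=2.0, rest_s=5.0, freq_hz=20.0, n_cycles=3):
    """Return pulse onset times (in seconds) for repeated pulse trains.

    Each cycle consists of a train of `train_s` seconds at `freq_hz`,
    followed by `rest_s` seconds without light. Pulse width is not
    modelled because it is not specified in the text.
    """
    pulses_per_train = round(train_s * freq_hz)
    cycle_s = train_s + rest_s
    return [
        cycle * cycle_s + pulse / freq_hz
        for cycle in range(n_cycles)
        for pulse in range(pulses_per_train)
    ]


onsets = pulse_onsets()
train_fraction = 2.0 / (2.0 + 5.0)  # share of each cycle occupied by a pulse train

print(f"{len(onsets)} pulses in 3 cycles; trains occupy {train_fraction:.0%} of each cycle")
```

      If the intended cycle length is instead 5 seconds from train onset to train onset, setting `rest_s=3.0` gives that variant.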

      In summary, the in vivo experiments did not confirm whether the observed inhibition of mouse locomotion occurred due to the inhibition of neurons or experimental artifacts.

      In addition, the authors' main claim of more efficient neuronal inhibition would require them to threshold MsACR1 and raACR1 against alternative methods such as the red-shifted NpHR variant Jaws or other ACRs to give readers meaningful guidance when choosing an inhibitory tool.

      The light sensitivity of MsACR1 and raACR1 is impressive and well characterized in vitro. However, the authors only reported the overall light output at the fiber tip for the in vivo experiments: 0.5 mW. Without context, it is difficult to evaluate this value. Calculating the light power density at certain distances from the light fiber or thresholding against alternative tools such as NpHR, Jaws, or other ACRs would allow for a more meaningful evaluation.
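      As a rough illustration of the kind of calculation meant here, the sketch below estimates the power density at several depths below the fiber tip using a purely geometric model: light leaves the fiber as a cone whose half-angle is set by the numerical aperture, and the power is spread evenly over the cone's cross-section. Scattering and absorption in tissue are ignored, so the values are upper bounds. Only the 0.5 mW output comes from the text; the core diameter, numerical aperture, and tissue refractive index are hypothetical placeholders.

```python
import math


def irradiance_mw_per_mm2(power_mw, depth_mm, core_diameter_mm=0.2,
                          numerical_aperture=0.39, n_tissue=1.36):
    """Geometric upper-bound estimate of irradiance below a fiber tip.

    The beam radius grows linearly with depth along a cone with half-angle
    asin(NA / n_tissue). Scattering and absorption are NOT modelled.
    Fiber and tissue parameters are assumed values for illustration.
    """
    half_angle = math.asin(numerical_aperture / n_tissue)
    radius_mm = core_diameter_mm / 2 + depth_mm * math.tan(half_angle)
    return power_mw / (math.pi * radius_mm ** 2)


tip_power_mw = 0.5  # reported output at the fiber tip
for depth_mm in (0.0, 0.25, 0.5, 1.0):
    value = irradiance_mw_per_mm2(tip_power_mw, depth_mm)
    print(f"depth {depth_mm:.2f} mm: {value:.3f} mW/mm^2 (geometric estimate)")
```

      A more realistic estimate would add a tissue scattering model, which lowers the values further with depth.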

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank the Editors for the opportunity to submit a revised manuscript, and the Reviewers for their positive evaluations and constructive comments. We feel that the comments and suggestions significantly improved the quality of our manuscript. We addressed all questions and suggestions in a point-by-point fashion below.

      Reviewer #1 (Public Review):

      This paper proposes and evaluates a new approach for the registration of human hippocampal anatomy between individuals. Such registration is an essential step in group analysis of hippocampal structure and function, and in most studies to date, volumetric registration of MRI scans has been employed. However, it is known that volumetric deformable registration, due to its formulation as an optimization problem that minimizes the combination of an image similarity term and relatively simple geometric regularization terms, fails to preserve the topology of complex structures. In the cerebral cortex, surface-based registration of inflated cortical surfaces is broadly preferred over volumetric registration, which often causes voxels of different tissue types to be matched (e.g., voxels belonging to a sulcus in one individual mapping onto voxels belonging to a gyrus in another). The authors recognize that hippocampal anatomy is similarly complex, and, with proper tools, can benefit from surface-based registration. They propose to first unfold the hippocampus to a two-dimensional rectangle domain using their prior HippUnfold technique, and then to perform deformable registration in this rectangle domain, matching geometric features (curvature, thickness, gyrification) between individuals. This registration approach is evaluated by comparing how well hippocampal subfields traced by experts using cytoarchitectural information align between individuals after registration. The authors indeed show that surface-based registration aligns subfields better than volumetric registration applied to binary segmentations of the hippocampal gray matter.

      Overall, I find the methods and results in this paper to be convincing. The authors framed the comparison between surface-based and volumetric registration in a fair way, and the results convincingly show the advantage of surface-based registration. One slight limitation of the current study is that it is uncertain whether the benefits demonstrated here translate to in vivo MRI data for which the authors' HippUnfold algorithm is tailored. The current study utilized the unfolding technique used in HippUnfold on manual segmentations of high-resolution ex vivo MRI and blockface 3D volumes, which are likely closer to anatomical ground truth than automated segmentations of in vivo MRI. However, it is reasonable to assume that given that the volumetric registration to which the proposed approach was compared also used this high-detail data, the advantages of surface-based over volumetric registration would extend to in vivo MRI as well. However, I would encourage the authors to perform future evaluations on datasets with available in vivo and ex vivo MRI from the same individuals.

      We thank the Reviewer for the positive evaluation and the thoughtful feedback. We address each comment in the red text below.

      We have considered the Reviewer suggestion for a demonstration of the gains from our proposed method in MRI, and decided to include a new analysis of 7T in-vivo MRI data from 10 healthy participants (Supplementary Materials 1: in-vivo MRI demonstration).

      It is difficult to assess whether changes to the registration methods are indeed an improvement without same-subject “ground-truth” subfield definitions typically obtained from histology. In this new Supplementary Materials section, we demonstrate an overall sharpening of MRI-mapped features as an indirect indication of better inter-subject alignment (similar to the paper referenced in the comment, below). This is an important proof of concept that demonstrates that the gains made in the current project can be translated to in-vivo MRI. We did not perform a demonstration of these gains in ex-vivo data, since this also comes with a host of challenges including access to such data and deformations and artifacts associated with ex-vivo scanning. However, we believe that the gains provided by our methods are limited mainly by image resolution, and so while we note some concern about the gains from this method at 3T MRI, we expect that the gains provided by our method in higher-resolution ex-vivo images should be consistent or better.

      We have added the following in-text Discussion of this new analysis (p. 13):

      “Ravikumar et al. (2021) recently performed flat mapping of the medial temporal lobe neocortex using a Laplace coordinate system as employed here, and showed sharpening of group-averaged images following deformable registration in unfolded space. This indirectly suggests better intersubject alignment. We perform a similar group-averaged sharpening analysis in Supplementary Materials 1: in-vivo demonstration. Though the gains in image sharpness observed here were modest, we note that current MRI resolution and automated segmentation methods allow for only imperfect hippocampal feature measures. We thus expect unfolded registration to grow in importance as MRI and segmentation methods continue to advance.”

      I would also like to point out the relevance of the 2021 paper "Unfolding the Medial Temporal Lobe Cortex to Characterize Neurodegeneration Due to Alzheimer's Disease Pathology Using Ex vivo Imaging" by Ravikumar et al. (https://link.springer.com/chapter/10.1007/978-3-030-87586-2_1) to the current work. This paper applied an earlier version of the unfolding method in HippUnfold to ex vivo extrahippocampal cortex and performed registration using curvature features in the rectangular unfolded space, also finding slight improvement with surface-based vs. volumetric registration, so its findings support the current paper.

      Thank you, we agree this is a highly relevant paper and have added a summary of it in the newly added Discussion paragraph which also outlines the new Supplementary Materials section (see previous comment).

      Overall, the paper has the potential to significantly influence future research on hippocampal involvement in cognition and disease. Outside of simple volumetry studies, most hippocampal morphometry studies rely on volumetric deformable registration of some kind, typically applied to whole-brain T1-weighted MRI scans. With HippUnfold available for anyone to use and not requiring manual registration, the paper provides a strong impetus for using this approach in future studies, particularly where one is interested in localizing effects of interest to specific areas of the hippocampus. Additional evaluation of in vivo HippUnfold using in vivo / ex vivo datasets, would make the use of this approach even more appealing.

      We would like to thank the Reviewer for their enthusiasm for the translatability of this work. We hope they are satisfied with our newly added in-vivo evaluation, and we appreciate the thoughtful suggestions.

      Reviewer #1 (Recommendations For The Authors):

      No additional recommendations.

      Reviewer #2 (Public Review):

      DeKraker et al. propose a new method for hippocampal registration using a surface-based approach that preserves the topology of the curvature of the hippocampus and boundaries of hippocampal subfields. The surface-based registration method proved to be more precise and resulted in better alignment compared to traditional volumetric-based registration. Moreover, the authors demonstrated that this method can be performed across image modalities by testing the method with seven different histological samples. While the conclusions of this paper are mostly well supported by data, some aspects of the method need to be clarified. This work has the potential to be a powerful new registration technique that can enable precise hippocampal registration and alignment across subjects, datasets, and image modalities.

      We thank the Reviewer for their thoughtful evaluation of our paper and helpful comments. We address them in the red text below each comment.

      Regarding the methodological clarification of the surface-based registration method, the last step of the process needs further clarification. Specifically, after creating the averaged 2D template, it is unclear how each individual sample is registered to sample1's space. If I understand correctly, after creating the averaged 2D template, each individual sample is then registered to sample1's space via the transform from each sample to the averaged template and then the inverse transform from the template to sample1's space. Samples included both left and right hemispheres, so were all samples being propagated to left hemisphere sample 1 space? The authors also note that a measure of the subfield labels' overlap with that sample's ground-truth subfield definitions was calculated. Is this a measure of overlap, for example, between sample 3 (registered to sample 1 space) and the ground-truth (unfolded, not registered) sample 3 labels? It would be beneficial to provide a full walkthrough of one example sample to clarify the steps. Clarification of this aspect of the method is critical for understanding the evaluation of the method.

      We would like to thank the Reviewer for the suggestion, and have clarified the passage with the following walkthrough example as suggested by the Reviewer (p. 8):

      “For example, sample3 was unfolded and then registered to the unfolded average, making up two transformations. These were then concatenated with the inverse transformation of unfolded sample1 to the same unfolded average, and the inverse transformation of native sample1 to unfolded space. This concatenated transformation was used to project labels from sample3 native space directly to sample1 native space, which should ideally lead to near-perfect subfield alignment in sample1 native space. Dice overlap between sample1 and sample3 registered to sample1 was then calculated in sample1 native space.”
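      The chain of transformations in this walkthrough can be sketched as a composition of four mappings. In the minimal example below, each mapping is represented by a hypothetical 2D affine matrix in homogeneous coordinates purely for illustration; the actual unfolding and registration transforms are nonlinear deformations, and all matrix values and variable names are invented for this sketch.

```python
import numpy as np


def affine(scale, shift_x, shift_y):
    """Build a simple 2D affine transform in homogeneous coordinates."""
    return np.array([[scale, 0.0, shift_x],
                     [0.0, scale, shift_y],
                     [0.0, 0.0, 1.0]])


# Hypothetical stand-ins for the four forward transforms.
native3_to_unfolded3 = affine(0.5, 1.0, 0.0)   # unfolding of sample3
unfolded3_to_average = affine(1.0, 0.2, -0.1)  # registration of sample3 to the average
native1_to_unfolded1 = affine(0.4, 0.0, 2.0)   # unfolding of sample1
unfolded1_to_average = affine(1.0, -0.3, 0.4)  # registration of sample1 to the average

# Concatenate: two forward steps for sample3, then two inverse steps for sample1.
# With matrices, the right-most transform is applied first.
native3_to_native1 = (
    np.linalg.inv(native1_to_unfolded1)
    @ np.linalg.inv(unfolded1_to_average)
    @ unfolded3_to_average
    @ native3_to_unfolded3
)

point_native3 = np.array([2.0, 3.0, 1.0])  # a point (x, y, 1) in sample3 native space
point_native1 = native3_to_native1 @ point_native3
print("projected into sample1 native space:", point_native1[:2])
```

      Applying one concatenated transform, rather than resampling labels at each intermediate step, avoids accumulating interpolation error across the four steps.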

      Reviewer #2 (Recommendations For The Authors):

      Materials and Methods:

      In the Data section, it would be helpful for the authors to clarify whether each hippocampal histology sample is from a different individual or not. Additionally, for the 3D-PLI sample, the authors mention that the anterior/posterior parts of the hippocampus were cut off and the labels were extrapolated over the missing regions. It would be useful to know whether the extrapolation was done manually.

      Thank you, we have added separate labels (donors 1-4) for each individual from each dataset. We have also added that the 3D-PLI dataset was extrapolated manually. See the revised Materials and Methods: Data section.

      A small clarification, but for the morphological features calculated by HippUnfold, is thickness a measure of how much space each subfield takes up in the 2D unfolded space?

      Thickness is measured via HippUnfold, and we have clarified in-text that it is done in each subject’s native space (p. 6):

      Results:

      In the Results section, a brief summary or description of the Dice overlap metric would be helpful. The authors should also clarify if the Dice metric measures the overlap between an individual sample (e.g., sample3) that has been unfolded and registered/propagated to sample1 compared to the sample1 ground-truth subfields.

      We thank the Reviewer, and hope this is now clarified alongside the Reviewer’s Public comment with the addition of the example as quoted in our response to that comment.

      We also added to our description of Dice overlap as a measurement (p. 8):

      “The Dice overlap metric (Dice, 1945), which can also be considered an overlap fraction ranging from 0-1, was calculated for all subjects’ subfields registered to sample1.”
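      For readers unfamiliar with the metric, the Dice overlap of two label sets A and B is 2|A ∩ B| / (|A| + |B|). The minimal sketch below computes it per subfield label on two toy label arrays; the arrays and label numbers are invented for illustration and do not correspond to any sample in the study.

```python
import numpy as np


def dice_overlap(labels_a, labels_b, label):
    """Dice overlap for one label: 2 * |A and B| / (|A| + |B|), in the range 0-1."""
    mask_a = np.asarray(labels_a) == label
    mask_b = np.asarray(labels_b) == label
    total = mask_a.sum() + mask_b.sum()
    if total == 0:
        return float("nan")  # label absent from both inputs; overlap undefined
    return 2.0 * np.logical_and(mask_a, mask_b).sum() / total


# Toy example: labels at 8 vertices for a reference and a registered sample.
reference = np.array([1, 1, 1, 2, 2, 3, 3, 3])
registered = np.array([1, 1, 2, 2, 2, 3, 3, 1])

scores = {label: dice_overlap(reference, registered, label) for label in (1, 2, 3)}
for label, score in scores.items():
    print(f"label {label}: Dice = {score:.3f}")
```

      A value of 1 indicates identical label extents and 0 indicates no shared vertices or voxels.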

      Figure 3:

      In Figure 3A, it is unclear what "moving (sample 3)" refers to. Clarification is needed, and it would be helpful to know if this is sample 3 in native space before it has been unfolded/registered. In Figure 3B, there is a missing "native" before "folded" and "(right)" at the end of the sentence. With these edits, the sentence in the caption would read: "Each measure was calculated in unfolded space (left) and again in the first sample's (BigBrain left hemisphere) native folded space (right)."

      We thank the Reviewer, and have now changed “moving” to “sample3 before registration”, and added the suggested caption changes. See the revised Figure 3.

      Discussion:

      In the introduction, the authors provide a detailed description of the traditional 3D volumetric registration technique that utilizes gyral and sulcal patterning as the primary feature for registration, along with other features such as thickness and intracortical myelin. Using their surface-based registration, the authors highlight an interesting finding that hippocampal curvature is the most informative individual feature, and thickness and curvature combined are the most informative features for registration and boundary alignment. In the discussion, it would be beneficial for the authors to discuss the relationship between curvature, thickness, and gyrification (e.g., is there overlapping information across these features) and comment on the reliability of these features observed in the current study compared to past work using traditional methods.

      This is an interesting point of discussion, thank you for raising it. We’ve added the following paragraph to the Discussion section (p. 13):

      “The feature most strongly driving surface-based registration in the present study was curvature. Many neocortical surface-based registration methods focus on gyral and sulcal patterning at various levels (e.g. strong alignment of primary sulci, with weaker weighting on secondary and tertiary sulci) (Miller et al., 2021). In the present study, hippocampal gyri are variable between samples and so could perhaps be thought of as similar to tertiary neocortical gyri, and this may help explain why gyrification was not the primary driving feature in aligning hippocampal subfields. The methods used to quantify gyrification are often related to curvature, but differ across studies. In the hippocampus, unlike in the neocortex, the mouth of sulci are wide and so sulcal depth, which is often used in defining neocortical gyrification index, is not straightforward to measure. Using HippUnfold, gyrification is defined by the extent of tissue distortion between folded and unfolded space, and individual gyri/sulci are hard to resolve in unfolded gyrification maps, but are readily visible in curvature maps. Thus, hippocampal curvature may be more informative simply due to higher measurement precision. Future work could also employ measures like quantitative T1 relaxometry or other proxies of intracortical myelin content (e.g. Tardif et al., 2015; Glasser et al., 2016; Paquola et al. 2018) for hippocampal alignment, but this is not possible in cross-modal work as in the various histology stains examined here.”

      Miscellaneous:

      There is a typo on page 11, line 318, with extra parentheses: "(e.g., (Borne et al., 2023;..."

      Thank you, we have corrected this error.

      Reviewer #3 (Public Review):

      DeKraker and colleagues previously developed a new computational tool that creates a "surface representation" of the hippocampal subfields. This surface representation was previously constructed using histology from a single case. However, it was previously unclear how to best register and compare these surface-based representations to other cases with different morphology.

      In the current manuscript, DeKraker and colleagues have demonstrated the ability to align hippocampal subfield parcellations across disparate 3D histology samples that differ in contrast, resolution, and processing/staining methods. In doing so, they validated the previously generated BigBrain atlas by comparing seven different ground-truth subfield definitions. This is an impressive and valuable effort that provides important groundwork for future in vivo multi-atlas methods.

      We thank the Reviewer for their positive evaluations, and helpful suggestions. We provide responses to the recommendations in the red text below.

      Reviewer #3 (Recommendations For The Authors):

      There are a few points I think the authors should address, listed below.

      1) As the authors are well aware, subfield definitions vary considerably across laboratories. The current paper states that JD labeled the samples using three different atlas references: Ding & Van Hoesen, 2015; Duvernoy et al., 2013; and Palomero-Gallagher et al., 2020. This is unclear, however, since these three references differ in their subfield definitions. For example, Ding & Van Hoesen and Palomero-Gallagher define a region called the prosubiculum (area between subiculum and CA1) but Duvernoy does not. Please clarify which boundary rules from which particular references were used here. How were discrepancies across these references resolved when applying labels to the current histological samples?

      We thank the Reviewer, and have added the following elaboration (p. 5):

      “Since these sources differ slightly in their boundary criteria, and no prior reference perfectly matches the present samples, subjective judgment was used to draw boundaries after considering all three prior works. The “prosubiculum” label used by Ding & Van Hoesen and Palomero-Gallagher et al. was included as part of the subicular complex. See Supplementary Materials 2: ground-truth segmentation for more details.”

      2) Another comment has to do more with the "style" of how this paper is written, especially given that this paper was submitted to eLife (i.e. not a specialty journal). For example, the motivation for the unfolding methods with and without registration was not well described. Similarly, there was almost no justification for the different methods applied in Figure 4, and I fear that the impact of these results will be lost on a non-expert reader.

      We added the following elaboration to the last paragraph of the Introduction section to motivate our benchmark against unfolding without registration (p. 3):

      “We benchmark this new method against unfolding alone, which provides some intrinsic alignment between subjects (DeKraker et al., 2018) but which we believe can be further improved with the present methods, and against more conventional 3D volumetric registration approaches.”

      We also added a Discussion paragraph on the results shown in Figure 4 which we hope helps to make these results more informative and impactful (p. 13):

      “The feature most strongly driving surface-based registration in the present study was curvature. Many neocortical surface-based registration methods focus on gyral and sulcal patterning at various levels (e.g. strong alignment of primary sulci, with weaker weighting on secondary and tertiary sulci) (Miller et al., 2021). In the present study, hippocampal gyri are variable between samples and so could perhaps be thought of as similar to tertiary neocortical gyri, and this may help explain why gyrification was not the primary driving feature in aligning hippocampal subfields. The methods used to quantify gyrification are often related to curvature, but differ across studies. In the hippocampus, unlike in the neocortex, the mouth of sulci are wide and so sulcal depth, which is often used in defining neocortical gyrification index, is not straightforward to measure. Using HippUnfold, gyrification is defined by the extent of tissue distortion between folded and unfolded space, and individual gyri/sulci are hard to resolve in unfolded gyrification maps, but are readily visible in curvature maps. Thus, hippocampal curvature may be more informative simply due to higher measurement precision. Future work could also employ measures like quantitative T1 relaxometry or other proxies of intracortical myelin content (e.g. Tardif et al., 2015; Glasser et al., 2016; Paquola et al. 2018) for hippocampal alignment, but this is not possible in cross-modal work as in the various histology stains examined here.”

      3) Finally, the application of the current work beyond the current dataset needs to be made more clear. From what I understand, the discussion says that using a multi-atlas approach with HippUnfold is unfeasible at this point. What kind of computational or technical developments need to take place in order for these labeled datasets to be used for this purpose? How can the current labeled datasets be used in other contexts?

      The question of translation to other contexts, namely, in-vivo MRI, was also raised by Reviewer 1, and as such we decided to include an additional analysis to explore this question (Supplementary Materials 1: in-vivo MRI demonstration). Validation using ground-truth subfields is not plausible in MRI, and so we show only an indirect validation of intersubject alignment based on the sharpening of group-averaged features following better alignment using the present methods. We believe this new analysis significantly clarifies the applications we have in mind for this work. See the new Supplementary Section for details, and also a summary of this analysis in the Discussion section (p. 13):

      “Ravikumar et al. (2021) recently performed flat mapping of the medial temporal lobe neocortex using a Laplace coordinate system as employed here, and showed sharpening of group-averaged images following deformable registration in unfolded space. This indirectly suggests better intersubject alignment. We perform a similar group-averaged sharpening analysis in Supplementary Materials 1: in-vivo demonstration. Though the gains in image sharpness observed here were modest, we note that current MRI resolution and automated segmentation methods allow for only imperfect hippocampal feature measures. We thus expect unfolded registration to grow in importance as MRI and segmentation methods continue to advance.”

      Multi-atlas approaches are also presently possible, but we believe HippUnfold can apply unfolding and subfield definition with even higher validity. Unfolding of the hippocampus was previously possible in-vivo but still showed limited intersubject alignment. The present work validates a novel alignment method ex-vivo, and now additionally shows that this can be translated to better alignment even at the resolution of in-vivo imaging. We hope the above new Discussion paragraph also helps to clarify this.

      4) A minor comment is that there are three panels (a,b,c) in Figure 4 but the figure legend does not describe them separately.

      We thank the Reviewer, and added a Figure legend for parts B and C.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for these helpful and thoughtful comments.

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      • What was the nature of the 0.1 increase in pH caused by illumination in CheRiff-negative cells? Is this thought to be a temperature effect?

      The increase in pHoran4 fluorescence in CheRiff-negative cells is most likely not from a pH change; rather, it most likely reflects blue light-mediated photoactivation of the mOrange-derived chromophore in pHoran4. Similar photoartifacts have been reported in other fluorescent protein reporters (see e.g. Farhi, Samouil L., et al. "Wide-area all-optical neurophysiology in acute brain slices." Journal of Neuroscience 39.25 (2019): 4889-4908.).

      The baseline measurement in CheRiff-negative cells is to control for this type of artifact. We subtract the mean signal from the CheRiff-negative cells to correct the signals from the CheRiff-positive cells, as described in the Main Text.
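
The correction described above can be sketched as follows. This is a minimal illustration, assuming each cell's signal is stored as a list of fluorescence values per frame; the function name and the example traces are hypothetical and are not taken from the manuscript.

```python
from statistics import mean

def correct_for_photoartifact(positive_traces, negative_traces):
    """Subtract the mean CheRiff-negative trace from each CheRiff-positive trace."""
    n_frames = len(negative_traces[0])
    # Frame-by-frame mean over control cells estimates the light-induced artifact.
    artifact = [mean(trace[t] for trace in negative_traces) for t in range(n_frames)]
    return [[trace[t] - artifact[t] for t in range(n_frames)]
            for trace in positive_traces]

# Hypothetical traces (arbitrary units), for illustration only.
negative = [[0.00, 0.02, 0.04], [0.00, 0.04, 0.06]]
positive = [[0.00, -0.10, -0.20]]
corrected = correct_for_photoartifact(positive, negative)
```

Averaging over control cells before subtracting keeps cell-to-cell noise in the controls from being injected into each corrected trace.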

      • Does Kir2.1 have a proton conductance? Was the resting pH of HEK cells changed by Kir2.1 expression? Fig 2D suggest basal pH is equivalent +/- Kir2.1 but it would be good to show that data.

      This is an interesting question which our data do not answer conclusively. Since we used an intensiometric (as opposed to ratiometric) pH indicator, our measurements only provide relative pH changes. We assumed a constant initial pH. We have revised the text to make clear that this is an assumption.

      Prior studies of pH-dependent Kir2.1 activity did not find evidence of a proton current (i.e. no change in current upon extracellular acidification), though the channel is closed by intracellular acidification. See: Ye, Wenlei, et al. "The K+ channel KIR2.1 functions in tandem with proton influx to mediate sour taste transduction." Proceedings of the National Academy of Sciences 113.2 (2016): E229-E238. We added this information to the text.

      The pKa of pHoran4 is 7.5, so a decrease in initial pH would decrease the slope of F vs pH. We observed higher (absolute value) ΔF/F in the Kir2.1-expressing cells than in the non-expressing cells, confirming that the Kir2.1-expressing cells had larger CheRiff-mediated acidification than the Kir2.1-negative cells (Figure 2D). Thus, this conclusion remains true regardless of whether Kir2.1 has a proton conductance.
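
The statement about the slope of F vs pH can be illustrated with a short sketch. It assumes a single-site titration curve (Hill coefficient of 1) in which fluorescence rises with pH; that functional form is an assumption made here for illustration, and only the pKa of 7.5 comes from the response.

```python
import math

PKA = 7.5  # pKa of pHoran4, as stated in the response

def fluorescence(pH, pKa=PKA):
    """Normalized fluorescence under an assumed single-site titration curve."""
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))

def slope(pH, pKa=PKA):
    """Analytic dF/dpH = ln(10) * F * (1 - F), which peaks at pH == pKa."""
    f = fluorescence(pH, pKa)
    return math.log(10.0) * f * (1.0 - f)

# Below the pKa, a lower starting pH gives a shallower F-vs-pH slope.
for pH in (6.9, 7.2, 7.5):
    print(f"pH {pH}: F = {fluorescence(pH):.3f}, dF/dpH = {slope(pH):.3f}")
```

Under this assumed curve the slope is largest at the pKa and falls off on either side, so for starting values below 7.5 a more acidic baseline yields a smaller change in F per unit pH.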

      • What channels/transporters mediate proton flux in CheRiff + Kir2.1 experiments? Is the increased proton flux simply due to more H+ ions passing through CheRiff when cells are hyperpolarized, or may other voltage-dependent processes affect pH?

      Fig. 2G-M address this question specifically. We targeted the blue light in a “zebra” pattern to only activate CheRiff in a subset of cells. We then used voltage imaging to show that the induced voltage spread over a much wider area than the blue-illuminated region, due to gap junction coupling between the cells. If protons flowed through some voltage-dependent channel other than CheRiff, then we would expect the acidification to follow the voltage profile. If protons primarily flowed through CheRiff, then we would expect the acidification to follow the illumination profile. Fig. 2K and the following quantification show clearly that the acidification followed the illumination profile, and hence the proton current was primarily through CheRiff.

      • Is Kir2.1 included in the spatial illumination experiments (Fig. 2G-M)? If so, it would be helpful to note it. The color scheme suggest it is but it would be good to note it explicitly.

      Yes. Clarified in text.

      • Why is the acidification caused by 10 seconds of illumination smaller in Fig 2L, as compared to the equivalent experiment in 2D? Is this due to the spatial nature of the illumination? It seems that the pH change at the site of illumination should be equivalent between these 2 experiments.

      The illumination protocol between the two experiments has different duty cycles (compare Fig. 2C and 2J), so the time-averaged intensity is different. There can also be batch-to-batch variation in CheRiff expression which would alter the proton flux and thus pH change. To control for this, comparisons were always made between batches of cells prepared together.

      • The authors used 150 second illumination to examine pH changes but only 13.5 seconds to differentiate between pH changes caused by the light-activated conductance and those secondary to depolarization. Would pH changes lose their spatial limitations if a similar 150 second illumination was used? This is important because the pH change seen in the "Blue On" region was quite small.

      Yes, protons can diffuse between cells via gap junctions, smoothing out the spatial structure of the pH over long times. See e.g. Wu, Ling, et al. "PARIS, an optogenetic method for functionally mapping gap junctions." Elife 8 (2019): e43366.

      We used a short (13.5 s) protocol specifically to distinguish CheRiff-mediated acidification from acidification via other conductances in electrically coupled neighboring cells. If we had waited for longer, lateral proton diffusion could have muddied the interpretation of these experiments.

      • How long do action potentials shown in between illuminations in Fig 4H (ChR2 3M) last following cessation of illumination?

      The closing times, τoff, of the channelrhodopsins are shown in Table 1. ChR2-3M has an off-time of almost 2 seconds. The duration of post-stimulus persistent firing is expected to depend on the expression level of ChR2-3M, the strength of the optogenetic stimulus and the excitation threshold of the neurons, i.e. on how far above threshold the neuron is at the moment the blue light turns off. Thus we expect the post-stimulus firing time to be highly variable between cells and also to depend on optogenetic stimulus strength. In our experiments, action potentials were observed throughout the 0.5 s dark interval between stimuli.

      • While the ChR2-3M construct may have promise for therapeutic applications, those strengths limit its use for basic science applications like circuit mapping. This should be noted in the discussion.

      Ok. We now mention this in the discussion.

      • Please define EPD50 within the text of the results section.

      Ok. Fixed.

      Reviewer #2 (Recommendations For The Authors):

      This is an interesting manuscript investigating a potential limitation of optogenetic manipulation of cell excitability and its solution. The work is conducted rigorously and explained clearly. I only have minor concerns:

      I think the impact of the study could be broadened by examining additional proton permeable opsins for their effects on intracellular pH. A single assay could be used to compare different opsins to CheRiff and show that the problem of intracellular acidification is not limited to CheRiff.

      Yes, this is interesting. There are so many opsins and illumination protocols in use that we could not do an exhaustive characterization; we encourage people to test their own opsin under their conditions if doing chronic stimulation. The plasmid constructs used for this work are available on Addgene.

      I am not clear on what Figure S3A is showing because I cannot see a patterning like the one shown in Fig. 2H. Perhaps a higher magnification could solve the problem.

      Figure S3A does not have the zebra-striped pattern of Figure 2H. In Fig S3A, we used just one column of illumination. The point was to test the ability of each opsin to depolarize the HEK cells. We added images of the illumination pattern and adjusted the caption to make this clear.

      When discussing the sustained photocurrent of PsCatCh2.0, a reference to Govorunova et al. J. Biol. Chem. 2013 should be added as the low extent of light induced inactivation appears to be, at least in part, a characteristic of the particular type of opsin from P. subcordiformis.

      Added reference.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Please describe the criteria for binocularity of dLGN neurons, and what % of recorded neurons meet these criteria. Do all the example neurons in figure 1D meet the criteria for binocular neurons?

      We now include criteria for binocularity of dLGN neurons in the methods section on page 24, and mention the percentage of binocular neurons that we detected. We also indicate which of the example neurons in figure 1D are monocular or binocular according to these criteria. We would like to stress that these percentages are not representative of the level of binocularity in dLGN as a whole, as our recordings were limited to the frontal ipsilateral projection zone of dLGN, which is its most binocular region, and only units with a receptive field within 30° from the center were included in the analysis. We mention this in the discussion on page 23.

      Fig 1: Please perform statistical comparison of data presented in Figure 1c by genotype, as in other figures.

      We conducted post-hoc Tukey's tests exclusively when a significant interaction between phenotype and genotype was detected in the two-way ANOVA (as seen in Figs. 2B and 3E). This decision was made because interpreting a significant post-hoc test becomes uncertain when there is no interaction, as is the case in Fig. 1C. In that case, the post-hoc Tukey's test yielded a p-value of 0.044 for the difference in RF size between KO NO-MD and KO MD, while all other comparisons were not significant (WT NO-MD vs WT MD: p=0.15, WT NO-MD vs KO NO-MD: p=0.99, WT MD vs KO MD: p=0.21). However, since there was no significant interaction between genotype and phenotype, we cannot conclude that there is an effect in KO mice that is absent in WT mice. In Fig. 3B, all post-hoc Tukey's tests resulted in p-values greater than 0.05.

      Fig 1e: There is no justification for splitting the data into two time epochs before and after 150 msec. A repeated measures anova of smaller time bins across the full time course would be more effective/appropriate here.

      The reviewer is correct. We have now performed a repeated measures ANOVA.

      Fig 2: GABA a1R KO results in a loss/absence of OD plasticity, not a reduction

      We agree. We have changed the wording.

      Fig 3: Please be specific about the location of V1 recordings. Was layer-specificity determined?

      The location of V1 recordings is mentioned in the methods section under “Electrophysiology recordings, visual stimulation and V1 silencing”, page 23. We have assessed OD per depth, but found that we do not have sufficient units to draw any conclusions about differences in plasticity per layer.

      Why is feedback from V1 more influential in dLGN OD plasticity in KO?

      We believe this is because the reduced thalamic inhibition causes the excitation/inhibition ratio to shift in favor of excitation. We discuss this more extensively on page 19 of the discussion.

      Fig 4: Inclusion of a GABA R antagonist protects thalamic axons from muscimol silencing (Liu BH, Wu GK, Arbuckle R, Tao HW, Zhang LI. Defining cortical frequency tuning with recurrent excitatory circuitry. Nat. Neurosci. 2007;10:1594-600.)

      We now mention the possible direct influence of muscimol on thalamic axons in the discussion on page 19 and cite the suggested article.

      The observation that feedback from primary visual cortex does not contribute to adult visual thalamus plasticity is interesting and important. The authors should expand on their discussion of this observation to include changes in cortical circuitry that may help to explain this observation.

      We have expanded this part of the discussion on page 20.

      The authors should describe the pathway by which inhibition enables plasticity in dLGN.

      We discuss this more extensively on page 17 in the updated manuscript.

      Reviewer #2:

      1) The current work was basically a follow-up of a previous study in juvenile mice, and the results were also very similar to the juvenile results (Sommeijer et al., 2017). One possible interpretation of the results is that the lack of OD plasticity in adult V1 and dLGN was caused by an early blockade of the development of the inhibitory circuit in dLGN, which retains the dLGN in an immature stage till adulthood. The authors indeed claimed in the discussion that the 2-day OD shift is intact in juvenile dLGN and V1 in KO mice, and provided evidence in a supplementary figure that GABAergic and cholinergic synapse numbers are similar between WT and KO mice. However, the 7-day OD shift is indeed defective in juvenile V1 and dLGN in KO mice (Sommeijer et al., 2017), and it is possible that this early functional deficit didn't induce a structural remodeling in adulthood. To better support the authors' claim that the lack of adult V1 OD plasticity is specifically due to reduced dLGN synaptic inhibition, the authors should generate conditional KO mice in which dLGN synaptic inhibition is disrupted only in adulthood.

      In order to address this criticism it is important to discuss the plasticity deficits in dLGN and V1 separately.

      Concerning V1 plasticity: We have previously shown that brief MD induces an OD shift in V1 of mice lacking thalamic synaptic inhibition in dLGN. OD plasticity induced by brief MD is a hallmark of critical period plasticity in V1, and it thus seems highly unlikely that critical period onset in V1 is defective or that development of V1 is halted in an immature state that does not support OD plasticity in thalamus-specific GABRA1 deficient mice.

      The observed plasticity deficit during the critical period was limited to the second stage of the OD shift in V1, which requires long-term monocular deprivation. The straightforward explanation for this result and our current findings is that both during the critical period and in adulthood, the second stage of OD plasticity in V1 induced by long-term monocular deprivation requires thalamic plasticity or inhibition. The proposed alternative, that lack of thalamic synaptic inhibition during development results in a possible lack of structural change in V1 that would cause a lifelong deficiency selectively affecting OD plasticity induced by long-term monocular deprivation, requires many more assumptions.

      Concerning dLGN plasticity: The simplest explanation for the observed lack of OD plasticity in dLGN is that it is a direct consequence of the absence of synaptic inhibition in the KO mice. However, an alternative explanation could indeed be that dLGN is kept in an immature (pre-critical period-like) state due to the developmental absence of synaptic inhibition. This situation would be analogous to that in V1 of GAD65 deficient mice (which have reduced GABA release), in which OD plasticity cannot be induced by brief monocular deprivation during the critical period or in adulthood (Fagiolini and Hensch, 2000). Because this deficit can be reversed by treating the mice with benzodiazepines (allosteric modulators of GABA receptors) at any age, it is thought that development of V1 in GAD65 mice is halted in a pre-critical period-like state until inhibition is strengthened. We cannot exclude that something similar occurs in dLGN of mice lacking thalamic synaptic inhibition, although we did not observe any changes in hallmarks of dLGN maturity, such as reduced receptive field size, and increased cholinergic and inhibitory bouton densities.

      However, if the analogy with the developmental deficit in V1 of GAD65 deficient mice is valid, the reduced plasticity is still likely to be a direct consequence of reduced inhibition. In GAD65 deficient mice, long term monocular deprivation during the critical period causes a full OD shift, showing that no additional deficits (besides reduced inhibition) limit OD plasticity in V1 of these mice (Fagiolini and Hensch, 2000). And, as already mentioned, increasing inhibition rescues OD plasticity in GAD65 KO mice. Thus, the immature state of V1 in these mice is probably nothing more than a situation in which inhibitory tone is too low to support efficient OD plasticity. In dLGN, knocking out GABRA1 at a later age could therefore also create a situation in which inhibition is too low to support thalamic OD plasticity, which is not different from the situation in which the gene is inactivated at birth. Only if lack of synaptic inhibition in thalamus affects another, unknown developmental process that is of importance later in life to support OD plasticity in dLGN would the proposed experiment result in a different outcome. We are not convinced that this scenario is likely enough to justify repeating most of this study using mice in which GABRA1 is inactivated in dLGN through bilateral AAV-cre injections.

      Independently of the exact cause of the plasticity deficit in dLGN, our results make clear that a cortical plasticity deficit in adulthood can have a thalamic origin, which we believe is an important insight that is highly relevant.

      We have included part of these arguments in the discussion on page 17.

      2) The authors found that in juveniles, dLGN OD shift is dependent on V1 feedback, but not in adults. However, a recent work showed that the effects of V1 silencing on dLGN OD plasticity could differ with various starting points and duration of the V1 silencing and MD (Li et al., 2023). Could the authors provide more details of the MD and V1 silencing for an in-depth discussion?

      We discuss some of the findings of the Li et al paper on pages 16 and 20 of the manuscript now.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Meta-cognition, and difficulty judgments specifically, is an important part of daily decision-making. When facing two competing tasks, individuals often need to make quick judgments on which task they should approach (whether their goal is to complete an easy or a difficult task).

      In the study, subjects face two perceptual tasks on the same screen. Each task is a cloud of dots with a dominating color (yellow or blue), with a varying degree of domination - so each cloud (as a representation of a task where the subject has to judge which color is dominant) can be seen as an easy or a difficult task. Observing both, the subject has to decide which one is easier.

      It is well-known that choices and response times in each separate task can be described by a drift-diffusion model, where the decision maker accumulates evidence toward one of the decisions (“blue” or “yellow”) over time, making a choice when the accumulated evidence reaches a predetermined bound. However, we do not know what happens when an individual has to make two such judgments at the same time, without actually making a choice, but simply deciding which task would have stronger evidence toward one of the options (so would be easier to solve).

      It is clear that the degree of color dominance (”color strength” in the study’s terms) of both clouds should affect the decision on which task is easier, as well as the total decision time. Experiment 1 clearly shows that color strength has a simple cumulative effect on choice: cloud 1 is more likely to be chosen if it is easier and cloud 2 is harder. Response times, however, show a more complex interactive pattern: when cloud 2 is hard, easier cloud 1 produces faster decisions. When cloud 2 is easy, easier cloud 1 produces slower decisions.

      The study explores several models that explain this effect. The best-fitting model (the Difference model, in the paper’s terminology) assumes that the decision-maker accumulates evidence in both clouds simultaneously and makes a difficulty judgment as soon as the difference between the values of these decision variables reaches a certain threshold. Another potential model that provides a slightly worse fit to the data is a two-step model. First, the decision maker evaluates the dominant color of each cloud, then judges the difficulty based on this information.
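
The Difference model summarized above can be sketched as a simple simulation. This is an illustrative sketch only: the sensitivity `kappa`, the bound, the time step, and the absence of non-decision time are assumptions chosen here, not the fitted model from the paper. Following the description in the second review, the comparison uses the absolute values of the two decision variables.

```python
import random

def simulate_difference_trial(c1, c2, kappa=10.0, bound=1.0, dt=0.005,
                              max_t=10.0, rng=random):
    """Simulate one difficulty judgment; return (choice, decision_time).

    c1, c2: signed color strengths (the sign encodes the dominant color).
    choice: 1 if stimulus 1 is judged easier, 2 if stimulus 2 is,
            None if no bound is reached before max_t.
    """
    x1 = x2 = 0.0
    t = 0.0
    noise_sd = dt ** 0.5  # unit-variance diffusion per second
    while t < max_t:
        # Each stimulus drives its own noisy accumulator.
        x1 += kappa * c1 * dt + rng.gauss(0.0, noise_sd)
        x2 += kappa * c2 * dt + rng.gauss(0.0, noise_sd)
        t += dt
        # Commit once one accumulator is sufficiently further from zero.
        diff = abs(x1) - abs(x2)
        if diff >= bound:
            return 1, t
        if diff <= -bound:
            return 2, t
    return None, t

rng = random.Random(0)
# Stimulus 1 has strong color dominance; stimulus 2 has none.
trials = [simulate_difference_trial(0.5, 0.0, rng=rng) for _ in range(200)]
p_choose_1 = sum(1 for choice, _ in trials if choice == 1) / len(trials)
```

Because only the difference of absolute values is compared with the bound, the simulated observer never has to commit to a color for either cloud before reporting which one is easier.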

      Thank you for a very good summary of our work.

      Importantly, the study explores an optimal model based on the Markov decision processes approach. This model shows a very similar qualitative pattern in RT predictions but is too complex to fit to the real data. It is hard to judge from the results of the study how the models identified above are specifically related to the optimal model - possibly, the fact that simple approaches such as the Difference model fit the data best could suggest the existence of some cognitive constraints that play a role in difficulty judgments.

      The reviewer asks “how the models identified above are specifically related to the optimal model”. We did fit the four models to simulations of the optimal model and found that the Difference model was the closest. However, we did not fit the parameters of the optimal model to the data (no easy feat given the complexity of the model) as the experiment was not designed to incentivize maximization of the reward rate and fitting would have been computationally laborious. We therefore focused on the qualitative features of the optimal model and how they compare to our models. We now also include the optimal model for the known color dominance RT experiment (line 420). We have also added a new paragraph in the Discussion on the optimal model at line 503 comparing it qualitatively to the Difference model.

      The Difference model produces a well-defined qualitative prediction: if the dominant color of both clouds is known to the decision maker, the overall RT effect (hard-hard trials are slower than easy-easy trials) should disappear. Essentially, that turns the model into the second stage of the two-stage model, where the decision maker learns the dominant colors first. The data from Experiment 2 impressively confirms that prediction and provides a good demonstration of how the model can explain the data out-of-sample with a predicted change in context.

      Overall, the study provides a very coherent and clean set of predictions and analyses that advance our understanding of meta-cognition. The field would benefit from further exploration of differences between the models presented and new competing predictions (for instance, exploring how the sequential presentation of stimuli or attentional behavior can impact such judgments). Finally, the study provides a solid foundation for future neuroimaging investigations.

      Thank you for your positive comments and suggestions.

      Reviewer #2 (Public Review):

      Starting from the observation that difficulty estimation lies at the core of human cognition, the authors acknowledge that despite extensive work focusing on the computational mechanisms of decision-making, little is known about how subjective judgments of task difficulty are made. Instantiating the question with a perceptual decision-making task, the authors found that how humans pick the easiest of two stimuli, and how quickly these difficulty judgments are made, are best described by a simple evidence accumulation model. In this model, perceptual evidence of concurrent stimuli is accumulated and difficulty is determined by the difference between the absolute values of decision variables corresponding to each stimulus, combined with a threshold crossing mechanism. Altogether, these results strengthen the success of evidence accumulation models, and more broadly sequential sampling models, in describing human decision-making, now extending it to judgments of difficulty.

      The manuscript addresses a timely question and is very well written, with its goals, methods and findings clearly explained and directly relating to each other. The authors are specialists in evidence accumulation tasks and models. Their modelling of human behaviour within this framework is state-of-the-art. In particular, their model comparison is guided by qualitative signatures which are diagnostic to tease apart the different models (e.g., the RT criss-cross pattern). Human behaviour is then inspected for these signatures, instead of relying exclusively on quantitative comparison of goodness-of-fit metrics. This work will likely have a wide impact in the field of decision-making, across species. It will resonate in particular with many other studies relying on a similar theoretical account of behaviour (evidence accumulation).

      Thank you for these generous comments.

      A few points nevertheless came to my attention while reading the manuscript, which the authors might find useful to answer or address in a new version of their manuscript.

      1) The authors acknowledge that difficulty estimation occurs notably before exploration (e.g., attempting a new recipe) or learning (e.g., learning a new musical piece) situations. Motivated by the fact that naturalistic tasks make it difficult to identify the inference process underlying difficulty judgments, the authors instead chose a simple perceptual decision-making task to address their question. While I generally agree with the authors’ general diagnosis, I am nevertheless concerned as to whether the task really captures the cognitive process of interest as described in the introduction. As noted by the authors themselves, the main function of prospective difficulty judgment is to select a task which will then ultimately be performed, or reject one which won’t. However, in the task presented here, participants are asked to produce difficulty judgments without those judgments actually impacting the future in the task. A feature key to difficulty judgments thus seems lacking from the task. Furthermore, the trial-by-trial feedback provided to participants also likely differs from difficulty judgments made in the real world. This comment is probably difficult to address but it might generally be useful to discuss the limitations of the task, in particular in probing the desired cognitive process as described in the introduction. Currently, no limitations are discussed.

      We have added a Limitations paragraph to the Discussion and one item we deal with is the generalization of the model to more complex tasks (line 539).

      2) The authors take their findings as the general indication that humans rely on accumulation evidence mechanisms to probe the difficulty of perceptual decisions. I would probably have been slightly more cautious in excluding alternative explanations. First, only accumulation models are compared. It is thus simply not possible to reach a different conclusion. Second, even though it is particularly compelling to see untested predictions from the winning model in experiment #1 to be directly tested, and validated in a second experiment, that second experiment presents data from only 3 participants (1 of which has slightly different behaviour than the 2 others), thereby limiting the generality of the findings. Third, the winning model in experiment #1 (difference model) is the preferred model on 12 participants, out of the 20 tested ones. Fourth, the raw BIC values are compared against each other in absolute terms without relying on significance testing of the differences in model frequency within the sample of participants (e.g., using exceedance probabilities; see Stephan et al., 2009 and Rigoux et al., 2014). Based on these different observations, I would thus have interpreted the results of the study with a bit more caution and avoided concluding too widely about the generality of the findings.

      Thank you for these suggestions.

      i) We have now made it clear in the Results (line 126) that all four models we examine are accumulation models. In addition, we have added a paragraph on Limitations (line 530) in the Discussion where we explain why we only consider accumulation models and acknowledge that there are other non-accumulation models.

      ii) Each of the three participants in Experiment 2 performed 18 sessions, making it a large and valuable dataset necessary to test our hypothesis. We have now included a mention of the small number of participants in Experiment 2 in a Limitations paragraph in the Discussion (line 539).

      iii) As suggested, we have now calculated exceedance probabilities for the 4 models, which give [0, 0.97, 0.03, 0]. This shows that there is a 0.97 probability of the Difference model being the most frequent and only a 0.03 probability of the two-step model. We have included this in the results on line 237.
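
For readers unfamiliar with exceedance probabilities, the final step of the calculation can be sketched as follows. The sketch assumes that a Dirichlet posterior over model frequencies has already been estimated (for example with the variational scheme of Stephan et al., 2009, from per-participant model evidences); the `alpha` values below are hypothetical and are not the values behind the numbers reported above.

```python
import random

def exceedance_probabilities(alpha, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(model k is the most frequent) under Dirichlet(alpha)."""
    rng = random.Random(seed)
    wins = [0] * len(alpha)
    for _ in range(n_samples):
        # Independent gamma draws are proportional to one Dirichlet sample,
        # so normalization is not needed to find the largest component.
        draws = [rng.gammavariate(a, 1.0) for a in alpha]
        wins[draws.index(max(draws))] += 1
    return [w / n_samples for w in wins]

# Hypothetical Dirichlet parameters for four models, for illustration only.
alpha = [1.0, 13.0, 8.0, 2.0]
xp = exceedance_probabilities(alpha)
```

Each entry of `xp` estimates the probability that the corresponding model is more frequent in the population than every other model, which is a different quantity from the fraction of participants best fit by that model.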

      3) Deriving and describing the optimal model of the task was particularly appreciated. It was however a bit disappointing not to see how well the optimal model explains participants behaviour and whether it does so better than the other considered models. Also, it would have been helpful to see how close each of the 4 models compared in Figures 2 & 3 get to the optimal solution. Note however that neither of these comments are needed to support the authors’ claims.

      The reviewer asks how close each of the four models is to the optimal solution. We did fit the four models to simulations of the optimal model and found that the Difference model was the closest. However, we did not fit the parameters of the optimal model to the data (no easy feat given the complexity of the model) as the experiment was not designed to incentivize maximization of the reward rate and fitting would have been computationally laborious. We therefore focused on the qualitative features of the optimal model and how they compare to our models. We now also include the optimal model for the known color dominance RT experiment (line 420). We have also added a new paragraph in the Discussion on the optimal model at line 503 comparing it qualitatively to the Difference model.

      4) The authors compared the difficulty vs. color judgment conditions to conclude that the accumulation process subtending difficulty judgements is partly distinct from the accumulation process leading to perceptual decisions themselves. To do so, they directly compared reaction times obtained in these two conditions (e.g. ”in other cases, the two perceptual decisions are almost certainly completed before the difficulty decision”). However, I find it difficult to directly compare the ’color’ and ’difficulty’ conditions as the latter entails a single stimulus while the former comprises two stimuli. Any reaction-time difference between conditions could thus I believe only follow from asymmetric perceptual/cognitive load between conditions (at least in the sense RT-color < RT-difficulty). One alternative could have been to present two stimuli in the ’color’ condition as well, and asking participants to judge both (or probe which to judge later in the trial). Implementing this now would however require to run a whole new experiment which is likely too demanding. Perhaps the authors could instead also acknowledge that this is a critical difference between their conditions, which makes direct comparison difficult.

      We feel we can rule out that participants make color decisions (as in the color task) to make difficulty decisions. For example, making a color choice for 0% color strength takes longer than a difficulty choice for 0:52% color strengths. Thus, the difficulty judgment does not require completion of the color decisions. Therefore, average reaction time for a single color patch (C𝑆1) can be longer than the reaction time for the difficulty task which contains the same coherence (C𝑆1) for one of the patches. This is true despite the difficulty decision requiring monitoring of two patches (which might be expected to be slower than monitoring one patch). We have added this in to the Discussion at line 449.

      Reviewer #3 (Public Review):

      The manuscript presents novel findings regarding the metacognitive judgment of difficulty of perceptual decisions. In the main task, subjects accumulated evidence over time about two patches of random dot motion, and were asked to report for which patch it would be easier to make a decision about its dominant color, while not explicitly making such decision(s). Using 4 models of difficulty decisions, the authors demonstrate that the reaction time of these decisions are not solely governed by the difference in difficulties between patches (i.e., difference in stimulus strength), but (also) by the difference in absolute accumulated evidence for color judgment of the two stimuli. In an additional experiment, the authors eliminated part of the uncertainty by informing participants about the dominant color of the two stimuli. In this case, reaction times were faster compared to the original task, and only depended on the difference between stimulus strength.

      Overall, the paper is very well written, figures and illustrations clearly and adequately accompanied the text, and the method and modeling are rigorous.

      The weakness of the paper is that it does not provide sufficient evidence to rule out the possibility that judging the difficulty of a decision may actually be comparing between levels of confidence about the dominant color of each stimulus. One may claim that an observer makes an implicit color decision about each stimulus, and then compares the confidence levels about the correctness of the decisions. This concern is reflected in the paper in several ways:

      We tested a Difference in confidence model (line 315) in the original paper and showed it was inferior to the Difference model. We did this for experiment 2, RT task so that we could fit the unknown color condition and try to predict the known color condition. To emphasize this model (which we think the reviewer may have missed) we have moved the supplementary figure to the main results (now Fig. 6) as we think it is very cool that we were able to discard the confidence model.

      When comparing the confidence model to the Difference model, we found the Difference model was preferred with ΔBIC of 38, 56, 47. We are unsure why the reviewer feels this “does not provide sufficient evidence to rule out the possibility that judging the difficulty of a decision may actually be comparing between levels of confidence about the dominant color of each stimulus.” We regard this as strong evidence.

      1) It is not clear what were the actual instructions to the participants, as two different phrasings appear in the methods: one instructs participants to indicate which stimulus is the easier one and the other instructs them to indicate the patch with the stronger color dominance. If both instructions are the same, it can be assumed that knowing the dominant color of each patch is in fact solving the task, and no judgment of difficulty needs to be made (perhaps a confidence estimation). Since this is not a classical perceptual task where subjects need to address a certain feature of the stimuli, but rather to judge their difficulties, it is important to make it clear.

      We now include the precise words used to instruct the participant (line 604): “Your task is to judge which patch has a stronger majority of yellow or blue dots. In other words: For which patch do you find it easier to decide what the dominant color is? It does not matter what the dominant color of the easier patch is (i.e., whether it is yellow or blue). All that matters is whether the left or right patch is easier to decide”.

      Knowing both colors or the dominant color is not sufficient to solve the task. Knowing both are yellow does not tell you which has more yellow which is what you need to estimate to solve the task. Again, we tested a confidence model in the original version of the paper and showed it was a poor model compared to the Difference model.

      2) Two step model: two issues are a bit puzzling in this model. First, if an observer reaches a decision about the dominant color of each patch, does it mean one has made a color decision about the patches? If so, why should more evidence be accumulated? This may also support the possibility that this is a ”post decision” confidence judgment rather than a ”pre decision” difficulty judgment. Second, the authors assume the time it takes to reach a decision about the dominant color for both patches are equal, i.e., the boundaries for the ”mini decision” are symmetrical. However, it would make sense to assume that patches with lower strength would require a longer time to reach the boundaries.

      In the Two-step model we assume a mini decision is made for the color of each stimulus. However, the assumption is that this is made with a low bound so it is not a full decision as in a typical color decision. Again estimating the colors from the mini decision does not tell you which is easier so you need to accumulate more evidence to make this judgment. In fact the Race model is a version of the two step in which no further accumulation is made after the initial decision and this model fits poorly (we now explain this on line 185). We assume for simplicity that the first stimulus to cross a bound triggers both mini color decisions. So although the bounds are equal the one with stronger color dominance is more likely to hit the bound first.

      We have already addressed this concern about the comparison with confidence above.

      3) Experiment 2: the modification of the Difference model to fit the known condition (Figure 5b), can also be conceptualized as the two-step model, excluding the ”mini” color decision time. These two models (Difference model with known color; two-step model) only differ from each other in a way that in the former the color is known in advance, and in the second, the subject has to infer it. One may wonder if the difference in patterns between the two (Figure 3C vs. Figure 6B) is only due to the inaccuracies of inferring the dominant color in the two-step model.

      In Experiment 2 the participant is explicitly informed as to the color dominance of both stimuli. Therefore, assuming the two-step model skips the first step and uses this explicit information in the second step, the difference and two-step model are identical for modeling Experiment 2. We explain this now on line 277.

      As the reviewer suggests, differences in predictions between the Difference and Two-step models arise from trials in which there is a mismatch between the inferred dominant colors from the two-step model and the color associated with the final DVs in the Difference model. We now explain this on line 187. We do not see this as a problem of any sort; it simply defines the difference between the models. Note that the new exceedance analysis now strongly supports the Difference model as the most common model among the participants.

      An additional concern is about the controlled duration task: Why were these specific durations chosen (0.1-1.65 sec; only a single duration was larger than 1sec), given the much longer reaction times in the main task (Experiment 1), which were all larger on average than 1sec? This seems a bit like an odd choice. Additionally, difficulty decision accuracies in this version of the task differ between known and unknown conditions (Figure 7), while in the reaction time version of the same task there were no detectable differences in performance between known and unknown conditions (Figure 6C), just in the reaction times. This discrepancy is not sufficiently explained in the manuscript. Could this be explained by the short trial durations?

      The reviewer asks about the choice of stimulus durations in Experiment 2. First, RTs in Experiment 1 do not only reflect the time needed to make decisions but also contain non-decision times (0.23-0.47 s). So to compare decision time in RT and controlled duration experiment one must subtract the non-decision time from the RTs (the non-decision time is not relevant to the controlled duration experiment). Second, the model specifically predicts that differences in performance between the known and unknown color dominance conditions are largest for short duration stimulus presentation trials (see Fig. 7). We explain this on line 346. For long durations, performance pretty much plateaus, and many decisions have already terminated (Kiani 2008). We sample stimulus durations from a discrete truncated exponential distribution to get roughly equal changes in accuracy between consecutive durations (which we now explain at line 345).

      Group consensus review

      The reviewers have discussed with each other, and they have discussed a series of revisions which, if carried out, would make their evaluation of your paper even more positive. I outline them below in case you would be interested in revising your paper based on these reviews. You will see below that the reviewers share overall a quite positive evaluation of your study. All three limitations described in the Public Reviews could be addressed explicitly in the discussion which for the moment is limited to description and generalization of findings.

      1) The model selection procedure should be amended and strengthened to provide clearer results. As noted by one of the reviewers during the consultation session, ”the Difference model just barely wins over the two-step model, and the two-step model might produce the same prediction for the next experiment.” You will also see below that Reviewer #2 provides guidance to improve the model selection process: ”[...] the second experiment presents data from only 3 participants (1 of which has slightly different behaviour than the 2 others), thereby limiting the generality of the findings. Third, the winning model in experiment #1 (difference model) is the preferred model on 12 participants, out of the 20 tested ones. Fourth, the raw BIC values are compared against each other in absolute terms without relying on significance testing of the differences in model frequency within the sample of participants (e.g., using exceedance probabilities; see Stephan et al., 2009 and Rigoux et al., 2014).” Altogether, model selection appears currently to be the ’weakest’ part of the paper (Difference model vs. Two-step model, model comparison, how to better incorporate the optional model with the other parts). It would be great if you would improve this section of the Results.

      Thank you for these suggestions.

      i) We have now made it clear in the Results (line 126) that all four models we examine are accumulation models. In addition, we have added a paragraph on Limitations (line 530) in the Discussion where we explain why we only consider accumulation models and acknowledge that there are other non-accumulation models.

      ii) Each of three participants in Experiment 2 performed 18 sessions, making it a large and valuable dataset necessary to test our hypothesis. We have now included a mention of the small number of participants in Experiment 2 in a Limitations paragraph in the Discussion (line 539).

      iii) We have now calculated exceedance probabilities for the 4 models which gave [0,0.97,0.03,0]. This shows that there is a 0.97 probability of the Difference model being the most frequent and only a 0.03 probability of the two-step model. We have included this in the results on line 237.

      2) All reviewers have noted that the relation of the optimal model with the human data and the other models should be clarified and discussed in a revised version of the manuscript. You will find their specific comments in their individual reviews, appended below.

      We now include the optimal model for the known color dominance RT experiment (line 420). We have also added a new paragraph in the Discussion on the optimal model at line 503 comparing it to the Difference model.

      3) Finally, the exclusion strategy is also unclear at the moment and should be clarified and discussed explicitly somewhere in a revised version of the manuscript. Reviewers were wondering why so many participants were excluded from Experiment 1, and only 3 participants were included in Experiment 2. This should also be clarified better in the manuscript.

      We have clarified the exclusion criteria in the Methods at line 651 as a new subsection.

      The data quality problem with MTurk is well documented (Chmielewski, M & Kucker SC. 2020. An MTurk Crisis? Shifts in Data Quality and the Impact on Study Results. Social Psychological and Personality Science, 11, 464-473). Given that this was an online experiment on MTurk, it is hard to know exactly why some participants showed low accuracy, but it’s likely that some may have misunderstood the instructions in the difficulty task or they may have been unmotivated to do well in this highly repetitive task. Either reason would be problematic for our model comparisons that are based on choice-RT patterns. Note that the cut-offs we chose for inclusion were purely based on accuracy, whereas the modeling approach considered RTs, which importantly were not used as an inclusion criterion (see revised methods). Moreover, accuracy cut-offs were fairly lenient and mainly aimed to exclude participants who appeared to be guessing/misunderstood instructions (for reference: mean sensitivity of participants who were included was 2x higher than the cut-offs we used).

      Each of three participants in Experiment 2 performed 18 sessions, making it a large and valuable dataset necessary to test our hypothesis. We have now included a mention of the small number of participants in Experiment 2 in a Limitations paragraph in the Discussion (line 539).

      Reviewer #1 (Recommendations For The Authors):

      Thank you for an excellent paper, I enjoyed reading it a lot. I have a few questions that could potentially clarify some aspects for the reader.

      (1) It seems from the model fit plots (Figure 3) that the RT predictions of the model tend to overshoot in cases where one of the clouds is very easy. Could you include potential interpretations of this effect?

      We assume the reviewer is examining the Difference Model (i.e. the preferred model) panel when commenting on the overshoot. It is true the predictions for the highest coherence (bottom purple line) for RT is above the data but it is barely outside the data errorbars of 1 s.e. To be honest we regard this as a pretty good fit and would not want to over-interpret this small mismatch.

      (2) On page 4, around line 121, the study discusses the ”criss-crossing” effect in the RT data. You mention that the fact that RTs are long in hard-hard trials compared to easy-easy trials could be important here: ”These tendencies lead to a criss-cross pattern..”. It is confusing since, for instance, the race model does not have a criss-cross, but still exhibits the overall effect. I was intrigued by the criss-crossing, and after some quick simulations, I found that the equation RT2 ∗ = 2 − 2 ∗ Cs12 − Cs22 + 6 ∗ (Cs1 ∗ Cs2)2 can (very roughly) replicate Figure 1d (bottom panel), so it seems that the criss-crossing effect must be produced by some interactive effect of color strengths on RTs. I wonder if you could provide a better explanation of how this interactive effect is generated by the model, given that it is the main interesting finding in the data. I believe at this point the intuition is not well-outlined.

      The criss-cross arises through an interaction of the coherences, as the reviewer suspects. That is, for the Difference model the RT is related to abs(|coh1| − |coh2|). If we replace the first abs with a square we get

      |coh1|^2 + |coh2|^2 − 2|coh1||coh2|

      The larger this is, the smaller the RT, so

      RT = constant − coh1^2 − coh2^2 + 2|coh1||coh2|

      which is very similar to the formula the reviewer mentions.

      We now supply an intuition as to why the criss-cross arises in the Difference model (line 167). We do not get a criss-cross in the race model, because there the RT is determined by the race that reaches a bound first. Because the races are independent, RTs will be fastest when coherence is high for either stimulus.
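
      The intuition above can be checked numerically with a toy calculation. The sketch below is only an illustration of the simplified relation RT = constant − (|coh1| − |coh2|)^2, not the fitted Difference model; the constant (2.0) and the coherence values are arbitrary choices made for the example, and the function names are ours.

```python
def rt_difference(c1, c2, constant=2.0):
    """Toy RT: decreases as the squared difference in absolute coherence grows."""
    return constant - (abs(c1) - abs(c2)) ** 2


def rt_expanded(c1, c2, constant=2.0):
    """The same expression with the square expanded, showing the interaction term."""
    return constant - c1**2 - c2**2 + 2 * abs(c1) * abs(c2)


# Arbitrary coherence levels chosen for illustration.
strengths = [0.0, 0.25, 0.5]
for c1 in strengths:
    # One row per coherence of stimulus 1, one column per coherence of stimulus 2.
    row = [round(rt_difference(c1, c2), 4) for c2 in strengths]
    print(f"coh1={c1}: {row}")
```

      When coh1 is 0 the toy RT falls as coh2 increases, whereas when coh1 is 0.5 it rises as coh2 increases from 0 to 0.5, so the curves for different values of coh1 cross. The crossing comes entirely from the interaction term 2|coh1||coh2|.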

      (3) Am I wrong in my intuition that the two-step model would produce very similar predictions as the Difference model for Experiment 2? It would be great to discuss that either way since the two-step model seems to produce very close quantitative and pretty much the same qualitative fit to the data of Experiment 1.

      In Experiment 2 the participant is explicitly informed about the color dominance of both stimuli. Therefore, assuming the two-step model skips the first step and uses this explicit information in the second step, the difference and two-step model are identical for modeling Experiment 2. We explain this now on line 277.

      (4) The inclusion of the optimal model is great. It would be beneficial to provide some more connections to the rest of the paper here. Would this model produce similar predictions for Experiment 2, for instance?

      We now include the optimal model for the known color dominance RT experiment (line 420). We have also added a new paragraph in the Discussion on the optimal model at line 503 comparing it to the Difference model.

      (5) In the Methods, it is quite striking that out of 51 original participants, most were excluded and only 20 were studied. It is not easy to trace through this section why and how and who was excluded, so it would be great if this information was organized and presented more clearly.

      We have clarified this in a new subsection of the Methods (line 651). We also explain that exclusion was not based on RT data, which are our main focus in the models.

      Reviewer #2 (Recommendations For The Authors):

      • As detailed in the ’public review’, a more cautious discussion, notably delineating the limitations of the study would be appreciated.

      • In their models, the authors assume that participants sequentially allocate attention between the two stimuli, alternating between them. Did the authors test this assumption and did they consider the possibility that participants could sample from both stimuli in parallel? In particular, does the conclusion of the model comparison also hold under this parallel processing assumption?

      Our results are not affected by whether participants sample the stimulus sequentially through alternation or in a parallel manner (Kang et al., 2021). What does change is the parameters of the model (but not their predictions/fits). In the parallel model, information is acquired at twice the rate of the serial model. We can, therefore, obtain the parameters of parallel models (that have identical predictions to the serial model) directly from the parameters of the current sequential models simply by adjusting the parameters that depend on the time scale (subscripts s and p for serial and parallel models): κ_p = κ_s/√2, u_p = u_s√2, a_p = a_s/2 and d_p = 2d_s (Eq. 2). We now explain this on line 518.
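
      The conversion between serial and parallel parameters stated in this response can be written as a small helper. This is a minimal sketch that only encodes the four relations quoted above (subscript s for serial, p for parallel); the class and function names are ours, the numerical values are made up, and the meaning of each parameter is as defined in the manuscript's Eq. 2 rather than anything specified here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Params:
    """Time-scale-dependent model parameters (kappa, u, a, d from Eq. 2)."""
    kappa: float
    u: float
    a: float
    d: float


def serial_to_parallel(p: Params) -> Params:
    """Apply the relations quoted in the response:
    kappa_p = kappa_s / sqrt(2), u_p = u_s * sqrt(2),
    a_p = a_s / 2, d_p = 2 * d_s.
    """
    return Params(
        kappa=p.kappa / math.sqrt(2),
        u=p.u * math.sqrt(2),
        a=p.a / 2,
        d=2 * p.d,
    )


# Made-up serial parameters, purely for illustration.
serial = Params(kappa=2.0, u=1.0, a=4.0, d=0.5)
parallel = serial_to_parallel(serial)
print(parallel)
```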

      • I found the small paragraph corresponding to lines 193-196 particularly difficult to understand. If the authors could think of a better way to phrase their claim, it would probably help.

      We have rewritten this paragraph at line 211

      • I found a typo on line 122: ”wheres” instead of ”whereas”.

      Corrected

      • I found a typo on line 181: ”or” instead of ”of”.

      Yes corrected

      • Figure #2 is extremely useful in understanding the models and their differences, make sure it remains after addressing the reviews!

      Thank you, this figure is retained.

      Reviewer #3 (Recommendations For The Authors):

      All comments are detailed in the public review, with some clarifications here:

      1) The confusing instructions to the participants are detailed here: under ”overview of experimental tasks” in the methods it says: ”They were instructed... to indicate whether the left or right stimulus was the easier one” (line 520), and below it ”they were required to indicate which patch had the stronger color dominance...” (line 524).

      We have clarified the instructions by providing the actual text displayed to participants in the methods and have ensured consistency in the method to talk about judging the easier stimulus (line 604).

      The instructions were “Your task is to judge which patch has a stronger majority of yellow or blue dots. In other words: For which patch do you find it easier to decide what the dominant color is? It does not matter what the dominant color of the easier patch is (i.e., whether it is yellow or blue). All that matters is whether the left or right patch is easier to decide”.

      2) Minor comments: Line 76: ”that” should be ”than”.

      Thanks, corrected

      Line 574: ”variable duration task” means ”controlled duration task”?

      Yes, corrected

      Line 151: ”or” should be ”of”.

      Corrected

    1. Author Response

      We appreciate the opportunity to publish our research in eLife. Both reviewers highlight our state-of-the-art oxygen isotope sampling approach, which has allowed us to establish that early-formed primate enamel does not show a large or consistent isotopic offset due to intensive nursing. This means we can be more confident in employing early-forming teeth to probe environmental conditions—an issue that has handicapped past paleoenvironmental studies—documenting seasonal rainfall variation in the tropics at an extremely fine-scale.

      Reviewer 1 requests that we elaborate on the ecology and behavior of orangutans, particularly in reference to the issue of isotopic enrichment within forest canopies—a topic we devote a paragraph to in the discussion. We appreciate the opportunity to add additional context during revision, noting here that our previous comparisons of terrestrial baboons and semi-terrestrial tantalus monkeys in the Bushenyi District (Uganda) do show modest isotopic differences between species, consistent with a canopy effect (Green et al. 2022). However, this is less of an issue for comparisons of Sumatran and Bornean orangutans given their ecological and behavioral similarities. We agree that variation in the canopy heights/positions of orangutan food sources may contribute to enamel oxygen isotope variation, in addition to the seasonal rainfall trends we observe in our datasets. Importantly, our published and on-going work on western chimpanzees has revealed strong annual oxygen isotope trends concordant with local rainfall patterns. The consistency and amplitude of seasonal oxygen isotope oscillations in such datasets suggest that arboreal primates are not less useful than terrestrial primates for reconstruction of rainfall seasonality.

      We clarify that while Reviewer 1 states that we measured 6 teeth, Tables 1 and 2 and the first sentence of the results make it clear that we measured 18 teeth in this study.

      Reviewer 2 asks for further detail about comparisons between modern and fossil orangutan teeth that support inferences of climate variation, which we will endeavour to add in the revised manuscript.

    1. Author Response

      We thank the Editors and Reviewers for their positive evaluation of our work and their appreciation of the new findings and the interdisciplinary approaches applied. We also thank them for pointing out the manuscript's weaknesses, as well as for all the suggestions and advice that can strengthen this manuscript. We apologise for mistakes, overstatements, and discrepancies in citing figures, as well as for omitted references.

      The first part of the manuscript focuses on the Tetrahymena RSP3 gene mutants. The Tetrahymena genome encodes three RSP3 paralogs that are components of different radial spokes and likely form homo- and heterodimers. Thus, the proteomic analyses of Tetrahymena radial spokes are more complicated than similar analyses in organisms having a single RSP3 protein.

      Next, we attempted to identify proteins specific for each RS type. Conducting this research, we took advantage of six different radial spoke knockout mutants (RSP3A-KO, RSP3B-KO, RSP3C-KO, CFAP206-KO, CFAP61-KO, and CFAP91-KO) and compared wild-type and mutant ciliomes using two methods, LFQ and TMT (for each mutant the experiment was repeated three times). Comparative analyses of the wild-type and mutant ciliomes allowed us to identify Tetrahymena radial spoke proteins in the case of RS1 (WT versus RSP3A-KO), RS2 (WT versus RSP3B-KO, RSP3C-KO, and CFAP206-KO), and RS3 (WT versus CFAP61-KO and CFAP91-KO). The extensive proteomic analyses were combined with detailed bioinformatics studies and co-immunoprecipitation and BioID assays to verify the presence of identified proteins in RS complexes.

      Importantly, in the case of RS1 and RS2 spokes, our findings are in agreement with data obtained for Chlamydomonas and mammalian radial spokes. Thus, it is very likely that the newly discovered RS1 and RS2 proteins, as well as the identified Tetrahymena RS3 proteins, are also true RS subunits.

      As an outcome of this part, we propose a model of the RS protein composition in the ciliate Tetrahymena. We agree that this model requires further experimental verification (for example by pull-down experiments). However, considering the number of identified proteins, this is a considerable amount of additional work that we would like to publish as separate papers. We would like to add that our current analyses of an additional RS3 mutant (which will be published separately) support the findings regarding RS3 proteomic composition.

      Reviewer 2:

      The control for the bio-ID experiment was WT cells. Since there are many hits in the experiment, a better control would have been a strain with free BirA, or BirA fused to a protein that is distant from the radial spokes, such as one of the outer-dynein arm proteins, or a ciliary membrane protein.

      The BirA* tag is an approximately 30 kDa protein and thus it can be transported to cilia by diffusion. BirA* ligase present throughout the cilia could randomly biotinylate proteins, including radial spoke proteins. Thus, expression of BirA* alone is not the best control. We have performed numerous BioID experiments in which the BirA* tag was fused with T/TH subunits (CFAP43, CFAP44, Urbanska et al., 2018), subunits of the small complex positioned parallel to N-DRC (CCDC113, CCDC96, Bazan et al., 2021), CFAP69, SPEF2A (C1b central apparatus complex, Joachimiak et al., 2021), N-DRC proteins (Ghanaeian et al., Biorxiv, 2023) and subunits of other ciliary complexes (our unpublished data). The comparison of the previously obtained BioID data with the RSP BioID data proves that the identified proteins are specifically associated with radial spokes. Therefore, in our model, wild-type cells are a good control for BioID experiments.

    1. Author Response

      Reviewer 1 Public Review

      The authors aim to theoretically explain the wide range of time scales observed in cortical circuits in the brain – a fundamental problem in theoretical neuroscience. They propose that the variety of time scales arises in recurrent neural networks with heterogeneous units that represent neuronal assemblies of different sizes that transition through sequences of high- and low-activity metastable states. When transitions are driven by intrinsically generated noise, the heterogeneity leads to a wide range of escape times (and hence time scales) across units. As a mathematically tractable model, they consider a recurrent network of heterogeneous bistable rate units in the chaotic regime. The model is an extension of the previous model by Stern et al (Phys. Rev. E, 2014) to the case of heterogeneous self-coupling parameters. Biologically, this heterogeneous parameter is interpreted as different assembly sizes. The chaoticity acts as intrinsically generated noise driving transitions between bistable states with escape times that are indeed widely distributed because of the heterogeneity. The distribution is successfully fitted to experimental data. Using previous dynamic mean-field theory, the self-consistent auto-correlation function of the driving noise in the mean-field model is computed (I guess numerically). This leaves the theoretical problem of calculating escape times in the presence of colored noise, which is solved using the unified colored-noise approximation (UCNA). They find that the log of the correlation time of a given unit increases quadratically with the self-coupling strength of that unit, which nicely explains the distribution of time scales over several orders of magnitude. As a biologically plausible implementation of the theory, they consider a spiking neural network with clustered connectivity and heterogeneous cluster sizes (extension of the previous model by Mazzucato et al. J Neurosci 2015). Simulations of this model also exhibit a quadratic increase in the log dwell time with cluster size. Finally, the authors demonstrate that heterogeneous assemblies might be useful to differentially transmit different frequency components of a broadband stimulus through different assemblies because the assembly size modulates the gain.

      I found the paper conceptually interesting and original, especially the analytical part on estimating the mean escape times in the rate network using the idea of probe units and the UCNA. It is a nice demonstration of how chaotic activity serves as noise driving metastable activity. Calculating the typical time scales of such metastable activity is a hard theoretical problem, for which the authors made considerable advancement. The conclusions of this paper are mostly well supported by simulations and mathematical analysis, but some aspects need to be clarified and extended, especially concerning the biological plausibility of the rate network model and its relation to the spiking neural network model as well as the analytical calculation of the mean dwell time.

      Question 1a. The theory is based on a somewhat unbiological network of bistable rate units. It seems to only loosely apply to the implementation with a spiking neural network with clustered architecture, which is used as a biological justification of the rate model. In the spiking model, a wide distribution of time scales also emerges as a consequence of noise-induced escapes in combination with heterogeneity. Apart from this analogy, however, the mechanisms for metastability seem to be quite different: firstly, the functional units in the spiking neural network are presumably not bistable themselves but multistability only emerges as a network effect, i.e. from the interaction with other assemblies and inhibitory neurons. (This difference yields anti-correlations between assemblies in the spiking model, in marked contrast to the independence of bistable rate units (if N is large).) Secondly, transitions between metastable states are presumably not driven by chaotic dynamics but by finite-size fluctuations (e.g. Litwin-Kumar and Doiron 2012). The latter is also strongly dependent on assembly size. More precisely, the mechanism of how assembly size shapes escape times T seems to be different: in the rate model the self-coupling ("assembly size") predominantly affects the effective potential, whereas in the spiking network, the assembly size predominantly affects the noise. Therefore, the correspondence between the rate model and the spiking model should probably be regarded in a looser sense than presented in the paper.

      Answer 1a. We thank the Reviewer for suggesting that we clarify the relationship between the rate and spiking models. In this answer, we first show that the dynamical modes in the spiking network are E/I cluster pairs, then we show that assemblies are bistable due to the large self-couplings, and third we discuss whether transitions between high and low activity states are driven by chaos or finite size effects, including correlations between assemblies.

      We first elucidated the dynamical modes in the spiking network and how those can be related to the rate network. Using an approach from (1, 2), we considered the mean-field theory for the spiking network, reducing the degrees of freedom from N neurons to 2p+2 E/I assemblies (plus E/I background populations), then we identified the approximate dynamical modes as E/I cluster pairs emerging as the Schur eigenvectors of the mean field-reduced coupling matrix. Comparing the eigenvalue distribution of the full vs. the mean field-reduced coupling matrix, we found that the slow timescales capturing the assemblies' metastable dynamics correspond to the p−1 large positive eigenvalues associated with the Schur modes. The heterogeneity in timescales of the spiking model arises from the heterogeneous distribution of these gapped eigenvalues, reflecting the hierarchy in assembly sizes and assembly self-couplings in the mean field approach. We then analyzed the eigenvalues in the rate network with a lognormal self-coupling distribution and found a similar picture, where the slow units are related to the large eigenvalues in the coupling matrix (Appendix 2). We also note that in the rate network, there is no gap in the eigenvalue distribution, as there are many units with small values of the self-couplings. In the spiking network, on the other hand, there are p − 1 large eigenvalues, where p is the number of assemblies, and they are gapped. These new analyses clarify the correspondence between rate network units and spiking network E/I cluster pairs, arising from the Schur picture.

      We now discuss previous studies to examine whether bistability in the spiking network arises from assembly self-couplings or from other effects. Previous mean-field analyses of spiking networks with clustered connectivity showed that the bistability of assembly dynamics is due to the presence of a large self-coupling, rather than from the interactions with other assemblies. We briefly summarize the published evidence for this. The seminal work of (3) showed that in a network with assemblies, a bifurcation in network dynamics emerges when the assembly self-coupling JEE+ > Jc exceeds a critical value Jc; beyond this value, a low and a high activity stable state coexist. Although in this network these two states are stable, more recent work from (4, 5) showed that finite size effects (small assembly size) can destabilize the states, leading to the metastable regime. When the inhibitory population is homogeneous, as in these last two articles, metastability arises from finite size effects and it is sensitive to network parameters (5) and (6). Specifically, when one scales both the network size and the E assembly size, metastability disappears (5). Moreover, when the I population is homogeneous, then E clusters are anti-correlated, as correctly suggested by the Reviewer. However, our model differs from the ones just discussed in that the inhibitory population is also arranged in assemblies, which are reciprocally paired with E assemblies. In this class of E/I clustered models, metastability is robust to changes in network parameters (see (6)). More specifically, in our revised version, we show that metastable dynamics persists when scaling up the network size to N = 10,000 neurons (and scaling up network size with N). A crucial difference between the model with homogeneous I population vs the model with I assemblies (i.e., our model), is that in the former the assemblies are anti-correlated, while in the latter case the assemblies are uncorrelated (see Fig. 
1), the same as in the rate network. These results suggest that transitions between metastable states in the spiking network may be driven by a coexistence of two effects: on the one hand, finite size effects due to the small assembly size, and on the other hand, by the heterogeneity in the inter-assembly couplings. Although the former effect is absent in the rate network, the latter is the driver of the chaotic activity observed in the rate network. Thus it is plausible that rate-based chaotic dynamics might also contribute to the metastable activity in the spiking network, although more targeted work should be performed to answer this question. In the revised version of the manuscript, we overhauled the subsection ’A reservoir of timescales in E-I spiking networks’, Fig. 5, and Appendix 2, by adding an extensive explanation of the emergence of slow timescales from the large eigenvalues in the Schur basis, and its comparison between spiking and rate network. In particular, we highlighted the differences between rate and spiking networks and the fact that finite size effects might be at play in the latter case.

      Furthermore, the prediction of the rate model is a quadratic increase of log(T), however, the data shown in Fig.5b do not seem to strongly support this prediction. More details and evidence that the data "was best fit with a quadratic polynomial" would be necessary to test the theoretical prediction.

      We increased the clarity and strengthened the support for the data in Fig 5b as "best fit with a quadratic polynomial" by adding a plot, inset in Fig 5b, alongside a detailed explanation of the fitting procedure in Methods section (e). The Figure 5b inset displays the training and test errors of a cross-validated model selection for the polynomial fit. The test error is minimal at polynomial degree 2, supporting the claim that the best fit was achieved with a quadratic polynomial. In Methods section (e), under "Model selection for timescale fit," we added a detailed description of the cross-validation procedure by which the fit was obtained. A quote from that section in the revised manuscript can also be found in this document under answer 11.
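
      The cross-validated model selection described above can be illustrated with a minimal sketch. This is not the authors' code: the function names (`polyfit`, `polyval`, `cv_errors`), the choice of k-fold splitting, and the squared-error criterion are assumptions made for illustration. The idea is to fit polynomials of increasing degree on training folds and compare the held-out error across degrees.

```python
import random

def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients (lowest order first), via normal equations."""
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum x^(i+j) and b[i] = sum y * x^i.
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = b[i] - sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = s / A[i][i]
    return coeffs

def polyval(coeffs, x):
    """Evaluate a polynomial given coefficients in increasing order."""
    return sum(c * x ** k for k, c in enumerate(coeffs))

def cv_errors(xs, ys, degrees, n_folds=5, seed=0):
    """Mean held-out squared error for each polynomial degree (k-fold CV)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    errors = {}
    for d in degrees:
        fold_err = []
        for test in folds:
            held = set(test)
            train = [i for i in idx if i not in held]
            c = polyfit([xs[i] for i in train], [ys[i] for i in train], d)
            fold_err.append(
                sum((polyval(c, xs[i]) - ys[i]) ** 2 for i in test) / len(test)
            )
        errors[d] = sum(fold_err) / n_folds
    return errors
```

      In the setting of Fig. 5b, `xs` would hold the cluster sizes and `ys` the log dwell times (an assumed mapping); the preferred degree is the one with the smallest entry in the returned dictionary.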

      Question 2. The time scale of a bistable probe unit driven by network-generated "noise" is taken to be the mean dwell time T (mean escape time) in a metastable state. It seems that the expressions Eq.4 and Eq.21 for this time are incorrect. The mean dwell time is given by the mean first-passage time (MFPT) from one potential minimum to the opposite one including the full passage across the barrier. At least, the final point for the MFPT should be significantly beyond the barrier to complete the escape. However, the authors only compute the MFPT to a location −xc slightly before the barrier is reached, at which point the probe unit has not managed to escape yet (e.g. it could go back to −x2 after reaching −xc instead of further going to +x2). It is not clear whether the UCNA can be applied to such escape problems because it is valid only in regions, where the potential is convex, and thus the UCNA may break down near the potential barrier. Indeed, the effective potential is not defined near the barrier (see forbidden zone in Fig.4b), and hence it is not clear how to calculate the mean escape time. Nonetheless, the incomplete MFPT computed by the authors seems to qualitatively predict the dependence on the self-coupling parameter s, at least in the example of Fig.4c. However, if the incomplete MFPT is taken as a basis, then the incomplete MFPT should also be used for the white-noise case for a fair comparison. It seems that the corresponding white-noise case is given by Eq.4 with τ1 = 0, which still has the same dependence on the self-coupling s2, contrary to what is claimed in the paper (it is unclear how the curve for the white-noise case in Fig.4 was obtained). Note that the UCNA has been designed such that it is valid for both small and large τ1 (thus, it is also unclear why the assumption of large τ1 is needed).

      Answer 2. We are deeply grateful to the Reviewer for this critical evaluation of our UCNA calculation of the escape times. We will first clarify our rationale and then discuss the comparison with the white noise case. The idea behind our calculation is indeed that when starting from the left minimum −x2, the probe effectively escapes to +x2 before reaching the limit of the UCNA support region at −xc. First, our simulations show (Fig 4b light blue colored area) that the probe almost exclusively visits the valid areas |x| > xc: our new analysis shows that the fraction of activity spent in the forbidden region is (1.8 ± 0.4) × 10^−3 (mean ± SD over 10 probe units run with parameters as in Fig. 4a-b), confirming the fact that the histogram of x values from simulations has almost null support in the forbidden region |x| < xc. This is also supported by the representative simulation time course in Fig. 4a which exhibits abrupt jumps between the two bistable states. We then estimated the ’escape point’ from simulations as follows: for a transition from the x = −x2 well towards the x = +x2 well, the escape point is defined as the point where x is still on the side of the source well, i.e. x < 0, but the trajectory starts accelerating towards the target well (positive second derivative). We found that the distribution of escape points was predominantly in the allowed region (93.8%). This analysis supports our method to calculate the MFPT and confirms that our calculation is performed in the valid UCNA region. In the revised version of the manuscript, we added a clarification of this point with text and a new supplementary figure in Fig. 4 Suppl. 1. Regarding the comparison with white noise, we compared white-noise-driven probe dynamics with a probe driven by a network (effectively represented by the colored noise). To adequately make this comparison, we replaced the input coming from the network into the probe unit (Eq. 1, last term on the right-hand side) with white noise.
The rest of the terms in this equation were left untouched to maintain the exact probe’s self-response properties. This procedure aims to understand the unique contribution of the colored noise generated by the network to each unit dynamics by removing its "colored" correlated input contribution but otherwise leaving all dynamical properties the same. For clarity of the manuscript on this subject, we added a paragraph about it under "A comparison with white noise" in Methods section (d).
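
      The two trajectory statistics discussed in this answer, the fraction of time spent in the forbidden region |x| < xc and the dwell times in each well, can be sketched as follows. This is an illustrative assumption about how such quantities could be extracted from a simulated time course, not the authors' analysis code; the function names and the rule that a well is entered once x crosses ±xc are hypothetical.

```python
def forbidden_fraction(x, xc):
    """Fraction of samples of the trajectory x that lie in the band |x| < xc."""
    return sum(1 for v in x if abs(v) < xc) / len(x)

def dwell_times(x, xc, dt=1.0):
    """Durations of stays in each well of a bistable trajectory.

    The unit is assigned to the right well once x > xc and to the left well
    once x < -xc; samples inside the band |x| <= xc keep the previous state.
    The first dwell starts at the first sample outside the band, so it may be
    truncated by the start of the recording. The final, unfinished stay is
    not reported.
    """
    times = []
    state = None   # +1 (right well) or -1 (left well) once known
    start = None   # sample index at which the current stay began
    for i, v in enumerate(x):
        if v > xc:
            new = 1
        elif v < -xc:
            new = -1
        else:
            continue
        if state is None:
            state, start = new, i
        elif new != state:
            times.append((i - start) * dt)
            state, start = new, i
    return times
```

      Averaging the output of `dwell_times` over a long run gives an empirical estimate of the mean dwell time that can be compared against a theoretical MFPT.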

      We can estimate the mean first passage time (MFPT) of a probe unit driven by white noise with Eq. 4. The procedure described above for switching the network drive with white noise also dictates the parameter values to use in Eq. 4 for the case of white noise. First, with no correlation in white noise, τ1 = 0. Second, D, the magnitude of the drive, inherits its value from the network (see also Eq. 22) as the strength of the white noise (its integral around zero as a δ function). The results are presented in Fig 4. To strengthen the results and improve the clarity of the text, we expanded the content of Fig 4c. The plot now includes both the results of simulations (Fig. 4c green line) and estimation by mean first passage time (Fig. 4c green dashed line) for white noise, as explained above. We note that the potential in the white noise case (Fig. 4b green dashed line) does include a concave part. Indeed, the distribution retrieved from simulations (Fig. 4b light green area) and the visit probability approximated by theory (Eq. 19 with τ1 = 0, Fig. 4b green line) are not in full agreement. However, this probability is still a good approximation. As a result, the mean first passage time (Eq. 4, Fig. 4c green dashed line) is a good approximation. The great advantage of having Eq. 4 as an approximation for the mean first passage time is that it clearly explains the contribution of each part of the dynamical equation (Eq. 1) towards achieving long timescales. Mainly, since log<T> depends linearly on τ1, the mean first passage time itself depends exponentially on τ1. Hence the importance of the color in the input and the vast differences between the network drive and the white noise.

      Question 3. The given argument that the time-scale separation arises as network effect is not very clear. Apart from the issue of a fair comparison of colored and white noise raised in point 1 above, an external colored noise with matched statistics that drives a single bistable unit would yield the same MFPT and thus would be an alternative explanation independent of the network dynamics.

      Answer 3. The goal of our investigation was to uncover a neural mechanism that induces heterogeneous timescales in a self-consistent way. The idea of self-consistency is the central tenet of our approach, namely, that a timescale distribution must arise due to the internal dynamics of a recurrent circuit without the need to invoke an external auxiliary force driving it. If we had an external colored noise with matched statistics driving the probe unit, then we would still have to explain which mechanism would give rise to that particular statistics of the colored noise - with the most natural explanation being a recurrent network with time-varying activity.

      The second ingredient in our argument demonstrating that it is a network effect is the following. If the time-scale separation was not a network effect, but rather a property of a single probe unit, then it would persist regardless of the specific features of the noise driving the unit. To test this hypothesis, we compared the scenarios of the same probe unit driven by the self-consistent noise generated by the rest of the network, as opposed to white noise, and found that the time-scale separation is not present in the second case. Thus, the time-scale separation is not an intrinsic property of the probe unit, but, rather, it relies on the unit being part of a recurrent network generating a specific kind of noise. This argument is explained in the last paragraph of the section ’Separation of timescales in the bistable chaotic regime’.

      Question 4. The UCNA has assumptions and regimes of validity that are not stated in the paper. In particular, it assumes an Ornstein-Uhlenbeck noise, which has an exponential auto-correlation function, and local stability (region where potential is convex). Because the self-consistent auto-correlation function is generally not exponential and the probe unit also visits regions where the potential is concave, the validity of the UCNA is not clear. On the other hand, the assumption of large correlation time might be dropped as the UCNA’s main feature is that it works for both large and small correlation times.

      Answer 4. We thank the Reviewer again for this critical evaluation of our assumptions; however, we believe that our approach is justified because of the following two arguments. First, although the UCNA was derived in the case of an OU process, it has since then been successfully applied to different classes of noise, including multiplicative noise, harmonic noise, and others (see e.g. (7–9)). To the best of our knowledge, the UCNA has never been applied before to noise whose autocorrelation arises from chaotic dynamics, whose hallmark is a vanishing slope at zero lag, markedly different from the OU process. Second, to address the concern about concavity, we performed the additional analyses discussed in our answer to Question 2, showing that the probe unit almost never visits regions where the potential is concave, which would lie outside of the support of the potential. Because of these two considerations, we believe that the UCNA is valid in our scenario, as suggested by the good agreement between theory and simulation at large values of the self-couplings in Fig. 4c. Finally, we thank the Reviewer for bringing up the fact that the UCNA works for both large and small correlation times; we fixed that in the revised manuscript.

    1. Author Response

      Reviewer #2 (Public Review):

      The work is very clearly designed, executed, and written. The transcription output data is rigorous and well quantified, and the fit of the TF binding model clearly shows agreement with experiments in the case of cooperativity, but not in its absence, making a strong case for the authors' conclusion.

      How the Hidden Markov Model fit results (promoter kon and koff values) lead to the observed effects on transcription output is less clear. For instance, Dl1 deletion results in a small increase in kon and a moderate increase in koff, which seems at odds with the other variants. Yet all variants exhibit similar transcription output profiles. One other intriguing observation is that the promoter states in Fig. 4C&D do not look dramatically different in their kinetics, yet the input transcription traces exhibit a 3-fold amplitude difference. Maybe the authors can clarify these apparent discrepancies.

      We thank the reviewer for insightful comments. The reduction in transcription output is mainly due to the decrease in transcription amplitude. We have done further analysis to demonstrate that the loading rate of Pol II, correlated to the initial slope of transcription, is significantly reduced in the mutants. We measured the initiation rate by calculating the slope of the MS2 traces and correlated it to the Pol II loading rate. As expected, the initiation rate in wildtype is higher than in mutant embryos. This additional analysis suggests that the drastic reduction in transcriptional amplitude is due to the reduced Pol II loading rate, not kon, and corroborates the previously shown results and conclusions (Bothma et al., PNAS 2014, PMID: 24994903; Garcia et al., Curr. Biol. 2013, PMID: 24139738). We have added this plot in Figure 4H in the revised manuscript, which shows the initiation rates of the wildtype and mutant embryos, and revised the manuscript as follows.

      We have added this in the Introduction (Page 4):

      We find that mutating a single TF (Dl or Twi) binding site in the enhancer significantly reduces mRNA production of the target gene, mainly through lowering transcriptional amplitude by reducing RNA polymerase (Pol) II loading rate, without significantly delaying the timing of initiation or affecting the probability of activation.

      We have added this in the Results (Page 15):

      Previously, we demonstrated that the mutations affect mRNA production through transcriptional amplitude (Figure 2E). This could be because either the mutations hinder the Pol II loading rate or reduce the time the promoter is in the ON state….

      In addition, we find that the Pol II loading rate is significantly reduced in the mutant embryos compared to the wildtype (Figure 4H). This confirms that the lower transcriptional amplitude mainly results from the promoter’s inability to effectively load Pol II, along with an additional contribution from the reduced time the promoter spends in the ON state.

      We have added this in the Discussion (Page 16):

      This reduction is mainly due to the decreased transcriptional amplitude, driven by a lower rate of Pol II loading… and, Since the amount of time the promoter spends in the ON state is not affected by the mutations, the lower transcriptional amplitude can be mainly attributed to the promoter’s inability to effectively load Pol II (Figure 2E, Figure 4D-F).

      The HMM is utilized to tease apart the changes in transcriptional kinetics. Our analysis revealed that the HMM provides some explanation for the reduction in transcriptional output in TF binding site mutants. For this reason, we must examine the results in a broader context. As pointed out, Dl1 site deletion has a slightly different effect on kon and koff. However, its transcription output is similar to the other mutants (Figure 4D and E). This is because the changes in kon and koff are significantly less drastic than the changes in the transcription amplitude and Pol II loading rates, and therefore contribute less to the mRNA production. In our analysis, the amplitude is a separate parameter from the kon and koff rates, which are calculated from the HMM.

      We have added the following in the Discussion to address this concern (Page 17):

      However, we note that the HMM only provides some explanation for the reduction in transcriptional activity since the changes in kon and koff are less drastic than the changes in transcriptional output. Since the amount of time the promoter spends in the ON state is not affected by the mutations, the lower transcriptional amplitude can be mainly attributed to the promoter’s inability to effectively load Pol II (Figure 2E, Figure 4D, H).
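
      The initiation-rate measurement mentioned in this response (the slope of the MS2 trace shortly after transcription onset, taken as a proxy for the Pol II loading rate) can be sketched as a least-squares line fit to the first few samples of a trace. The function name `initial_slope` and the choice of how many points count as the initial rise are assumptions for illustration, not the authors' pipeline.

```python
def initial_slope(t, f, n_points):
    """Least-squares slope of fluorescence f versus time t over the first n_points samples.

    t and f are equal-length sequences starting at transcription onset;
    n_points sets how much of the initial rise is used (a hypothetical choice).
    """
    ts, fs = t[:n_points], f[:n_points]
    n = len(ts)
    mean_t = sum(ts) / n
    mean_f = sum(fs) / n
    num = sum((a - mean_t) * (b - mean_f) for a, b in zip(ts, fs))
    den = sum((a - mean_t) ** 2 for a in ts)
    return num / den
```

      Comparing the distribution of such slopes between wildtype and mutant traces is one way to quantify a difference in loading rate that is separate from the kon and koff rates inferred by the HMM.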

      The authors observe cooperativity between TF binding sites and transcription output, which their model suggests is driven by TF binding cooperativity ("We propose that the cooperativity allows TF binding sites with moderate or weak affinities to recruit more TFs to the enhancer"). This is plausible and likely, but not rigorously demonstrated; another possibility could be cooperativity at the step of transcription activation. One could verify that the binding step is the cooperative one via ChIP-qPCR in the different variants, but given the cautious wording of the paper, this is not absolutely necessary.

      We thank the reviewer for suggesting this experiment. Unfortunately, due to the experimental design, performing ChIP-qPCR was not feasible. There are two copies of the snaSEmin enhancer region, one within the endogenous genome and one within the transgene. For this reason, proper amplification in qPCR was challenging as the primers would recognize two distinct portions of the genome. We designed primers such that the forward primer would recognize both the endogenous and transgene enhancer region (inevitable) and the reverse primer would recognize only the transgene. Yet, we did not observe the expected fold change in amplification as the concentration of DNA was modulated. Hence, we did not proceed to perform ChIP-qPCR.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, the authors investigated plausible circuit mechanisms for their recently reported effect of NMDAR antagonists on the synchrony of prefrontal neurons in a cognitive task. On the basis of previously proposed computational network models of spiking excitatory and inhibitory neurons and their mean-field and linear stability analysis descriptions, they show that a specific network configuration set close to the onset of instability of the asynchronous state can replicate qualitatively key empirical observations. For such a network, a small increase in external drive causes a large increase in neuronal synchrony, and this is not happening if NMDAR-dependent transmission is reduced. This shows parallelism with the empirical data thus representing its first neural network explanation.

      The paper provides valuable insights into possible mechanisms related to cortical dysfunction under NMDAR hypofunction, a topic of importance for several neuropsychiatric disorders. However, the fact that the manuscript remains at a rather abstract level and does not attempt a closer match to the experimental data is a limitation of the study.

      1) The manuscript is strongly based on state diagrams and parametric descriptions of neural dynamics in a computational model that has been extensively studied before (Brunel, Wang 2003). Many of the parametric dependencies of this model shown here were already reported before, although not specifically altering concurrently external inputs and NMDAR-dependent transmission as done now. The main novelty of the study is the application of this framework to a specific empirical dataset of great scientific relevance. However, the manuscript emphasizes the model exploration in relation to a limited set of effects in the data (changes in synchrony immediately before motor response) and not so much the comparison to the neural recordings more generally (for instance, firing rates, other time periods in the task, etc.)

      We are grateful to the Reviewer for thoroughly reviewing the manuscript and the constructive critique. Our work is built on the computational framework that has been developed earlier in several seminal computational and theoretical studies, including Compte et al. (2000) and Brunel and Wang (2003), that we acknowledge throughout our paper. However, we would like to emphasize, without diminishing the importance of these earlier studies, that our work provides new theoretical and computational insights on the impact of NMDAR synaptic transmission modulation on spiking dynamics by further developing the theoretical framework of Brunel and Wang (2003). For example, in Brunel and Wang (2003) it is stated that “NMDA conductances could be removed from all simulations without affecting any of the results” (p. 416). In fact, equations provided in Brunel and Wang (2003) are only for the special case of the oscillatory instability growth rate λ=0 and they do not include the NMDAR synaptic conductances. Thus, the consideration presented in Brunel and Wang (2003) cannot explain the NMDAR-dependent modulation of synchrony effect observed in Zick et al. (2018). In our study, we extended the theoretical framework of Brunel and Wang (2003) and provided equations that explicitly include both λ and NMDAR conductance. It is this extension of the framework that allowed us to provide an NMDAR dependent mechanism to explain the Zick et al. (2018) effect.

      In the revised manuscript, at the suggestion of Reviewer 2, we have further extended the theoretical consideration and obtained an analytic approximation in closed form for the oscillatory instability growth rate λ describing the dependence on the AMPAR, NMDAR, and GABAR synaptic conductances and the external rate. We believe that this is the first paper in which such an approximation for the instability growth rate λ, accounting for the effects of more realistic synaptic currents, is obtained. Based on this consideration, we have now provided in a new Results section “Dependence of oscillatory instability growth rate on synaptic parameters” a substantially more detailed theoretical account of the precise mechanism implemented in our model for the transition between the steady and oscillatory states and the lack thereof when the NMDAR conductance is blocked.

      We agree with the reviewer that it would be beneficial for the paper to extend the model exploration in relation to other measurable variables provided by neural data, such as firing rates. At the reviewers’ suggestion, we have now carried out a new series of simulations with transient external inputs and compared the simulation results with the dynamics of both synchrony and firing rates that were estimated from neural data. We address these questions in more detail in the corresponding points in the Recommendations for the authors section below.

      2) As discussed in the introduction, empirical data available suggests that 0-lag synchrony in prefrontal networks is affected by manipulations that reduce NMDAR function (Zick et al. 2018) and by manipulations that enhance NMDAR function (Zick et al. 2021). The computational model presented in this manuscript does not show this U-shaped behavior and the discussion does not mention this. It should be discussed whether the model can accommodate this or not.

      This is a very good point which we now explicitly address in a new section in the revised Discussion (‘Potential U-shaped relation between NMDAR function and spike synchrony’, see new text in blue starting at line 953). The reviewer provides an excellent insight by noting that our prior neural recording data (specifically the convergent reduction in 0-lag synchrony in monkey drug and mouse genetic models) could be explained by an inverted U-shaped relationship between NMDAR function and 0-lag synchrony. In the new section we also note the precedent for such a relationship by drawing a parallel to the work of Vijayraghavan, Arnsten and colleagues (2007) showing an inverted U-shaped relationship between D1R synaptic actions and the strength of persistent activity in monkey prefrontal neurons during working memory tasks.

      However, in the new section we note also that we cannot yet conclude that the relationship between 0-lag spike synchrony and NMDAR activation is indeed an inverted U-shaped function based on our neural data. Reaching this conclusion would require completing a dose-response function between the concentration of NMDAR agonist (or antagonist) administered and the strength of 0-lag synchrony (which we have not done). In addition, we note in the new section that we can’t conclude the reduction of 0-lag synchrony in mouse prefrontal cortex is indeed due to increased expression of NMDAR, since deletion of Dgcr8, given its role in miRNA synthesis, would be expected to upregulate the expression of many different mRNA corresponding to many different genes. However, the possibility of a U-shaped relation is an important and interesting one, which we now fully discuss.

      Reviewer #2 (Public Review):

      In this paper, the authors carry out neural circuit modeling to theoretically elucidate the mechanism underlying the empirically observed (in a previous study by some of the current authors) reduction in neural synchrony in the monkey prefrontal cortex (PFC), as a result of NMDAR blockade. Empirically it was previously found that in monkeys performing a cognitive control task, PFC neurons exhibit precisely timed synchronous firing, especially in the short period before the monkey's response, leading to "0-lag" (zero in the 1-2 millisecond timescale) spiking correlations. This signature of synchrony was then found to be extinguished or diminished with the systemic administration of an NMDAR antagonist.

      In the current study, the authors simulate and analyze a network of excitatory and inhibitory spiking neurons as a model of a local PFC circuit, to elucidate the mechanism underlying this effect. The model network is composed of leaky integrate-and-fire neurons with conductance-based synaptic inputs and is sparsely and randomly connected as in the classic studies of balanced networks in which neurons fire irregularly as observed in the cortex. Using mean-field theory, the authors start by mapping out the phase boundary between the asynchronous irregular and synchronous irregular states in the network as a function of network parameters controlling synaptic connectivity and external background inputs (which they parametrize as ratios of recurrent or external currents mediated by AMPAR, NMDAR or GABAA). The transition between the two phases corresponds to a Hopf-like bifurcation above which synchronous oscillations with frequency in the gamma-band (or above) emerge. It is found that with an increase in external inputs, a network in the asynchronous state (but close to criticality) can switch to the synchronous state. Based on this, the authors hypothesize that an increase in the external drive is the mechanism underlying the empirically observed increase in synchrony before the behavioral response. It is then shown that a reduction in NMDAR conductance (keeping AMPAR or GABAR conductances fixed) has the opposite effect, and pushes the network towards the asynchronous state, and can counteract or weaken the effect of increased external input. In both cases increase or decrease in synchrony is quantified by an increase or decrease in 0-lag pairwise correlations; transition to synchrony is shown to also lead to the development of nonzero-lag peaks in the average spiking correlation reflecting gamma-band oscillations. 
      The authors then show that (with the appropriate choice of primary network parameters) their proposed mechanisms for the (natural) increase in synchrony via an increase in external inputs and the weakening of this effect with the weakening of NMDA conductances do semi-quantitatively match the observed changes in 0-lag synchrony and nonzero lag peaks in spiking correlations. Finally, they discuss the effect of the balance between average NMDA and GABA currents in the primary (baseline) network on the above effects.
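
      As a rough illustration of the 0-lag and nonzero-lag correlation measures mentioned above, the following minimal sketch counts spike coincidences between two neurons at a chosen lag. It is not the authors' analysis code: the function names, the 1 ms coincidence window, and the example spike times are all assumptions made for illustration, and raw counts like these would still need normalization (for example against shuffled trials) before being interpreted as correlations.

```python
from bisect import bisect_left, bisect_right


def coincidence_count(spikes_a, spikes_b, lag_ms=0.0, window_ms=1.0):
    """Count spike pairs (a, b) with |(b - a) - lag_ms| <= window_ms.

    Spike times are in milliseconds. The window width is an assumed
    value chosen to mirror the 1-2 ms timescale described in the text.
    """
    b_sorted = sorted(spikes_b)
    total = 0
    for t in spikes_a:
        # Indices of spikes in b that fall inside the window around t + lag.
        lo = bisect_left(b_sorted, t + lag_ms - window_ms)
        hi = bisect_right(b_sorted, t + lag_ms + window_ms)
        total += hi - lo
    return total


def correlogram(spikes_a, spikes_b, lags_ms, window_ms=1.0):
    """Raw (unnormalized) coincidence counts at each requested lag."""
    return {lag: coincidence_count(spikes_a, spikes_b, lag, window_ms)
            for lag in lags_ms}


# Hypothetical spike trains (ms), for illustration only.
neuron_a = [10.0, 50.0, 90.0]
neuron_b = [10.5, 49.2, 120.0]
counts = correlogram(neuron_a, neuron_b, lags_ms=[0.0, 30.0])
```

      A peak at lag 0 relative to neighboring lags would correspond to the zero-lag synchrony discussed here, whereas side peaks at nonzero lags would reflect oscillatory structure.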

      Strengths:

      • The modeling and analysis are solid and overall this work succeeds in providing a convincing mechanistic explanation for the specific empirically observed effects in monkey PFC: the natural task-dependent modulation of 0-lag synchrony and its extinction with NMDA blockage.

      • The manuscript is very readable and the figures and plots are clearly described.

      • The mathematical mean-field analysis in the Methods section is also sound and well written and does/can (see below) provide a sufficient mathematical explanation of the simulation results.

      We appreciate the positive comments.

      Weaknesses:

      1) I found the intuitive explanation of the effects of external input or NMDAR conductance on synchrony incomplete. While simulations and mean-field analysis both predict this effect, the mean-field theory and the linearization analysis and stability analysis can be used to further shed light on the precise mechanism by which external input and NMDAR conductance promote synchrony (or destabilization of the asynchronous state).

      2) An important natural question (which is relevant to the connection with schizophrenia) is what are the distinct roles of AMPAR-based and NMDAR-based excitation on the transition to synchrony, and this is not addressed in this study. It would be important to clarify what is special/distinct about NMDAR in the current findings.

      3) In the Introduction and Discussion, the authors speculate on the possible connection between their empirical and theoretical findings (on the effect of NMDAR hypofunction on synchronous spiking) and the pathogenesis of schizophrenia. While this is not central to the findings of the paper, because it is relevant to the broader significance and impact of this work I will note the following. Their proposed specific link to pathogenesis is as follows: the reduction in precisely timed synchrony resulting from NMDAR hypofunction can disrupt spike-timing dependent plasticity (STDP) and lead to "disconnection" of cortical circuits as observed in schizophrenia. Leaving aside the fact that observations in schizophrenia relate to functional connectivity and not synaptic connectivity, previous theoretical studies of STDP in spiking networks do not support the claim that lack of synchronous activity would lead to disconnection of the circuit.

      Thank you for the thorough review and critique, bringing up these important issues. We address them in detail in the corresponding points in the Recommendations for the authors section below.

      Reviewer #3 (Public Review):

      The starting point of the paper is the observation by the group of Matthew Chafee that zero-lag correlations in pairs of prefrontal cortex neurons transiently increase close to the motor response in a dot-pattern expectancy task, and that this increase in synchrony is abolished by NMDA blockers. The goal of this paper is to understand the mechanisms of this NMDA-dependent increase in synchrony using computational modeling. They simulate and analyze a network of sparsely connected spiking neurons in which synaptic interactions are mediated by AMPA, NMDA, and GABA conductances with realistic time constants. In this network, it had been shown previously that when parameters are such that the network is close to a bifurcation separating asynchronous from synchronous oscillatory states, an increase in external inputs can push the network towards synchrony. They show that when the NMDA component of synaptic inputs is removed, the network moves away from the bifurcation, and thus the same increase in external inputs no longer leads to a significant increase in synchronization.

      Thus, this study provides a potential explanation for the NMDA-dependent increase of synchrony observed in their data. The authors further argue that this effect might be responsible for symptoms observed in schizophrenia, through spike-timing-dependent mechanisms. Overall, this is an interesting study, but there are several weaknesses that dampened my initial enthusiasm. In particular, the model predicts a tight link between synchrony and mean firing rate that should hold during the whole task, not only at the time of the motor response, but this is not explored by the authors.

      Thank you for critically reviewing the manuscript and valuable comments. We address them in the corresponding points in the Recommendations for the authors section below.

      Also, the relationship between changes in synchrony due to NMDAR dysfunction and schizophrenia is not very convincing. Many forms of synaptic plasticity, including STDP are dependent on NMDA receptors, and thus synaptic plasticity in schizophrenic patients is likely to be impacted independently of any synchrony. Thus, the link between the results of this paper and schizophrenia seems tenuous.

      These are good points. To address them we have limited the link between the current study and schizophrenia in the Introduction to the motivation for the original neurophysiological experiments (as this link dictated the pharmacological and genetic manipulations we employed in the animal models). We have also added a new section to the Discussion with the heading ‘Spike timing disruptions and rewiring of prefrontal local circuits via STDP’ where we discuss the complexity of the interaction between STDP, synchrony, and connectivity in prior modeling studies. Namely, it is difficult to predict whether loss of synchronous spiking would cause disconnection via STDP without additional data. In light of these prior theoretical studies, we acknowledge in this new section the constraint on our original hypothesis that asynchrony would cause disconnection. We also note there that altered NMDAR function, which has been implicated in schizophrenia, could impact STDP directly, independently of any change in spike synchrony (see new blue text, starting at line 950), as suggested.

    1. Author Response

      Reviewer #1 (Public Review):

      Esmaily and colleagues report two experimental studies in which participants make simple perceptual decisions, either in isolation or in the context of a joint decision-making procedure. In this "social" condition, participants are paired with a partner (in fact, a computer), they learn the decision and confidence of the partner after making their own decision, and the joint decision is made on the basis of the most confident decision between the participant and the partner. The authors found that participants' confidence, response times, pupil dilation, and CPP (i.e. the increase of centro-parietal EEG over time during the decision process) are all affected by the overall confidence of the partner, which was manipulated across blocks in the experiments. They describe a computational model in which decisions result from a competition between two accumulators, and in which the confidence of the partner would be an input to the activity of both accumulators. This model qualitatively produced the variation in confidence and RTs across blocks.
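
      To make the idea of a shared input to two competing accumulators concrete, here is a deliberately simplified sketch. It is not the authors' model: it is a toy noisy race between two linear accumulators, and every name and parameter value (drift rates, threshold, noise level, time step, the size of the `top_down` input) is an assumption chosen for illustration. It only shows that adding the same input to both accumulators shortens the time to threshold; it makes no claim about accuracy or confidence.

```python
import random


def race_trial(drift_correct, drift_error, top_down, rng,
               threshold=1.0, noise=0.3, dt=0.01, max_steps=10_000):
    """Simulate one trial of a two-accumulator race.

    `top_down` is added to the drift of BOTH accumulators, standing in
    for the partner-confidence input. Returns (correct, response_time).
    """
    x_correct = x_error = 0.0
    for step in range(1, max_steps + 1):
        x_correct += (drift_correct + top_down) * dt \
            + noise * dt ** 0.5 * rng.gauss(0.0, 1.0)
        x_error += (drift_error + top_down) * dt \
            + noise * dt ** 0.5 * rng.gauss(0.0, 1.0)
        if x_correct >= threshold or x_error >= threshold:
            return x_correct >= x_error, step * dt
    # No threshold crossing: report the leading accumulator at timeout.
    return x_correct >= x_error, max_steps * dt


def simulate(top_down, n_trials=500, seed=1):
    """Return (accuracy, mean response time) over simulated trials."""
    rng = random.Random(seed)
    results = [race_trial(1.0, 0.6, top_down, rng) for _ in range(n_trials)]
    accuracy = sum(correct for correct, _ in results) / n_trials
    mean_rt = sum(rt for _, rt in results) / n_trials
    return accuracy, mean_rt


# Hypothetical "low" and "high" partner-confidence input levels.
acc_low, rt_low = simulate(top_down=0.0)
acc_high, rt_high = simulate(top_down=0.5)
```

      Comparing `rt_low` with `rt_high` shows the speed-up produced by the shared input in this toy setting.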

      The major strength of this work is that it puts together many ingredients (behavioral data, pupil and EEG signals, computational analysis) to build a picture of how the confidence of a partner, in the context of joint decision-making, would influence our own decision process and confidence evaluations. Many of these effects are well described already in the literature, but putting them all together remains a challenge.

      We are grateful for this positive assessment.

      However, the construction is fragile in many places: the causal links between the different variables are not firmly established, and it is not clear how pupil and EEG signals mediate the effect of the partner's confidence on the participant's behavior.

      We have modified the language of the manuscript to avoid the implication of a causal link.

      Finally, one limitation of this setting is that the situation being studied is very specific, with a joint decision that is not the result of an agreement between partners, but the automatic selection of the most confident decisions. Thus, whether the phenomenon of confidence matching also occurs outside of this very specific setting is unclear.

      We have now acknowledged this caveat in the discussion in lines 485 to 504. The final paragraph of the discussion now reads as follows:

      “Finally, one limitation of our experimental setup is that the situation being studied is confined to the design choices made by the experimenters. These choices were made in order to operationalize the problem of social interaction within the psychophysics laboratory. For example, the joint decisions were not made through verbal agreement (Bahrami et al., 2010, 2012). Instead, following a number of previous works (Bang et al., 2017, 2020) joint decisions were automatically assigned to the most confident choice. In addition, the partner’s confidence and choice were random variables drawn from a distribution prespecified by the experimenter and therefore, by design, unresponsive to the participant’s behaviour. In this sense, one may argue that the interaction partner’s behaviour was not “natural” since they did not react to the participant's confidence communications (note however that the partner’s confidence and accuracy were not entirely random but matched carefully to the participant’s behavior prerecorded in the individual session). To what extent the findings are specific to this experimental setting, and whether the behavior observed here would transfer to real-life settings, remain open questions. For example, it is plausible that participants may show some behavioral reaction to a human partner’s response time variations since there is some evidence indicating that for binary choices such as those studied here, response times also systematically communicate uncertainty to others (Patel et al., 2012). Future studies could examine the degree to which the results might be paradigm-specific.”

      Reviewer #2 (Public Review):

      This study is impressive in several ways and will be of interest to behavioral and brain scientists working on diverse topics.

      First, from a theoretical point of view, it very convincingly integrates several lines of research (confidence, interpersonal alignment, psychophysical, and neural evidence accumulation) into a mechanistic computational framework that explains the existing data and makes novel predictions that can inspire further research. It is impressive to read that the corresponding model can account for rather non-intuitive findings, such as that information about high confidence by your collaborators means people are faster but not more accurate in their judgements.

      Second, from a methodical point of view, it combines several sophisticated approaches (psychophysical measurements, psychophysical and neural modelling, electrophysiological and pupil measurements) in a manner that draws on their complementary strengths and that is most compelling (but see further below for some open questions). The appeal of the study in that respect is that it combines these methods in creative ways that allow it to answer its specific questions in a much more convincing manner than if it had used just either of these approaches alone.

      Third, from a computational point of view, it proposes several interesting ways by which biologically realistic models of perceptual decision-making can incorporate socially communicated information about others' confidence, to explain and predict the effects of such interpersonal alignment on behavior, confidence, and neural measurements of the processes related to both. It is nice to see that explicit model comparison favors one of these ways (top-down driving inputs to the competing accumulators) over others that may a priori have seemed more plausible but mechanistically less interesting and impactful (e.g., effects on response boundaries, no-decision times, or evidence accumulation).

      Fourth, the manuscript is very well written and provides just the right amount of theoretical introduction and balanced discussion for the reader to understand the approach, the conclusions, and the strengths and limitations.

      Finally, the manuscript takes open science practices seriously and employed preregistration, a replication sample, and data sharing in line with good scientific practice.

      We are grateful to the reviewer for their positive assessment of our work.

      Having said all these positive things, there are some points where the manuscript is unclear or leaves some open questions. While the conclusions of the manuscript are not overstated, there are unclarities in the conceptual interpretation, the descriptions of the methods, some procedures of the methods themselves, and the interpretation of the results that make the reader wonder just how reliable and trustworthy some of the many findings are that together provide this integrated perspective.

      We hope that our modifications and revisions in response to the criticisms listed below will be satisfactory. To avoid redundancies, we have combined each numbered comment with the corresponding recommendation for the Authors.

      First, the study employs rather small sample sizes of N=12 and N=15 and some of the effects are rather weak (e.g., the non-significant CPP effects in study 1). This is somewhat ameliorated by the fact that a replication sample was used, but the robustness of the findings and their replicability in larger samples can be questioned.

      Our study brings together questions from two distinct fields of neuroscience: perceptual decision making and social neuroscience. Each of these two fields has its own traditions and practical common sense. Typically, studies in perceptual decision making employ a small number of extensively trained participants (approximately 6 to 10 individuals). Social neuroscience studies, on the other hand, recruit larger samples (often more than 20 participants) without extensive training protocols. We therefore needed to strike a balance in this trade-off between number of participants and number of data points (e.g. trials) obtained from each participant. Note, for example, that each of our participants underwent around 4000 training trials. Strikingly, our initial study (N=12) yielded robust results that showed the hypothesized effects nearly completely, supporting the adequacy of our power estimate. However, we decided to replicate the findings because, like the reviewer, we believe in the importance of adequate sampling. We increased our sample size to N=15 participants to enhance the reliability of our findings. However, we acknowledge the limitation of generalizing to larger samples, which we have now discussed in our revised manuscript and included a cautionary note regarding further generalizations.

      To complement our results and add a measure of their reliability, here we provide the results of a power analysis that we applied on the data from study 1 (i.e. the discovery phase). These results demonstrate that the sample size of study 2 (i.e. replication) was adequate when conditioned on the results from study 1 (see table and graph pasted below). The results showed that N=13 would be an adequate sample size for 80% power for behavioural and eye-tracking measurements. Power analysis for the EEG measurements indicated that we needed N=17. Taken together, these power analyses indicate that our sample size of N=15 for Study 2 was reasonably justified.

      We have now added a section to the discussion (Lines 790-805) that communicates these issues as follows:

      “Our study brings together questions from two distinct fields of neuroscience: perceptual decision making and social neuroscience. Each of these two fields has its own traditions and practical common sense. Typically, studies in perceptual decision making employ a small number of extensively trained participants (approximately 6 to 10 individuals). Social neuroscience studies, on the other hand, recruit larger samples (often more than 20 participants) without extensive training protocols. We therefore needed to strike a balance in this trade-off between number of participants and number of data points (e.g. trials) obtained from each participant. Note, for example, that each of our participants underwent around 4000 training trials. Importantly, our initial study (N=12) yielded robust results that showed the hypothesized effects nearly completely, supporting the adequacy of our power estimate. However, we decided to replicate the findings in a new sample with N=15 participants to enhance the reliability of our findings and examine our hypothesis in a stringent discovery-replication design. In Figure 4-figure supplement 5, we provide the results of a power analysis that we applied on the data from study 1 (i.e. the discovery phase). These results demonstrate that the sample size of study 2 (i.e. replication) was adequate when conditioned on the results from study 1.”

      We conducted Monte Carlo simulations to determine the sample size required to achieve sufficient statistical power (80%) (Szucs & Ioannidis, 2017). In these simulations, we utilized the data from study 1. For each sample size (N, x-axis), we randomly selected N participants from our 12 participants in study 1, sampling with replacement. Subsequently, we applied the same GLMM model used in the main text to assess the dependency of EEG signal slopes on social conditions (HCA vs LCA). To obtain an accurate estimate, we repeated the random sampling process 1000 times for each given sample size (N). Consequently, for a given sample size, we performed 1000 statistical tests using these randomly generated datasets. The proportion of statistically significant tests among these 1000 tests represents the statistical power (y-axis). We gradually increased the sample size until achieving an 80% power threshold, as illustrated in the figure. The number indicated by the red circle on the x-axis of this graph represents the designated sample size.
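
      The resampling procedure described above can be sketched as follows. This is only an illustration under stated assumptions, not the analysis that was run: the GLMM is replaced here by an exact two-sided sign test on one effect value per participant (a much simpler test, swapped in so that the sketch needs only the standard library), and the function names and the example effect values are hypothetical.

```python
import math
import random


def sign_test_p(diffs):
    """Exact two-sided sign test p-value for H0: median difference is 0.

    Stand-in for the GLMM used in the actual analysis (an assumption).
    """
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    if n == 0:
        return 1.0
    k = sum(d > 0 for d in nonzero)
    tail = min(k, n - k)
    p = 2 * sum(math.comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)


def estimated_power(effects, n, n_resamples=1000, alpha=0.05, seed=0):
    """Fraction of resampled datasets of size n that reach significance."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        sample = rng.choices(effects, k=n)  # sampling with replacement
        if sign_test_p(sample) < alpha:
            hits += 1
    return hits / n_resamples


def smallest_adequate_n(effects, target=0.8, max_n=40, **kwargs):
    """Increase n until the estimated power reaches the target."""
    for n in range(2, max_n + 1):
        if estimated_power(effects, n, **kwargs) >= target:
            return n
    return None  # target power not reached within max_n


# Hypothetical per-participant effects (e.g. HCA minus LCA slope).
pilot_effects = [0.4, 0.1, 0.3, -0.1, 0.5, 0.2, 0.3, 0.6, -0.2, 0.4, 0.2, 0.1]
required_n = smallest_adequate_n(pilot_effects)
```

      Because the pilot data are resampled rather than drawn from a fitted population model, the estimate is conditional on the discovery sample, which matches the framing in the text above.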

      Second, the manuscript interprets the effects of low-confidence partners as an impact of the partner's communicated "beliefs about uncertainty". However, it appears that the experimental setup also leads to greater outcome uncertainty (because the trial outcome is determined by the joint performance of both partners, which is normally reduced for low-confidence partners) and response uncertainty (because subjects need to consider not only their own confidence but also how that will impact on the low-confidence partner). While none of these other possible effects is conceptually unrelated to communicated confidence and the basic conclusions of the manuscript are therefore valid, the reader would like to understand to what degree the reported effects relate to slightly different types of uncertainty that can be elicited by communicated low confidence in this setup.

      We appreciate the reviewer’s advice to remain cautious about the possible sources of uncertainty in our experiment. In the Discussion (lines 790-801) we have now added the following paragraph.

      “We have interpreted our findings to indicate that social information, i.e. partner’s confidence, impacts the participants' beliefs about uncertainty. It is important to underscore here that, similar to real life, there are other sources of uncertainty in our experimental setup that could affect the participants' belief. For example, under joint conditions, the group choice is determined through the comparison of the choices and confidences of the partners. As a result, the participant has a more complex task of matching their response not only with their perceptual experience but also coordinating it with the partner to achieve the best possible outcome. For the same reason, there is greater outcome uncertainty under joint vs individual conditions. Of course, these other sources of uncertainty are conceptually related to communicated confidence but our experimental design aimed to remove them, as much as possible, by comparing the impact of social information under high vs low confidence of the partner.”

      In addition to the above, we would like to clarify one point here with specific respect to the comment. Note that the computer-generated partner’s accuracy was identical under high and low confidence. In addition, our behavioral findings did not show any difference in accuracy under HCA and LCA conditions. As a consequence, the argument that “the trial outcome is determined by the joint performance of both partners, which is normally reduced for low-confidence partners” is not valid because the low-confidence partner’s performance is identical to that of the high-confidence partner. It is possible, of course, that we have misunderstood the reviewer’s point here and we would be happy to discuss this further if necessary.

      Third, the methods used for measurement, signal processing, and statistical inference in the pupil analysis are questionable. For a start, the methods do not give enough details as to how the stimuli were calibrated in terms of luminance etc so that the pupil signals are interpretable.

      Here we provide in Author response image 1 the calibration plot for our eye tracking setup, describing the relationship between pupil size and display luminance. Luminance of the random dot motion stimuli (i.e., white dots on black background) was Cd/m2 and, importantly, identical across the two critical social conditions. We hope that this additional detail satisfies the reviewer’s concern. For the purpose of brevity, we have decided against adding this part to the manuscript and supplementary material.

      Author response image 1.

      Calibration plot for the experimental setup. Average pupil size (arbitrary units from the EyeLink device) is plotted against display luminance. The plot was obtained by presenting the participant with uniform full-screen displays at 10 different luminance levels covering the entire range of the monitor RGB values (0 to 255), whose luminance was separately measured with a photometer. Each display lasted 10 seconds. Error bars are standard deviation between sessions.

      Moreover, while the authors state that the traces were normalized to a value of 0 at the start of the ITI period, the data displayed in Figure 2 do not show this normalization but different non-zero values. Are these data not normalized, or was a different procedure used? Finally, the authors analyze the pupil signal averaged across a wide temporal ITI interval that may contain stimulus-locked responses (there is not enough information in the manuscript to clearly determine which temporal interval was chosen and averaged across, and how it was made sure that this signal was not contaminated by stimulus effects).

      We have now added the following details to the Methods section in lines 1106-1135.

      “In both studies, eye movements were recorded by an EyeLink 1000 (SR Research) device with a sampling rate of 1000 Hz, which was controlled by a dedicated host PC. The device was set in a desktop and pupil-corneal reflection mode while data from the left eye was recorded. At the beginning of each block, the system was recalibrated and then validated using a 9-point schema presented on the screen. For one subject, a 3-point schema was used due to repetitive calibration difficulty. Having reached a detection error of less than 0.5°, the participants proceeded to the main task. Acquired eye data for pupil size were used for further analysis. Data of one subject in the first study was removed from further analysis due to storage failure.

      Pupil data were divided into separate epochs and data from the inter-trial interval (ITI) were selected for analysis. The ITI was defined as the time between the offset of the trial (t) feedback screen and the stimulus presentation of trial (t+1). Then, blinks and jitters were detected and removed using linear interpolation. Values of pupil size before and after the blink were used for this interpolation. Data were also band-pass filtered using a Butterworth filter (second order, [0.01, 6] Hz)[50]. The pupil data were z-scored and then baseline corrected by subtracting the average of the signal in the [-1000 0] ms interval (before ITI onset). For the statistical analysis (GLMM) in Figure 2, we used the average of the pupil signal in the ITI period. Therefore, no pupil value is contaminated by the upcoming stimuli. Importantly, trials with ITI>3s were excluded from analysis (365 out of 8800 for study 1 and 128 out of 6000 for study 2; see also table S7 and Selection criteria for data analysis in Supplementary Materials).”
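
      The order of the preprocessing steps described above can be sketched in a few lines. This is a simplified illustration rather than the pipeline that was actually used: blink samples are assumed to be already marked as `None`, the band-pass filtering step is omitted because it would require a signal-processing library, the z-score uses the population standard deviation, and all function names and example values are hypothetical.

```python
import statistics


def interpolate_blinks(trace):
    """Fill runs of missing samples (None) by linear interpolation
    between the nearest valid samples before and after the gap."""
    out = list(trace)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is None:
            start = i
            while i < n and out[i] is None:
                i += 1
            left = out[start - 1] if start > 0 else None
            right = out[i] if i < n else None
            if left is None and right is None:
                raise ValueError("trace contains no valid samples")
            # At the edges, hold the only available neighbor constant.
            left = right if left is None else left
            right = left if right is None else right
            gap = i - start + 1
            for j in range(start, i):
                out[j] = left + (j - start + 1) / gap * (right - left)
        else:
            i += 1
    return out


def zscore(trace):
    """Z-score a trace using the population standard deviation."""
    mean = statistics.fmean(trace)
    sd = statistics.pstdev(trace)
    if sd == 0:
        raise ValueError("cannot z-score a constant trace")
    return [(x - mean) / sd for x in trace]


def baseline_correct(trace, n_baseline):
    """Subtract the mean of the first n_baseline samples (the pre-ITI
    baseline) and return only the samples that follow it."""
    baseline = statistics.fmean(trace[:n_baseline])
    return [x - baseline for x in trace[n_baseline:]]


# Hypothetical trace: 3 baseline samples followed by 5 ITI samples.
raw = [4.0, 4.2, None, None, 5.0, 5.4, 5.2, 5.0]
iti = baseline_correct(zscore(interpolate_blinks(raw)), n_baseline=3)
mean_iti_pupil = statistics.fmean(iti)  # one value per trial for the GLMM
```

      At a 1000 Hz sampling rate, the [-1000 0] ms baseline would correspond to `n_baseline=1000`; the short trace here is only for readability.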

      Fourth, while the EEG analysis in general provides interesting data, the link to the well-established CPP signal is not entirely convincing. CPP signals are usually identified and analyzed in a response-locked fashion, to distinguish them from other types of stimulus-locked potentials. One crucial feature here is that the CPPs in the different conditions reach a similar level just prior to the response. This is either not the case here, or the data are not shown in a format that allows the reader to identify these crucial features of the CPP. It is therefore questionable whether the reported signals indeed fully correspond to this decision-linked signal.

      Fifth, the authors present some effective connectivity analysis to identify the neural mechanisms underlying the possible top-down drive due to communicated confidence. It is completely unclear how they select the "prefrontal cortex" signals here that are used for the transfer entropy estimations, and it is in fact even unclear whether the signals they employ originate in this brain structure. In the absence of clear methodical details about how these signals were identified and why the authors think they originate in the prefrontal cortex, these conclusions cannot be maintained based on the data that are presented.

      Sixth, the description of the model fitting procedures and the parameter settings are missing, leaving it unclear for the reader how the models were "calibrated" to the data. Moreover, for many parameters of the biophysical model, the authors seem to employ fixed parameter values that may have been picked based on any criteria. This leaves the impression that the authors may even have manually changed parameter values until they found a set of values that produced the desired effects. The model would be even more convincing if the authors could for every parameter give the procedures that were used for fitting it to the data, or the exact criteria that were used to fix the parameter to a specific value.

      Seventh, on a related note, the reader wonders about some of the decisions the authors took in the specification of their model. For example, why was it assumed that the parameters of interest in the three competing models could only be modulated by the partner's confidence in a linear fashion? A non-linear modulation appears highly plausible, so extreme values of confidence may have much more pronounced effects. Moreover, why were the confidence computations assumed to be finished at the end of the stimulus presentation, given that for trials with RTs longer than the stimulus presentation, the sensory information almost certainly reverberated in the brain network and continued to be accumulated (in line with the known timing lags in cortical areas relative to objective stimulus onset)? It would help if these model specification choices were better justified and possibly even backed up with robustness checks.

      Eighth, the fake interaction partners showed several properties that were highly unnatural (they did not react to the participant's confidence communications, and their response times were random and thus unrelated to confidence and accuracy). This questions how much the findings from this specific experimental setting would transfer to other real-life settings, and whether participants showed any behavioral reactions to the random response time variations as well (since several studies have shown that for binary choices like here, response times also systematically communicate uncertainty to others). Moreover, it is also unclear how the confidence convergence simulated in Figure 3d can conceptually apply to the data, given that the fake subjects did not react to the subject's communicated confidence as in the simulation.

    1. Author Response

      Joint Public Review

      This manuscript utilizes Drosophila melanogaster as a model system to functionally characterize the role of genes previously associated with chronic obstructive pulmonary disease (COPD) in epithelial barrier function. Using genetic and imaging approaches, the authors characterised a previously unrecognised role of intestinal Acetylcholine receptor (AchR) signalling in the regulation of epithelial barrier function. The working model proposes that Acetylcholine (Ach) produced by enteroendocrine cells (EEs) and enteric neurons signals to AchR in enterocytes (ECs). This signalling activates the secretion of the Peritrophic membrane (PM) through the regulation of the exocytic protein Syt4. In this way, Ach/AchR signalling works to protect epithelial barrier function and organismal tolerance to ingested damaging agents, such as those causing oxidative stress.

      Overall, the data presented support the main model of the paper: EC AchR activation is necessary to maintain epithelial barrier function. The evidence, however, on the mechanisms downstream of AchR, namely, the involvement of this signalling pathway in the regulation of Syt4 is weak.

      The work in this manuscript represents an important proof of concept for the use of the Drosophila midgut as a model to functionally interrogate genes from human genetic association studies in pathologies affecting epithelial homeostasis.

      We would like to thank the reviewers for their positive assessment of the significance of the study. The reviewers point out that the reported data support the conclusions of the manuscript and request additional studies to elucidate the downstream mechanism in more detail. We have now edited our manuscript according to the specific requests, including additional data and further clarifications of our model. We believe these new data and edits significantly improve the manuscript and hope that it is now acceptable for publication in eLife.

    1. Author Response

      Reviewer #1 (Public Review):

      Mano et al. use a combination of behavioral, genetic silencing, and functional imaging experiments to explore the temporal properties of the optomotor response in Drosophila. They find a previously unreported inversion of the behavior under high contrast and luminance conditions and identify potential pathways mediating the effect.

      Strengths:

      Quantifications of optomotor behavior have been performed for many decades. Despite a large number of previous studies, the authors still find something fundamentally novel: under high contrast conditions and extended stimulation periods, the behavior becomes dynamic over time. The turning response shows an initial transient positive following response. The amplitude of the behavior then decreases and even inverts such that animals show an anti-directional rotation response. The authors systematically explore the stimulation feature space, including large ranges of spatial and temporal frequencies and conditions with high and low contrast. They also test two wild-type fly species and even compare experiments across two different labs and setups. From these data, it seems clear that the behavior is robust and largely depends on the brightness of the stimulation, rearing conditions, and genetic background. The authors discuss that these effects have not clearly been reported elsewhere beforehand, and convincingly argue why this may be the case.

      In general, the presented behavioral quantifications illustrate the importance of further experimental studies of the temporal dynamics of behavior in response to dynamically varying stimulus features, across different stimulus types, genetic backgrounds, and model animal systems. It also illustrates the importance of relating the conditions that animals experience in the laboratory to the ones they would experience in the wild. As the authors mention, the brightness during a sunny day can reach values as high as 4000 cd/m2, while experimental stimulation in the lab has so far often been orders of magnitude below that.

      The study then systematically explores potential neural elements involved in the behavior. Through a set of silencing experiments, they find that T4 and T5 neurons, as expected, are required for motion behaviors. Silencing HS cells, by contrast, largely abolishes the 'classical' syn-directional response but leaves anti-directional turning intact. Conversely, silencing CH cells abolishes the anti-directional response but leaves the syn-directional behavior intact. Through functional imaging in T4, T5, HS, and CH neurons, the authors show that none of these neurons exhibits a response inversion depending on contrast level. Together, these experiments nicely illustrate that the dynamics do not seem to be computed within the early parts of visual processing, but must arise at the level of the lobula plate or further downstream.

      Weaknesses:

      While the authors have already explored various parameters of the experiment, it would have been nice to see additional experiments regarding the initial adaptation phase. The experiments in Figure 2e, where the authors show front-to-back or back-to-front gratings before the rotation phase, are a good start. What would the behavioral dynamics look like if they had exposed animals to long periods of static high or low contrast gratings, whole field brightness, or darkness? Such experiments would surely help to better understand the stimulus features on which the adaptation elements operate. It would be interesting to explore to what degree such static stimuli impact the subsequent behavioral dynamics.

      To address this question, we have added a new adaptation condition, in which a high contrast, stationary sinusoidal grating is presented for 5 seconds before the high contrast rotational stimulus is presented (new Figure 2 – Supp. Fig. 1). We find that the turning looks identical to the case of a gray adapter. These results drive home the point that the direction of motion of the adapter is what matters most.

      Given the dynamics of the behavior, it would probably also be worth looking at the turning dynamics after the stimulus has stopped. If direction-selective adaptation mechanisms are regulating the turning response, one may find long-lasting biases even in the absence of stimulation. If the authors have more data after the stimulus end, it would be good to further expand the time range by a few seconds to show if this is the case or not (for example, in Figure 1b).

      We now show these dynamics in Figure 1. See Essential Revision #1.

      Another important experiment could be to initially perform experiments in a closed-loop configuration, and then quickly switch to open-loop. The closed-loop configuration should allow the motion computing circuitry to adapt to the chosen environmental conditions. Explorations of the changes in turning response dynamics after such treatments should then enable further dissections of the mechanisms of adaptation. Closed-loop experiments under different contrast conditions have already been performed (for example, Leonhardt et al. 2016), which also showed complex response dynamics after stimulus on- and offset. It would be great to discuss the current open-loop experiments, and maybe some new closed-loop results, in relation to the previous work.

      We have performed these suggested experiments; please see Essential Revision #2.

      The authors mention the different rearing conditions, and there is one experiment in Figure S2 which mentions running experiments at 25 deg C. But it is not clear from the Methods at which temperature all other experiments have been performed. It is also not clear at which temperature the shibire block experiments were performed. As such experiments require elevated temperatures, I assume that all behavioral experiments have been performed at such levels? How high were those?

      Our apologies for leaving out this important information. In DAC’s lab, behavioral experiments are run at 34-36°C in a room maintaining ~50% relative humidity (this yields ~25% RH in the box with the experiment, as we now note in the methods). These conditions yield high quality, reproducible behavior, especially since this temperature elicits strong walking behavior. In TRC’s lab, behavioral experiments are similarly run at 34°C in a room maintaining ~50% relative humidity (similarly with ~25% RH in the experimental box), for similar reasons. We have now added these details to the methods sections for each lab’s behavioral experiments.

      What does the fly see before and after the stimulus (i.e. the gray boxes in all figures)? Are these periods of homogenous gray levels or are these non-moving gratings with the luminance and contrast of the subsequent stimulus? It would be important to add this information to the methods and to the figure illustrations or legends.

      In the figures, gray is a uniform luminance screen that appears before and after the stimuli, with luminance matched to the mean stimulus luminance. We have now included this in the methods section where we describe how stimuli were generated in each lab.

      It would be nice to discuss the potential location where the motion adaptation may be implemented in the brain. A small model scheme as an additional figure could further help to discuss how such computations may be mechanistically implemented, helping readers to think about future experimental dissections of the behavior.

      Following this suggestion, we have created a diagram that shows a potential mechanistic implementation of the behavior observed, and summarizes our results (new Figure 6 – Supp. Fig. 2). There are many other possible alternatives that we do not show, including exactly how an opposing signal could ramp up under the conditions of these experiments. In the figure caption, we remind readers what locations have been excluded for this sort of computation. We reference this diagram where we discuss subtraction in the Discussion.

      For setting up similar experiments in other labs, the authors need to better describe how they measured the luminance of the arena. Do they simply report the brightness delivered by the Lightcrafter system, or did they measure this with a lux-meter? If so, at which distance was the measurement performed and with which device? Given that the behavior is sensitive to the specific properties of the stimulus, it will be important to report these numbers carefully to enable other groups to reproduce effects.

      In brief, since these are rear projection screens, we can easily measure light intensity by placing a power meter in front of the screen. This gives us the radiant power in watts, which can be converted to lumens by a standard conversion and then into candelas by making the approximation that our screen scatters into 2π steradians. Dividing by the sensor area gives us our desired candelas per square meter. We have now added this methodology to the methods section.
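
      The conversion chain described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure in the methods: the function name and the example reading are hypothetical, and the default 683 lm/W is the peak photopic luminous efficacy, which is strictly valid only for monochromatic light near 555 nm. A broadband projector spectrum would need to be weighted by the photopic luminosity function instead.

```python
import math

# Peak photopic luminous efficacy in lm/W (monochromatic 555 nm light).
# Assumption: used here as a stand-in for the "standard conversion";
# a real projector spectrum needs weighting by the photopic curve.
PEAK_EFFICACY_LM_PER_W = 683.0


def luminance_cd_per_m2(power_w, sensor_area_m2,
                        efficacy_lm_per_w=PEAK_EFFICACY_LM_PER_W,
                        solid_angle_sr=2.0 * math.pi):
    """Estimate screen luminance from a power-meter reading.

    power_w         -- radiant power collected by the sensor (watts)
    sensor_area_m2  -- active area of the sensor (square meters)
    solid_angle_sr  -- solid angle the screen is assumed to scatter into
                       (2*pi steradians, following the approximation above)
    """
    if sensor_area_m2 <= 0:
        raise ValueError("sensor_area_m2 must be positive")
    lumens = power_w * efficacy_lm_per_w   # watts -> lumens
    candelas = lumens / solid_angle_sr     # lumens -> candelas
    return candelas / sensor_area_m2       # candelas -> cd/m^2


if __name__ == "__main__":
    # Hypothetical reading: 1 mW collected on a 1 cm^2 sensor.
    print(luminance_cd_per_m2(power_w=1e-3, sensor_area_m2=1e-4))
```

      Reporting the sensor area, the assumed solid angle, and the efficacy value alongside the final number would let other groups redo the conversion with their own equipment.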

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      The study assesses the impact of testing contacts of cases in school classes when identified, rather than at the end of quarantine, on various outcomes such as secondary infections, tracing delay, and identification of the possible source of infection. The authors find that the intervention likely reduced tracing delay and increased the number of possible infection sources. However, due to unmeasured confounding, it remains unclear if secondary transmission actually decreased. The analysis requires clarification and further explanation in parts.

      Major strengths and weaknesses:

      The study benefits from the assessment of various outcomes in contact tracing in addition to changes in transmission, such as tracing delay, and the identification of putative infectors; however, the assumption that other cases found in households are infectors of the index case rather than putative infectees may introduce significant bias, and this is not mentioned in the Discussion. It is difficult to understand the intervention in Figure 1 due to unclear labelling and incomplete descriptions in the caption. The authors mention that the same school class could be included multiple times for multiple outbreaks - was there a time cutoff for inclusion? I had a lot of trouble interpreting or reproducing the values given in Table 1. Firstly, the methods used to produce the RRs given are not described in the methods section of the paper. What are the outcomes? "Classes" and "indexes" are poorly defined. Is this the output of a univariate or multivariate regression model, and what is the link function? I was also unable to reproduce the RRs listed in the table despite attempting several methods. The closest numbers I achieved were by crudely dividing the risks (e.g. for the RR for known infection source I took the ratio of indexes for which a school contact was suspected pre and post-intervention (644/1175)/(146/429) = 1.61), but if this is the case then the unknown class is by definition not the reference category. This is the same for the other RRs stated in the table. The methods used should be clarified and results updated if erroneous. The mediation analysis components and their relevance to the study could be better explained in the methods and results.
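
      The crude calculation quoted above can be reproduced with a short Python sketch. The function names and group labels are hypothetical, and the confidence interval uses the common log-scale (Katz) approximation, which is an assumption on our part rather than a method stated in the manuscript.

```python
import math


def crude_risk_ratio(events_a, total_a, events_b, total_b):
    """Ratio of two crude proportions: (events_a/total_a) / (events_b/total_b)."""
    risk_a = events_a / total_a
    risk_b = events_b / total_b
    return risk_a / risk_b


def risk_ratio_ci(events_a, total_a, events_b, total_b, z=1.96):
    """Approximate 95% CI for the crude risk ratio on the log scale.

    Assumption: Katz log method; not necessarily what the authors used.
    """
    rr = crude_risk_ratio(events_a, total_a, events_b, total_b)
    se_log_rr = math.sqrt(1 / events_a - 1 / total_a
                          + 1 / events_b - 1 / total_b)
    return rr * math.exp(-z * se_log_rr), rr * math.exp(z * se_log_rr)


if __name__ == "__main__":
    # Counts as quoted in the review: 644 of 1175 versus 146 of 429.
    rr = crude_risk_ratio(644, 1175, 146, 429)
    low, high = risk_ratio_ci(644, 1175, 146, 429)
    print(f"crude RR = {rr:.2f} (approx. 95% CI {low:.2f} to {high:.2f})")
```

      A crude ratio of this kind ignores confounders and any clustering of index cases within classes, so it would not be expected to match the output of an adjusted regression model.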

      Achievement of aims and support for conclusions:

      The authors partially achieved their aims by demonstrating a likely decrease in tracing delay and an increase in possible infection sources. However, the study's inability to determine if secondary transmission decreased due to unmeasured confounding limits the conclusiveness of the findings. The authors should reiterate the main numerical results in the first few paragraphs of the discussion.

      Impact on the field and utility of methods and data:

      This study has the potential to impact the field by highlighting the benefits of testing contacts earlier in school classes. The findings on reduced tracing delay and increased identification of infection sources can inform future strategies and interventions. However, clarity on the analysis methods, as well as the results, are necessary to ensure the utility and reliability of the findings.

      We thank the reviewer for his encouraging comments; we completely agree with the interpretation of our findings. Nevertheless, the intervention under evaluation is not exactly as described by the reviewer. In fact, the change in contact tracing mostly targeted the tracing of household cases. School investigations already used immediate testing of all contacts before the intervention, although timeliness increased after the intervention. It was in the household that we had a clear change, with immediate testing of all asymptomatic family contacts.

      The assumption of direction of infection: We understand the reviewer’s point and we agree that such an assumption would introduce an important bias. Nevertheless, we do not assume any direction of the infection. We only report the conclusions of the field investigation conducted during the school outbreak about a known source of infection for the index case.

      On the contrary, in our conceptual framework, we hypothesize that, with the introduction of backward contact tracing for all cases in the community (mostly household infections), asymptomatic school-age cases were identified more promptly, and that this improved the surveillance of school outbreaks and possibly reduced transmission within them. This increase in timeliness could occur whatever the direction of infection within the household was, i.e. from the symptomatic adult to the asymptomatic child or the other way round.

      Figure 1: we completely changed Figure 1 according to the reviewer’s suggestions.

      Table 1: it has been split into two tables. The first describes the characteristics of the classes and index cases and the outcomes of the outbreaks; the second shows the association between possible confounders and the main outcome. We are sorry; in trying to make the paper shorter, we made the table very unclear.

      Repeated outbreaks in the same class: we thank the reviewer for this point. We did not define a time limit to distinguish two episodes. The outbreaks were defined by the field investigations. If a class was involved in two investigations, public health operators first tried to assess whether there was a direct link between the two. In practice, two outbreaks could not be considered independent if fewer than 21 days separated the notifications of the two index cases. We added a sentence in the methods.

      Mediation analysis rationale: we added a DAG to explain the mediation analysis. We also restructured the results so that the preliminary results are reported step by step, introducing the mediation analysis and justifying the selection of the mediators and the confounders.

      Discussion: we added the main findings in a quantitative way at the beginning of the discussion.

      Reviewer #2 (Public Review):

      This is a review of "Effect of an enhanced public health contact tracing intervention on the secondary transmission of SARS-CoV-2 in educational settings: the four-way decomposition analysis", by Djuric et al.

      In late 2020, a province in northern Italy implemented a new testing regimen for all contacts of people known to have COVID-19, offering them SARS-CoV-2 testing immediately after the detection of the index case instead of at the end of a quarantine period. The authors of this study investigated whether this policy change reduced secondary transmission of SARS-CoV-2 in schools. In addition to studying this primary outcome, they examined two "process" outcomes; whether this policy of testing earlier enabled public health officials to more successfully identify the source of infection of the index case, and if the time interval from detection of the index case to testing of contacts in the educational setting reduced.

      They concluded that the time between detection of the index case and testing of contacts decreased after the policy change. Similarly, the proportion of cases for which the source of infection was identified increased after the policy change. Both of these "process" indicators correlated with reduced secondary transmission, though only identifying the source of infection was associated with a statistically significant (at the 5% level) reduction in secondary transmission.

      Strengths of this paper

      Educational settings experienced significant disruption during the COVID-19 pandemic, and efforts to better understand the spread of SARS-CoV-2 in schools - and how to mitigate this spread - are of significant public health importance. This paper, therefore, addresses an important topic.

      Additionally, the authors describe a detailed dataset comprising case and contact tracing data from over 1,600 index cases with in-school contacts. The richness of the data described in Table 1 provides a good opportunity to conduct a natural experiment on the potential impact of testing contacts immediately after exposure on secondary transmission. The authors also appropriately acknowledge that this interrupted time series study would be insufficient to provide causal information, given the potential for confounders.

      Finally, the primary statistical method (a four-way decomposition analysis) was new to me, but - from the references cited - seems appropriate. Given the relative novelty of this method, more space could be dedicated to explaining it in the methods.

      Weakness of this paper

      Although the paper tackles an important topic with an appropriate dataset, the analyses feel insufficient to fully support the authors' conclusions.

      First and most critically, it is difficult to understand exactly what the primary outcome of the study is. Both the median number of secondary cases per class and the proportion of classes that experienced any secondary transmission are presented in Table 1, but - at least in the unadjusted analyses - point in different directions regarding the impact of the effect of the intervention (albeit neither strongly). For example, before the policy change, the median number of secondary cases per index case is 2, while after the policy change, it has reduced to 1. In contrast, before the policy change 37% of classes experienced any secondary transmission, but after the policy change, this had increased to 39% of classes. In some of the adjusted analyses, "number of secondary cases" is stated as the outcome variable, but that is not fully defined. The "attack rate", which is well defined in the methods, could be one option for use as a consistent primary outcome, however, it is only provided for the total study population and the attack rates pre- or post-policy change are not presented or compared.

      Additionally, although using a "process measure" as a secondary outcome could be valuable - especially in a natural experiment like this, where identifying a causal relationship with a complex outcome like secondary transmission will be difficult - it was somewhat unclear how the process measures described in this study were measured, or their validity. For example, the reduced time between detection of the index case and testing of contacts seems unsurprising, since the intervention itself is to test contacts immediately after the index case is identified. Additionally, the results describe reductions in median testing delay and median tracing delay, but only testing delay is defined in the methods.

      Finally, there is existing published literature that provides additional context on the impact of testing on secondary transmission within schools that arguably provides a higher level of evidence than the current study, but is not cited by the authors. A key limitation of this study - which the authors acknowledge - is the interrupted time series nature of their study, which is open to confounding by other important factors that happened at the same time, including but not limited to: changes in overall incidence of COVID-19; viral evolution (e.g. the emergence of the Alpha variant (B.1.1.7) which occurred during this study and which significantly altered the risk of secondary transmission); the efficiency of the contact tracing system (including skill and size of the contact tracing workforce); and the availability of non-molecular diagnostic tests (e.g. lateral flow devices) that might allow individuals to change their behaviors even without enrolling in this study. Examples of alternative studies which might reduce some of this potential confounding include around 400 schools in Los Angeles County, California, USA, that implemented "test to stay" in 2021 and were compared to 1,600 schools that did not implement "test to stay" [https://www.cdc.gov/mmwr/volumes/70/wr/mm705152e1.htm] and a cluster-randomized trial of daily testing of exposed contacts to study in-school transmission in England, UK, also in 2021 [https://www.sciencedirect.com/science/article/pii/S0140673621019085]. Although these examples describe slightly different interventions involving enhanced testing of exposed contacts, they both compared educational settings with and without the intervention across the same time periods; and the UK study in particular has methodological advantages over this current paper, including randomization. While the findings in the current paper did not contradict these earlier, stronger papers, the example from this province should be placed in context with the totality of evidence around testing in schools.

      We thank the reviewer for his encouraging and useful comments.

      We have completely reframed Table 1 and split it into two separate tables. We have added the suggested references.

      According to the reviewer’s suggestions, we tried to better describe the main outcome and to justify our choice. We also added the definition of testing delay, which was missing. We added a box explaining in plain language all the outputs of the mediation analysis. We improved the reporting of the descriptive data in Table 1, including the attack rate.

      Furthermore, we better explained the choice of process outcomes, how they were related a priori to the main outcome, and what changes were expected under the hypothesis that the intervention worked correctly. In particular, we agree that a reduction in the time to testing was unsurprising; this outcome simply served to check that the intervention was actually and correctly implemented. An increase in the proportion of index cases with a known source of infection (and in the proportion of asymptomatic index cases, which was not specified in the initial protocol but which we later identified as an important process indicator) suggests that more index cases were identified as a consequence of a household investigation, i.e. the change in tracing helped in the early detection of school exposure.

      Regarding the proportion of classes with secondary transmission, we added a sentence in the discussion explaining why we did not expect this to change after the intervention. As described in the new Figure 1, household contacts were immediately quarantined before as well as after the intervention; what changed is that they are now promptly identified as contacts, and therefore their school contacts are identified and isolated earlier. Only if secondary transmission in the class had already occurred could we reduce transmission in the class, i.e. we are preventing tertiary cases, not secondary ones. Nevertheless, the number of classes investigated is also expected to change, so it was difficult to predict whether the proportion of investigated classes with transmission should increase or decrease.

      In the discussion, we reported examples of studies that applied an experimental or semi-experimental design and thus overcame the main limits of our observational study. Nevertheless, we also highlighted that the intervention evaluated in this study would be particularly difficult to test in a trial or a semi-experimental setting, because we are evaluating a change in community contact tracing that occurred during the peak of the second wave.

    1. Author Response

      Reviewer #1 (Public Review):

      Briggs et al use a combination of mathematical modelling and experimental validation to tease apart the contributions of metabolic and electronic coupling to the pancreatic beta cell functional network. A number of recent studies have shown the existence of functional beta cell subpopulations, some of which are difficult to fully reconcile with established electrophysiological theory. More generally, the contribution of beta cell heterogeneity (metabolism, differentiation, proliferation, activity) to islet function cannot be explained by existing combined metabolic/electrical oscillator models. The present studies are thus timely in modelling the islet electrical (structural) and functional networks. Importantly, the authors show that metabolic coupling primarily drives the islet functional network, giving rise to beta cell subpopulations. The studies, however, do not diminish the critical role of electrical coupling in dictating glucose responsiveness, network extent as well as longer-range synchronization. As such, the studies show that islet structural and functional networks both act to drive islet activity, and that conclusions on the islet structural network should not be made using measures of the functional network (and vice versa).

      Strengths:

      • State-of-the-art multi-parameter modelling encompassing electrical and metabolic components.

      • Experimental validation using advanced FRAP imaging techniques, as well as Ca2+ data from relevant gap junction KO animals.

      • Well-balanced arguments that frame metabolic and electrical coupling as essential contributors to islet function.

      • Likely to change how the field models functional connectivity and beta cell heterogeneity.

      Weaknesses:

      • Limitations of FRAP and electrophysiological gap junction measures not considered.

      • Limitations of Cx36 (gap junction) KO animals not considered.

      • Accuracy of citations should be improved in a few cases.

      We thank reviewer 1 for their positive comments, including the many strengths in the approaches, arguments and impact. We do note the weaknesses raised by the reviewer and have addressed them following the comments below.

      We would also like to note that when we refer to metabolic activity driving the functional network, we are not referring to metabolic coupling between beta cells. Rather, we mean that two cells that show either high levels of metabolic activity (glycolytic flux) or similar levels of metabolic activity will show increased synchronization, and thus a functional network edge, as compared to cells with elevated gap junction conductance. Increased metabolic activity would likely generate increased depolarizing currents that will provide an increased coupling current to drive synchronization, whereas similar metabolic activity would mean a given coupling current could more readily drive synchronized activity. We have substantially rewritten the manuscript to clarify this point.

      Reviewer #2 (Public Review):

      In their present work, Briggs et al. combine biophysical simulations and experimental recordings of beta cell activity with analyses of functional network parameters to determine the role played by gap-junctional coupling, metabolism, and KATP conductance in defining the functional roles that the cells play in the functional networks, assess the structure-function relationship, and to resolve an important current open question in the field on the role of so-called hub cells in islets of Langerhans.

      Combining differential equation-based simulations on 1000 coupled cells with demanding calcium, NADPH, and FRAP imaging, as well as with advanced network analyses, and then comparing the network metrics with simulated and experimentally determined properties is an achievement in its own right and a major methodological strength. The findings have the potential to help resolve the issue of the importance of hub cells in beta cell networks, and the methodological pipeline and data may prove invaluable for other researchers in the community.

      However, methodologically, functional networks may be based on different types of calcium oscillations present in beta cells, i.e., fast oscillations produced by bursts of electrical activity, slow oscillations produced by metabolic/glycolytic oscillations, or a mixture of both. At present, the authors base the network analyses on fast oscillations only in the case of simulated traces and on a mixture of fast and slow oscillations in the case of experimental traces. Since different networks may depend on the studied beta cell properties to a different extent (e.g., fast oscillation-based networks may depend more strongly on electrical properties and slow oscillation-based networks may depend more strongly on metabolic properties), it is important that in drawing the conclusions the authors separately address the influence of a cell's electrical and metabolic properties on its functional role in the network based on fast oscillations, slow oscillations, or a mixture of both.

      We thank reviewer 2 for their positive comments, including addressing the importance of this study as it pertains to islet biology and acknowledging methodological complexities of this study. We also thank the reviewer for their careful reading and providing useful comments. We have integrated each comment into the manuscript. Most importantly, we have now extended our analysis to both fast and slow oscillations by incorporating an additional mathematical model of coupled slow oscillations and performing additional experimental analysis of fast, slow, and mixed oscillations.

      Reviewer #3 (Public Review):

      Over the past decade, novel approaches to understanding beta cell connectivity and how that contributes to the overall function of the pancreatic islet have emerged. The application of network theory to beta cell connectivity has been an extremely useful tool to understand functional hierarchies amongst beta cells within an islet. This helps to provide functional relevance to observations from structural and gene expression data that beta cells are not all identical.

      There are a number of "controversies" in this field that have arisen from the mathematical and subsequent experimental identification of beta "hub" cells. These are small populations of beta cells that are very highly connected to other beta cells, as assessed by applying correlation statistics to individual beta cell calcium traces across the islet.
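
      The idea of identifying hubs by applying correlation statistics to calcium traces can be illustrated with a minimal Python sketch. It is a toy example and not the pipeline used in the paper: the correlation threshold of 0.9, the hub cutoff at 60% of the maximum degree, and all function names are assumptions chosen for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length traces (0.0 if either is flat)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def functional_degrees(traces, threshold=0.9):
    """Count, for each cell, how many other cells it is correlated with.

    An edge is drawn between two cells when the correlation of their
    traces is at or above `threshold` (an assumed, adjustable value).
    """
    degrees = [0] * len(traces)
    for i in range(len(traces)):
        for j in range(i + 1, len(traces)):
            if pearson(traces[i], traces[j]) >= threshold:
                degrees[i] += 1
                degrees[j] += 1
    return degrees


def hub_cells(degrees, fraction_of_max=0.6):
    """Indices of cells whose degree reaches an assumed fraction of the maximum."""
    max_degree = max(degrees)
    if max_degree == 0:
        return []
    return [i for i, d in enumerate(degrees) if d >= fraction_of_max * max_degree]


if __name__ == "__main__":
    # Toy traces: cells 0 and 1 oscillate in phase, cell 2 in antiphase.
    traces = [
        [0, 1, 0, 1, 0, 1],
        [0, 2, 0, 2, 0, 2],
        [1, 0, 1, 0, 1, 0],
    ]
    degrees = functional_degrees(traces)
    print(degrees, hub_cells(degrees))
```

      Because the edges come purely from correlated activity, a network built this way says nothing by itself about physical gap junction connections, which is the distinction between functional and structural networks at issue here.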

      In this paper Briggs et al set out to answer the following areas of debate:

      They use computational datasets, based on established models of beta cells acting in concert (electrically coupled) within an islet-like structure, to show that it is similarities in metabolic parameters rather than "structural" connections (ie proximity which subserves gap junction coupling) that drive functional network behaviour. Whilst the computational models are quite relevant, the fact that the parameters (eg connectivity coefficients) are quite different from what is measured experimentally confirms the limitations of this model. Therefore it was important for the authors to back up this finding by performing both calcium and metabolic imaging of islet beta cells. These experimental data are reported to confirm that metabolic coupling was more strongly related to functional connectivity than gap junction coupling. However, a limitation here is that the metabolic imaging data confirmed a strong link between disconnected beta cells and low metabolic coupling but did not robustly show the opposite. Similarly, I was not convinced that the FRAP studies, which indirectly measured GJ ("structural") connections, were powered well enough to be related to measures of beta cell connectivity.

      The group goes on to provide further analytical and experimental data with a model of increasing loss of GJ connectivity (by calcium imaging of islets from WT, heterozygous (50% GJ loss), and homozygous (100% loss) animals). Given the former conclusion that it was metabolic, not GJ, connectivity that drives small world network behaviour, it was surprising to see such a great effect on the loss of hubs in the homs. That said, the analytical approaches in this model did help the authors confirm that the loss of gap junctions does not alter the preferential existence of beta cell connectivity, and this confirms the important contribution of metabolic "coupling". One can perhaps therefore conclude that there are two types of network behaviour in an islet (maybe more) and the field should move towards an understanding of overlapping network communities, as has been done in brain networks.

      Overall this is an extremely well-written paper which was a pleasure to read. This group has neatly and expertly provided both computational and experimental data to support the notion that it is metabolic but not "structural" ie GJ coupling that drives our observations of hubs and functional connectivity. However, there is still much work to do to understand whether this metabolic coupling is just a random epiphenomenon or somehow fated, the extent to which other elements of "structural" coupling - ie the presence of other endocrine cell types, the spatial distribution of paracrine hormone receptors, blood vessels and nerve terminals are also important.

      We thank reviewer 3 for their positive comments on the methodology, the writing style, and the importance of this paper to the broader islet community. We also thank the reviewer for their very in-depth and helpful comments. We have addressed each comment below and made significant changes to the manuscript accordingly. We conducted more FRAP experiments and separated results into slow, fast, and mixed oscillations. We included analysis of an additional computational model that simulates slow calcium oscillations. Additionally, we substantially rewrote the paper to clarify that we are not referring to metabolic coupling and to speak to the broader implications of network theory and our findings.

      Reviewer #4 (Public Review):

      This manuscript describes a complex, highly ambitious set of modeling and experimental studies that appear designed to compare the structural and functional properties of beta cell subpopulations within the islet network in terms of their influence on network synchronization. The authors conclude that the most functionally coupled cell subpopulations in the islet network are not those that are most structurally coupled via gap junctions but those that are most metabolically active.

      Strengths of the paper include (1) its use of an interdisciplinary collection of methods including computer simulations, FRAP to monitor functional coupling by gap junctions, the monitoring of Ca2+ oscillations in single beta cells embedded in the network, and the use of sophisticated approaches from probability theory. Most of these methods have been used and validated previously. Unfortunately, however, it was not clear what the underlying premise of the paper actually is, despite many stated intentions, nor what about it is new compared to previous studies, an additional weakness.

      Although the authors state that they are trying to answer 3 critical questions, it was not clear how important these questions are in terms of significance for the field. For example, they state that a major controversy in the field is whether network structure or network function mediates functional synchronization of beta cells within the islet. However, this question is not much debated. As an example, while it is known that there can be long-range functional coupling in islets, no workers in the field believe there is a physical structure within islets that mediates this, unlike the case for CNS neurons that are known to have long projections onto other neurons. Beta cells within the islets are locally coupled via gap junctions, as stated repeatedly by the authors but these mediate short-range coupling. Thus, there are clearly functional correlations over long ranges but no structures, only correlated activity. This weakness raises questions about the overall significance of the work, especially as it seems to reiterate ideas presented previously.

      We thank reviewer 4 for their positive comments, including on our multidisciplinary use of mathematical models and experimental imaging techniques. We have now included an additional model of slow oscillations (the Integrated Oscillator Model) to strengthen our conclusions. We also thank reviewer 4 for the insightful comments. We have carefully reviewed each comment and made significant changes to the manuscript accordingly. In particular, we have significantly rewritten the introduction and discussion to clarify what is new in our manuscript and what was shown previously. Additionally, we agree with the reviewer’s sentiment that there is little debate over whether, for example, there are physical structures within the islet that mediate long-range functional connections. However, there is current debate over whether functional beta-cell subpopulations can dictate islet dynamics (see [11]–[13]). This debate can be framed by asking whether these functional subpopulations emerge from the islet due to physical connections (structural network) or something more nuanced (such as intrinsic dynamics). We have reframed the introduction and discussion to clarify this debate and to state the premise of the paper more clearly.

      Specific Comments

      1). The authors state it is well accepted that the disruption of gap junctional coupling is a pathophysiological characteristic of diabetes, but this is not an opinion widely accepted by the field, although it has been proposed. The authors should scale back on such generalizations, or provide more compelling evidence to support such a claim.

      Thank you for pointing this out. We have provided more specific citations and changed the wording from “well accepted” to “has been documented”. See Discussion page 13 lines 415-416.

      2) The paper relies heavily on simulations performed using a version of the model of Cha et al (2011). While this is a reasonable model of fast bursting (e.g. oscillations having periods <1 min.), the Ca2+ oscillations that were recorded by the authors and shown in Fig. 2b of the manuscript are slow oscillations with periods of 5 min and not <1 min, which is a weakness of the model in the current context. Furthermore, the model outputs that are shown lack the well-known characteristics seen in real islets, such as fast-spiking occurring on prolonged plateaus, again as can be seen by comparing the simulated oscillations shown in Fig. 1d with those in Fig. 2b. It is recommended that the simulations be repeated using a more appropriate model of slow oscillations or at least using the model of Cha et al but employed to simulate slower bursting.

      The reviewer raises an important point and caveat associated with our simulated model and experimental data. This point was also made by other reviewers, and a similar response to this comment can be found elsewhere in response to reviewer 2 point 6. To address this comment, we have performed several additional experiments and analyses:

      1) We collected additional Ca2+ (to identify the functional network and hubs) and FRAP data (to assess gap junction permeability) in islets which show either pure slow, pure fast, or mixed oscillations. We generated networks based on each time scale to compare with FRAP gap junction permeability data. We found the conclusions of our first draft to be consistent across all oscillation types: there was no relationship between gap junction conductance, as approximated using FRAP, and normalized degree for slow (Figure 3j), fast (Figure 3 Supp 1d,e), or mixed (Figure 3 Supp 1g,h) oscillations. We also include discussion of these conclusions (see Results page 7 lines 184-186 and lines 188-191, Discussion page 12 lines 357-360).

      2) We also performed additional simulations with a coupled ‘Integrated Oscillator Model’ (IOM), which shows slow oscillations because of metabolic oscillations (Figure 2). We compared connectivity with gap junction coupling and underlying cell parameters. In this case, there is an association between functional and structural networks, with highly-connected hub cells showing higher gap junction conductance (Figure 2f) but also low KATP channel conductance (gKATP) (Figure 2e). However, there are some caveats to these findings: given the nature of the IOM, we were limited to simulating smaller islets (260 cells), and less heterogeneity in the calcium traces was observed. Additional analysis suggests the greater association between functional and structural networks in this model was a result of the smaller islets, and the association was also dependent on threshold (unlike in the Cha-Noma fast oscillator model). These limitations and results are discussed further (Discussion page 11 lines 344-354).

      Additionally, in the IOM, the underlying cell dynamics of highly-connected hub cells are differentiated by KATP channel conductance (gKATP), which is different than in the fast oscillator model (differentiated by metabolism, kglyc). However this difference between models can be linked to differences in the way duty cycle is influenced by gKATP and kglyc (Figure 1h, Figure 2g). In each model there was a similar association between duty cycle and highly-connected hub cells. We also discuss these findings (Discussion page 11 lines 334-343).

      Overall these results and discussion with respect to the coupled IOM oscillator model can be found in Figure 2, Results page 6 lines 128-156 and Discussion page 11 lines 332-354.
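
      As an illustration of what "normalized degree" in a functional network means in the responses above, the following is a minimal sketch and not the analysis code used in the manuscript. It assumes that two cells are connected when the Pearson correlation of their calcium traces reaches a chosen threshold, and that degree is normalized by the maximum possible degree (N - 1); both choices, and the function names, are assumptions made for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length traces (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def normalized_degree(traces, threshold):
    """Degree of each cell in the thresholded correlation network, divided by N - 1.

    traces: list of per-cell calcium traces (lists of floats, same length).
    threshold: hypothetical correlation cutoff for drawing an edge.
    """
    n = len(traces)
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if pearson(traces[i], traces[j]) >= threshold:
                degree[i] += 1
                degree[j] += 1
    return [d / (n - 1) for d in degree]


if __name__ == "__main__":
    # Three toy traces: cells 0 and 1 oscillate in phase, cell 2 in anti-phase.
    toy = [[0, 1, 0, 1], [0, 2, 0, 2], [1, 0, 1, 0]]
    print(normalized_degree(toy, threshold=0.9))
```

      Because edges depend on the cutoff, the resulting degree distribution (and therefore which cells look like hubs) can change with the threshold, which is why threshold dependence is worth reporting alongside any hub analysis.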

      3) Much of the data analyzed whether obtained via simulation or through experiment seems to produce very small differences in the actual numbers obtained, as can be seen in the bar graphs shown in Figs. 1e,g for example (obtained from simulations), or Fig. 2j (obtained from experimental measurements). The authors should comment as to why such small differences are often seen as a result of their analyses throughout the manuscript and why also in many cases the observed variance is high. Related to the data shown, very few dots are shown in Figs. 1eg or Fig 4e and 4h even though these points were derived from simulations where 100s of runs could be carried out and many more points obtained for plotting. These are weaknesses unless specific and convincing explanations are provided.

      We thank the reviewer for these comments, which are similar to those of reviewer 2 (point 4) and reviewer 3 (point 6). Indeed there is some variability between cells in both simulations and experiments related to the metabolic activity in hubs and non-hubs. The variability points to potentially other factors being involved in determining hubs beyond simply kglyc, including a minor role for gap junction coupling structural network and potentially cell position and other intrinsic factors. We now discuss this point – see Discussion page 12 lines 364-266.

      The differences between hubs and nonhubs appear small because the value of kglyc is very small. For Figure 1e, the average kglyc for nonhubs was 1.26 × 10⁻⁴ s⁻¹ (which is the average of the distribution because most cells are nonhubs) while the average kglyc for hubs was 1.4 × 10⁻⁴ s⁻¹, which is about half of a standard deviation higher. The paired t-test controls for the small value of average kglyc.

      For simulation data, each of the 5 dots corresponds to a simulated islet averaged over 1000 cells (or 260 cells for the coupled IOM). The computational cost of generating such data is high, so it is not feasible to conduct 100s of runs. Again, we note that the comparisons between hubs and non-hubs are paired, and we find statistically significant differences for kglyc in Figure 1 using only 5 paired data points. That we find these differences indicates the substantial difference between hubs and non-hubs. This is further supported by all effect sizes being greater than 0.8 for all significantly different findings (Cha-Noma: kglyc: 2.85, gcoup: 0.82; IOM: gKATP: 1.27, gcoup: 2.94). We have included these effect sizes in the captions of Figures 1 and 2 (pages 34, 36).

      To consider all of the available data rather than the average across an entire islet, we created a kernel density estimate of kglyc for hubs and nonhubs by concatenating every single cell in each of the five islets. A Kolmogorov-Smirnov test (kstest) gives a highly significant difference (P<0.0001) between these two distributions.
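
      The paired comparison, effect size, and two-sample Kolmogorov-Smirnov comparison described above can be sketched as follows. This is a hedged illustration rather than the authors' code: the response does not state which effect-size measure was used, so the sketch assumes Cohen's d for paired data (mean of the paired differences divided by their standard deviation), and all function names and input values are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t_and_effect_size(a, b):
    """Return (t statistic, Cohen's d_z) for paired samples a and b.

    The t statistic has len(a) - 1 degrees of freedom; the p-value lookup is omitted.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    d_z = mean(diffs) / stdev(diffs)       # assumed effect-size definition
    t_stat = d_z * math.sqrt(len(diffs))   # paired t statistic
    return t_stat, d_z


def ks_statistic(x, y):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    i = j = 0
    gap = 0.0
    while i < len(xs) and j < len(ys):
        v = min(xs[i], ys[j])
        # Step both empirical CDFs past every observation equal to v.
        while i < len(xs) and xs[i] <= v:
            i += 1
        while j < len(ys) and ys[j] <= v:
            j += 1
        gap = max(gap, abs(i / len(xs) - j / len(ys)))
    return gap


if __name__ == "__main__":
    # Hypothetical per-islet averages (5 islets), not values from the manuscript.
    hubs = [1.40e-4, 1.38e-4, 1.43e-4, 1.39e-4, 1.41e-4]
    nonhubs = [1.26e-4, 1.25e-4, 1.27e-4, 1.26e-4, 1.27e-4]
    print(paired_t_and_effect_size(hubs, nonhubs))
    print(ks_statistic(hubs, nonhubs))
```

      The paired design is what allows a small absolute difference to be detected with only five islets: the test operates on within-islet differences, so variation between islets does not enter the denominator.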

      Author response image 1.

      4) The data shown in Fig. 4i,j are intended to compare long-range synchronization at different distances along a string of coupled cells but the difference between the synchronized and unsynchronized cells for gcoup and Kglyc was subtle, very much so.

      Thank you for pointing out these subtle differences. The y-axis scale for i and j is broad to allow us to represent all distances on a single plot. After correction for multiple comparisons, the differences were still statistically significant. As the reviewer mentioned in point 3, each plot contains only five data points, each of which represents the average of a single simulated islet; therefore we are not concerned about statistical significance arising from too large a sample size. We also checked the differences between synchronized and non-synchronized cell pairs in Figure 4 panels e and h (now Figure 5 e, h). These are the same data as i and j but normalized such that all of the distances could be averaged together. We again found statistical significance between synchronized and non-synchronized cell pairs. As can be seen in Author response image 2, the difference between synchronized and non-synchronized cell pairs is greater than the variability between simulated islets. Thus, in this case the variability is not substantial.

      Author response image 2.

      5) The data shown in Fig. 5 for Cx36 knockout islets are used to assess the influence of gap junctional coupling, which is reasonable, but it would be reassuring to know that loss of this gene has no effects on the expression of other genes in the beta cell, especially genes involved with glucose metabolism.

      This is an important point. Previous studies have found no significant change in NAD(P)H in Cx36-deficient islets – see Benninger et al J. Physiol 2011 [14]. Islet architecture is also retained. Further, the insulin secretory response of dissociated Cx36 knockout beta cells is the same as that of dissociated wildtype beta cells, further indicating no significant defect in the intrinsic ability of the beta cell to release insulin – see Benninger et al J. Physiol 2011 [14]. We now mention these findings in the discussion. See Discussion page 14 lines 459-464.

      6) In many places throughout the paper, it is difficult to ascertain whether what is being shown is new vs. what has been shown previously in other studies. The paper would thus benefit strongly from added text highlighting the novelty here and not just restating what is known, for instance, that islets can exhibit small-world network properties. This detracts from the strengths of the paper and further makes it difficult to wade through. Even the finding here that metabolic characteristics of the beta cells can infer profound and influential functional coupling is not new, as the authors proposed as much many years ago. Again, this makes it difficult to distill what is new compared to what is mainly just being confirmed here, albeit using different methods.

      Thank you for the suggestion, we have made significant modifications throughout the Introduction, Discussion and Results to be clearer about what is known from previous work and what is newly found in this manuscript.

      Reviewer #5 (Public Review):

      The authors use state-of-the-art computation, experiment, and current network analysis to try and disaggregate the impact of cellular metabolism driving cellular excitability and structural electrical connections through gap junctions on islet synchronization. They perform interesting simulations with a sophisticated mathematical model and compare them with closely associated experiments. This close association is impressive and is an excellent example of using mathematics to inform experiments and experimental results. The current conclusions, however, appear beyond the results presented. The use of functional connectivity is based on correlated calcium traces but is largely without an understood biophysical mechanism. This work aims to clarify such a mechanism between metabolism and structural connection and comes out on the side of metabolism driving the functional connectivity, but both are required and more nuanced conclusions should be drawn.

      We thank reviewer 5 for their positive comments, including on our multifaceted experimental and computational techniques. We also found the reviewer’s careful reading and thoughtful comments to be very helpful, and we have worked to integrate each comment into our manuscript. It is evident from the reviewer comments that we did not clearly explain what was meant by our conclusions concerning the functional network reflecting metabolism rather than gap junctions. We have conducted significant rewriting to show that we are not concluding that communication (metabolic or electric) occurs due to conduits other than gap junctions. Rather, our data suggest that the functional network (which reflects calcium synchronization) reflects intrinsic dynamics of the cells, which include metabolic rates, more than individual gap junction connections.

      References referred to in this response to reviewers document:

      [1] A. Stožer et al., “Functional connectivity in islets of Langerhans from mouse pancreas tissue slices,” PLoS Comput Biol, vol. 9, no. 2, p. e1002923, 2013.

      [2] N. L. Farnsworth, A. Hemmati, M. Pozzoli, and R. K. Benninger, “Fluorescence recovery after photobleaching reveals regulation and distribution of connexin36 gap junction coupling within mouse islets of Langerhans,” The Journal of physiology, vol. 592, no. 20, pp. 4431–4446, 2014.

      [3] C.-L. Lei, J. A. Kellard, M. Hara, J. D. Johnson, B. Rodriguez, and L. J. Briant, “Beta-cell hubs maintain Ca2+ oscillations in human and mouse islet simulations,” Islets, vol. 10, no. 4, pp. 151–167, 2018.

      [4] N. R. Johnston et al., “Beta cell hubs dictate pancreatic islet responses to glucose,” Cell metabolism, vol. 24, no. 3, pp. 389–401, 2016.

      [5] V. Kravets et al., “Functional architecture of pancreatic islets identifies a population of first responder cells that drive the first-phase calcium response,” PLoS Biology, vol. 20, no. 9, p. e3001761, 2022.

      [6] H. Ren et al., “Pancreatic α and β cells are globally phase-locked,” Nature Communications, vol. 13, no. 1, p. 3721, 2022.

      [7] A. Stožer et al., “From Isles of Königsberg to Islets of Langerhans: Examining the function of the endocrine pancreas through network science,” Frontiers in Endocrinology, vol. 13, p. 922640, 2022.

      [8] J. Zmazek et al., “Assessing different temporal scales of calcium dynamics in networks of beta cell populations,” Frontiers in physiology, vol. 12, p. 337, 2021.

      [9] M. E. Corezola do Amaral et al., “Caloric restriction recovers impaired β-cell-β-cell gap junction coupling, calcium oscillation coordination, and insulin secretion in prediabetic mice,” American Journal of Physiology-Endocrinology and Metabolism, vol. 319, no. 4, pp. E709–E720, 2020.

      [10] J. M. Dwulet, J. K. Briggs, and R. K. P. Benninger, “Small subpopulations of beta-cells do not drive islet oscillatory [Ca2+] dynamics via gap junction communication,” PLOS Computational Biology, vol. 17, no. 5, p. e1008948, May 2021, doi: 10.1371/journal.pcbi.1008948.

      [11] B. E. Peercy and A. S. Sherman, “Do oscillations in pancreatic islets require pacemaker cells?,” Journal of Biosciences, vol. 47, no. 1, pp. 1–11, 2022.

      [12] G. A. Rutter, N. Ninov, V. Salem, and D. J. Hodson, “Comment on Satin et al. ‘Take me to your leader’: an electrophysiological appraisal of the role of hub cells in pancreatic islets. Diabetes 2020; 69: 830–836,” Diabetes, vol. 69, no. 9, pp. e10–e11, 2020.

      [13] L. S. Satin and P. Rorsman, “Response to comment on Satin et al. ‘Take me to your leader’: An electrophysiological appraisal of the role of hub cells in pancreatic islets. Diabetes 2020; 69: 830–836,” Diabetes, vol. 69, no. 9, pp. e12–e13, 2020.

      [14] R. K. Benninger, W. S. Head, M. Zhang, L. S. Satin, and D. W. Piston, “Gap junctions and other mechanisms of cell–cell communication regulate basal insulin secretion in the pancreatic islet,” The Journal of physiology, vol. 589, no. 22, pp. 5453–5466, 2011.

      [15] R. Fried, Erectile dysfunction as a cardiovascular impairment. Academic Press, 2014.

      [16] T. Pipatpolkai, S. Usher, P. J. Stansfeld, and F. M. Ashcroft, “New insights into KATP channel gene mutations and neonatal diabetes mellitus,” Nature Reviews Endocrinology, vol. 16, no. 7, pp. 378–393, 2020.

      [17] A. M. Notary, M. J. Westacott, T. H. Hraha, M. Pozzoli, and R. K. P. Benninger, “Decreases in Gap Junction Coupling Recovers Ca2+ and Insulin Secretion in Neonatal Diabetes Mellitus, Dependent on Beta Cell Heterogeneity and Noise,” PLOS Computational Biology, vol. 12, no. 9, p. e1005116, Sep. 2016, doi: 10.1371/journal.pcbi.1005116.

      [18] J. V. Rocheleau, G. M. Walker, W. S. Head, O. P. McGuinness, and D. W. Piston, “Microfluidic glucose stimulation reveals limited coordination of intracellular Ca2+ activity oscillations in pancreatic islets,” Proceedings of the National Academy of Sciences, vol. 101, no. 35, pp. 12899–12903, 2004.

      [19] R. K. Benninger, M. Zhang, W. S. Head, L. S. Satin, and D. W. Piston, “Gap junction coupling and calcium waves in the pancreatic islet,” Biophysical journal, vol. 95, no. 11, pp. 5048–5061, 2008.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This article is interested in how butterflies, or more precisely, butterfly wing scale precursor cells, each make precisely patterned ultrastructures made of chitin.

      To do this, the authors sought to use the butterfly Parides eurimedes, a papilionid swallowtail, that carries interesting, unusual structures made of 1) vertical ridges, that lack a typical layered stacking arrangement; and 2) deep honeycomb-like pores. These two features make the organism chosen a good point of comparison with previous studies, including classic papers that relied on electron microscopy (SEM/TEM), and more recent confocal microscopy studies.

      The article shows good microscopy data, including detailed, dense developmental series of staining in the Parides eurimedes model. The mix of cell membrane staining, chitin precursor, and F-actin staining is well utilized and appropriately documented with the help of 3D-SIM, a microscopy technique considered to provide super-resolution (here needed to visualize sub-cellular processes).

      The key message from this article is that F-actin filaments are later repurposed, in papilionid butterflies, to finish the patterning of the inter-ridge space, elaborating new structures (this was not observed so far in other studies and organisms). The model proposed in Figure 6 summarizes these findings well, with F-actin reshaping itself into a tulip that likely pulls down a chitin disk to form honeycombs. These interpretations of the microscopy data are interesting and novel.

      There are two other points of interest, that deserve future investigation:

      1) The authors performed immunolocalizations of Arp2 and pharmacological inhibitions of Arp2/3, and found some possible effect on honeycomb lattice development. The inter-ridge region of the butterfly Papilio polytes, which lacks these structures, did not seem to be affected by drug treatments. Effects were time-dependent, which makes sense. These data provide circumstantial evidence that Arp2/3 is involved in the late role of F-actin formation or re-organisation.

      2) The authors perform a comparative study in additional papilionids (Fig. 6 in particular). I find these data to be quite limited without a dense sampling, but they are nonetheless interesting and support a second-phase role of F-actin re-organisation.

      The article is dense, well produced and succinctly written. I believe this is an interesting and insightful study on a complex process of cell biology, that inspires us to look at basic phenomena in a broader set of organisms.

      We thank the reviewer for the positive appraisal.

      Reviewer #2 (Public Review):

      The manuscript by Seah and Saranathan investigates the cell-based growth mechanism of so-called honeycomb structures in the upper lamina of papilionid wing scales by investigating a number of different species. The authors chose Parides eurimedes as a focus species, with the developmental pathway of five other papilionids as a comparative backup. Through state-of-the-art microscopy images of different developmental steps, the authors find that the intricate F-actin filaments reorganise and support cuticular discs that template the air holes that form the honeycomb lattice. The manuscript is well written and easy to follow, yet based on a somewhat limited sample size for their focus species, limiting attempts to suppress expression and alter structure shape.

      The fact that the authors find a novel reorganisation mechanism is exciting and warrants further research, e.g. into the formation of other microscale features or smaller scale structures (e.g. the mentioned gyroid networks).

      We thank the reviewer for the positive appraisal.

      The authors place their results in the discussion in the light of current literature (although the references could be expanded further to include the breadth of the field). However, the mechanistic explanation completely ignores the mechanical properties of the membranes as an origin of some of the observed phenomena (see McDougal's work for example) and attributes the occurrence of some features to Turing patterns and Ostwald ripening, which I find somewhat unlikely, and I suggest that the authors explore these aspects further in the discussion.

      We thank the reviewer for these suggestions. We have added more references from the current literature to more accurately reflect the breadth of the field. McDougal et al. (2021) discuss how biomechanical forces (differential growth and buckling) acting on the membrane and deposited cuticle shape the formation of longitudinal ridges. However, here it is the invagination of the plasma membrane bearing the deposited cuticle that is our main concern. Nevertheless, we agree that future studies should also consider the mechanical properties of the membranes to explain some of the observed features. We have clarified this in our discussion.

      I have few concerns regarding the experimental approach beyond the somewhat limited sample size. One thing the authors should more clearly mention are the pupation periods for all investigated species as only the periods for two species are named.

      Yes, unfortunately, we were only able to obtain pupae with pupation dates for two species. We have clarified this point in the methods.

      Reviewer #1 (Recommendations For The Authors):

      Suggestion for improvement.

      I recommend adopting a magenta/green (or orange/azure) color scheme to make the figures accessible to most color vision types. This does not require re-doing the figure and could be processed on the rendered JPG/TIF figures with the following procedure :

      1) open the rendered figures in Photoshop in RGB mode

      2) go to Channel Mixer

      3) Select Output Channel : Blue

      4) set Blue 100% --> 0% and Red 0% --> 100%

      This will change Red to Magenta without affecting luminosity.

      Similar solutions should be available in other software including GIMP.
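
      For readers working outside Photoshop, the channel-mixer recipe above amounts to writing the red channel into the blue output channel. The following is a minimal sketch of that remapping on plain RGB tuples; the function names are hypothetical, and it assumes, as in a red/green figure, that the original blue channel carries no information, because its content is discarded.

```python
def red_to_magenta(pixel):
    """Apply the mixer setting 'Blue output = 100% Red, 0% Blue' to one pixel.

    pixel: (r, g, b) tuple of channel values. Red and green pass through
    unchanged; the original blue value is dropped.
    """
    r, g, _b = pixel
    return (r, g, r)


def remap_image(pixels):
    """Apply red_to_magenta to every pixel of a row-major list of rows."""
    return [[red_to_magenta(p) for p in row] for row in pixels]


if __name__ == "__main__":
    # Pure red becomes magenta, pure green is untouched, overlap becomes white.
    row = [(255, 0, 0), (0, 255, 0), (255, 255, 0)]
    print(remap_image([row]))
```

      In this scheme, regions where the red and green signals overlap render as white rather than yellow, which is the usual appearance of co-localization in magenta/green images.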

      Of note this is a late fix and ideally, color encoding could be done upstream in the microscopy file extraction software (e.g. Fiji), but I do not think this heavier solution is needed here.

      We thank the reviewer for this suggestion. In order to be more inclusive, we have redone the figures and videos in a yellow+magenta color scheme.

      Reviewer #2 (Recommendations For The Authors):

      References: Some literature is missing that could be considered by the authors e.g.

      https://doi.org/10.1098/rstb.2020.0505 https://doi.org/10.1101/2023.06.01.542791

      https://doi.org/10.1098/rsfs.2011.0082 https://doi.org/10.1557/mrs.2019.21

      https://iopscience.iop.org/article/10.1088/2040-8986/aaff39/meta https://doi.org/10.1364/OE.20.008877

      We have added more references as suggested.

      Placing the captions next to the figures, particularly in the SI will help accessibility.

      We agree. We believe this would be done during article production.

      113: chiefly?

      We have replaced ‘chiefly’ with ‘focusing mainly on’.

      160: how do you know the scales are more sclerotized already? Just because it's later in development?

      Yes, that is what we are alluding to here. We have made edits to clarify this sentence.

      186: Specify sample size.

      We have specified the sample size ‘(N = 15)’ here.

      309: Multilayered cover scales would be more accurate.

      Thanks for the suggestion. We have changed ‘structurally-colored cover scales’ to ‘multilayered cover scales’ as suggested.

      Please check the literature list again for accurate references.

      Thanks for the suggestion. We have gone through the references and fixed any missing information.

    1. Author Response:

      Thank you very much for selecting our paper for peer review and for the thorough evaluation of our manuscript. We appreciate your assessment and the reviewers’ comments that value our work and identify important points that will enable us to improve the paper. We are now working on key experiments to further test the hypothesis that ROCK is essential for the formation, growth, and morphology of the sea urchin larval skeleton. We will address the reviewers’ comments in detail in the revised version of the paper that we will submit after completing the experiments, but for now, there are two points we would like to clarify.

      We thank the first reviewer for the appreciation of this paper and of our previous work where we studied calcium vesicle dynamics in whole embryos (Winter et al, Plos Com Biol 2021). In Winter et al 2021, we found that the skeleton (spicules) does not grow when the embryos are immobilized, in either control or treated embryos. As a consequence, we cannot determine the role of ROCK in vesicle trafficking and exocytosis based on experiments conducted in whole embryos. We are developing an alternative assay for vesicle tracking using cell cultures, but that is beyond the scope of this current work.

      As for the second reviewer’s criticism of the usage of Y-27632 to block ROCK activity: The ROCK inhibitor concentrations we used (30-80µM) are similar to those commonly used in mammalian systems and in Drosophila to block ROCK activity, for example: (Becker et al., 2022; Canellas-Socias et al., 2022; Fischer et al., 2009; Kagawa et al., 2022; Segal et al., 2018; Su et al., 2022). The manufacturer’s datasheet indicates that: “Y-27632 dihydrochloride is a selective ROCK inhibitor (Ki values are 0.14-0.22, 0.3, 25, 26 and > 250 μM for ROCK1 (p160 ROCK), ROCK2, PKA, PKC and MLCK respectively)”. That is, the affinities of Y-27632 for ROCK kinases are roughly 100 times higher than those for PKC, PKA, and MLCK. Furthermore, these Ki values are based on biochemistry assays where the activity of the inhibitor is tested in-vitro with the purified protein. Therefore, these concentrations are not relevant to cell or embryo cultures where the inhibitor has to penetrate the cells and affect ROCK activity in-vivo. Y-27632 activity was studied both in-vitro and in-vivo in Narumiya, Ishizaki and Uehata, Methods in Enzymology 2000 (Narumiya et al., 2000). This paper reports similar concentrations to the ones indicated in the manufacturer’s datasheet for the in-vitro experiments, but shows that concentrations of 10µM or higher are effective in cell cultures. As stated above, we will add additional experimental verifications to the revised version, but even at this stage, the concentrations we used and the agreement between our pharmacological and genetic perturbations suggest that the affected protein is indeed ROCK.
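
      The selectivity implied by the datasheet values quoted above can be checked with simple arithmetic. The sketch below is only an illustration: it takes the Ki values exactly as quoted, treats the ROCK1 entry as a (low, high) range, and computes the fold difference between an off-target Ki and a ROCK Ki. The dictionary and function names are hypothetical, and MLCK is left out because only a lower bound (> 250 μM) is given.

```python
# Ki values in micromolar, as quoted from the datasheet; (low, high) ranges.
KI_UM = {
    "ROCK1": (0.14, 0.22),
    "ROCK2": (0.3, 0.3),
    "PKA": (25.0, 25.0),
    "PKC": (26.0, 26.0),
}


def fold_selectivity(target, off_target, ki=KI_UM):
    """Return (min, max) ratio of the off-target Ki to the target Ki.

    A larger ratio means the inhibitor binds the target more tightly
    relative to the off-target kinase.
    """
    target_low, target_high = ki[target]
    off_low, off_high = ki[off_target]
    return off_low / target_high, off_high / target_low


if __name__ == "__main__":
    for rock in ("ROCK1", "ROCK2"):
        for other in ("PKA", "PKC"):
            low, high = fold_selectivity(rock, other)
            print(f"{other} vs {rock}: {low:.0f}- to {high:.0f}-fold")
```

      With the quoted numbers, the ratios range from about 83-fold (PKA versus ROCK2) to about 186-fold (PKC versus the low end of the ROCK1 range).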

      We share the reviewers’ and editors’ wish to identify the molecular targets of ROCK and the specific cellular processes that ROCK is involved in, and we are actively working on achieving this goal. However, we believe that this paper is an important step towards illuminating the cellular components that participate in biomineral construction and the feedback between the cellular machinery and gene expression.

      Best,

      Smadar, in the name of all co-authors.

      References:

      • Becker, K.N., Pettee, K.M., Sugrue, A., Reinard, K.A., Schroeder, J.L., Eisenmann, K.M., 2022. The Cytoskeleton Effectors Rho-Kinase (ROCK) and Mammalian Diaphanous-Related (mDia) Formin Have Dynamic Roles in Tumor Microtube Formation in Invasive Glioblastoma Cells. Cells 11.
      • Canellas-Socias, A., Cortina, C., Hernando-Momblona, X., Palomo-Ponce, S., Mulholland, E.J., Turon, G., Mateo, L., Conti, S., Roman, O., Sevillano, M., Slebe, F., Stork, D., Caballe-Mestres, A., Berenguer-Llergo, A., Alvarez-Varela, A., Fenderico, N., Novellasdemunt, L., Jimenez-Gracia, L., Sipka, T., Bardia, L., Lorden, P., Colombelli, J., Heyn, H., Trepat, X., Tejpar, S., Sancho, E., Tauriello, D.V.F., Leedham, S., Attolini, C.S., Batlle, E., 2022. Metastatic recurrence in colorectal cancer arises from residual EMP1(+) cells. Nature 611, 603-613.
      • Fischer, R.S., Gardel, M., Ma, X., Adelstein, R.S., Waterman, C.M., 2009. Local cortical tension by myosin II guides 3D endothelial cell branching. Curr Biol 19, 260-265.
      • Kagawa, H., Javali, A., Khoei, H.H., Sommer, T.M., Sestini, G., Novatchkova, M., Scholte Op Reimer, Y., Castel, G., Bruneau, A., Maenhoudt, N., Lammers, J., Loubersac, S., Freour, T., Vankelecom, H., David, L., Rivron, N., 2022. Human blastoids model blastocyst development and implantation. Nature 601, 600-605.
      • Narumiya, S., Ishizaki, T., Uehata, M., 2000. Use and properties of ROCK-specific inhibitor Y-27632. Methods Enzymol 325, 273-284.
      • Segal, D., Zaritsky, A., Schejter, E.D., Shilo, B.Z., 2018. Feedback inhibition of actin on Rho mediates content release from large secretory vesicles. J Cell Biol 217, 1815-1826.
      • Su, Y., Huang, H., Luo, T., Zheng, Y., Fan, J., Ren, H., Tang, M., Niu, Z., Wang, C., Wang, Y., Zhang, Z., Liang, J., Ruan, B., Gao, L., Chen, Z., Melino, G., Wang, X., Sun, Q., 2022. Cell-in-cell structure mediates in-cell killing suppressed by CD44. Cell Discov 8, 35.
    1. Author Response

      Reviewer #1 (Public Review):

      In this genetic and imaging-based analysis of stem-cell maintenance and organ initiation, two phases important for continued production of shoot organs in plants, the authors tested whether SHR and targets/partners (SCR, SCL23, JKD) provide the circuitry to maintain the stem cell pool and contribute to the production of lateral organs. They find that these factors are indeed expressed in and required for SAM activities and, furthermore, that behaviors of SHR and SCR in the root are recapitulated in the meristem, including mobility of SHR (here to epidermis from internal layers), activation of SCR by SHR, and "trapping" of SHR movement by complexing with SCR. Strengths include high quality imaging of reporters and FRET-FLIM measurement to assess in vivo complex formation. The analysis is then extended to link SHR and SCR to shoot-specific factors and auxin, again by testing expression, genetic dependencies and physical interaction. This is repeated for a number of factors and, individually, each is a well-done experiment. Conclusions about causal relationships are somewhat overstated (for example, the idea that SHR-SCR act through CYCD6 to alter cell division is based on expression patterns, not a functional analysis of cycd6).

      We concluded that SHR and cofactors drive cell proliferation through CYCD6;1, substantiated by the significant reduction in pCYCD6;1-GFP expression within the lateral organ primordia of the shr-2 mutant. This decrease in expression corresponds with the reduction in the number of cell layers within the L3 of the lateral organ primordia in shr-2 mutants, compared to wild-type. To further support this conclusion, we have added new data by analyzing the meristem of the cycd6;1 mutant. Our findings reveal a small, but significant reduction in both meristem size and the number of cell layers in the L3, relative to the wild type, as depicted in Figure 4 – figure supplement 2I-N. Collectively, these findings underscore our assertion that the SHR regulatory network plays a role in activating CYCD6;1 expression, thereby promoting cell division within the lateral organ primordia.

      In general, there are many high-quality studies included in this paper, and the presentation of imaging data (both the images themselves and quantification of data) is excellent. There is also a lot of data, and while each section was presented in a logical way, connections between sections, and the overarching developmental questions were sparse. Because the authors found that many of the relationships defined in the root were recapitulated in the shoot, the present organization leaves one with somewhat of a sense that little new was learned, and yet, the shoot meristem IS different and there are shoot specific inputs into the core regulatory factors. Rewriting to highlight the different activities (and thus expectation about regulation) could make the finding of the same network more interesting and creating a summary figure that highlights the input of shoot specific signals would bring the unique analysis to the forefront.

      We greatly appreciate your positive feedback on the imaging data presentation and the quality of the included studies! We tried to address your and the other reviewers' comments and strengthened the connections between the different sections of the manuscript. We made substantial revisions to the organization and presentation of the paper. Our focus has been on highlighting the distinct activities and regulatory aspects of the SHR network within the shoot meristem, underscoring the novel insights gained from this analysis. We also created a summary figure that features the input of shoot-specific signals, thereby emphasizing the unique analysis conducted. These changes have allowed us to better convey the significance of our findings and showcase the novel aspects of shoot meristem regulation. We believe these revisions align more closely with the paper's objectives and will make the study's contributions more engaging and apparent.

      Reviewer #2 (Public Review):

      This study contains a huge amount of data and the images are of high quality. However, the conclusions are not really well supported. The authors may have reached too far from their results. The roles of SHR, SCR and SCL23 in the shoot apex are not really clarified. The manuscript by Bahafid et al., reports a study of the functions of SHORTROOT (SHR), a well-established root development regulator in the shoot apical meristem (SAM) development with focus on lateral organ initiation. A large amount of data is included in this paper. This study highly depends on imaging, and the images are in general of very good quality. The authors show reciprocal interactions between SHR and SCR with auxin/MP. There are also a large amount of genetic interactions among several genes, including WUS and CLV3. Although the study provides a vast amount of data, the conclusions are not so well supported. There seem to be many interactions, at the protein level, and at the transcriptional regulation level, but the conclusion is nevertheless ambiguous.

      We have refined our manuscript.

    1. Author Response

      Evaluation Summary:

      The manuscript shows that retinal ganglion cell light responses in awake mice differ substantially from those under two forms of anesthesia and previously attained ex vivo recordings. This difference is central to our understanding of how ganglion cell responses relate to behavior. There are a few technical issues and issues about how the work is presented that could be strengthened.

      We thank the reviewers for their constructive comments. We have addressed all the issues, and added substantially more data and analysis results in the revised manuscript, further supporting our findings that awake responses are larger, faster, and more linearly decodable in the mouse retina than those responses under anesthesia or ex vivo.

      Reviewer #1 (Public Review):

      This paper compares output signals from the mouse retina in three conditions: awake mice, anaesthetized mice, and isolated retinas. The paper reports substantial differences, particularly between awake and either of the other conditions. Retinal signaling has been well studied using ex vivo preparations, with an assumption that the findings from those studies can be carried over to how the retina operates in vivo. The results from this paper at a minimum indicate a need to be cautious about that assumption. There are several technical issues that need testing or further explanation, and several issues about the presentation that could be clarified.

      Spike sorting

      The paper does not describe any control analyses that test for contamination in spike sorting. These are needed to evaluate the work.

      We have reported the details of our spike sorting procedure in the revised manuscript (Data Analysis section in Methods and Figure 1). In short, single units were identified by clustering in principal component space, followed by manual inspection of the spike waveform (triphasic, as expected from axonal signals; e.g., revised Figure 1F-H; Barry, 2015) as well as auto- and cross-correlograms (minimum interspike interval above 1 ms, consistent with a refractory period; e.g., revised Figure 1I-K). A minority of visually responsive cells (20/282, awake; 21/325, isoflurane; 1/103, FMM) had a small fraction of interspike intervals below 2 ms; however, including or excluding these cells did not affect our main conclusions.
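      The refractory-period check described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name, the example spike train, and the choice to report the fraction of short intervals are assumptions made for illustration, and only the 1 ms and 2 ms thresholds come from the text.

```python
import numpy as np

def isi_violation_fraction(spike_times_s, threshold_ms=2.0):
    """Return the fraction of interspike intervals shorter than threshold_ms.

    spike_times_s: spike times of one putative single unit, in seconds.
    """
    spike_times_s = np.sort(np.asarray(spike_times_s, dtype=float))
    isis_ms = np.diff(spike_times_s) * 1e3  # consecutive intervals in ms
    if isis_ms.size == 0:
        return 0.0  # fewer than two spikes: no interval to violate
    return float(np.mean(isis_ms < threshold_ms))

# Hypothetical spike train (seconds); one pair of spikes is 1.5 ms apart.
spikes = [0.010, 0.050, 0.0515, 0.120, 0.300]
frac_below_2ms = isi_violation_fraction(spikes, threshold_ms=2.0)
frac_below_1ms = isi_violation_fraction(spikes, threshold_ms=1.0)
print(frac_below_2ms, frac_below_1ms)
```

      In this toy example one of four intervals falls below 2 ms and none falls below 1 ms, which is the kind of unit the response describes as retained after checking that its inclusion did not change the conclusions.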

      Light levels

      The paper argues that differences in light level cannot account for the results. According to the methods, light levels were about two-fold higher at the retina in array recordings as compared to the front of the eye for in vivo recordings. The main text indicates that they differ less; it is not clear why the numbers in the main text and methods are different. Aside from this issue, this comparison does not consider the loss of light between the front of the eye and the retina. It is crucial that the paper provide a more detailed description of light levels. This should include converting those light levels to units that include the spectral output of the light source used (e.g. to isomerizations per rod or cone per second).

      The maximum light intensity of our in vivo setup was 31.3 mW/m² (with 15.9 mW/m² for the UV LED and 15.4 mW/m² for the blue LED). Following the suggestion by the reviewer, we calculated the photon flux on the mouse retina in vivo by taking into account the loss of light by the eye optics. In short, assuming 50% and 68% transmittance at 365 nm and 454 nm, respectively (Jacobs & Williams 2007), a pupil size of 1 mm and a retinal diameter of 4 mm with the stimulus covering 73° in azimuth and 44° in elevation, we obtained the photon flux on the mouse retina in vivo as 3.81×10³ and 6.64×10³ photons/s/μm² for UV and blue light, respectively. Assuming a total photon collecting area of 0.2 μm² for cones and 0.5 μm² for rods (Nikonov et al. 2006), and a relative sensitivity of rods, S- and M-cones of [UV, Blue] = [25, 60], [90, 0], [25, 60]%, respectively (Jacobs & Williams 2007), we then estimated the photoisomerization (R) rate as: 2.5×10³ R/rod/s, 0.7×10³ R/S-cone/s, and 1.0×10³ R/M-cone/s.
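      The last step of this estimate can be checked with a short sketch. The photon fluxes, collecting areas, and relative sensitivities are the values quoted above; the combination rule (sum the sensitivity-weighted fluxes of the two LEDs, then multiply by the collecting area) is an assumption on our part, chosen because it reproduces the reported rates. The upstream conversion from irradiance to retinal photon flux is not reproduced here.

```python
# Photon flux at the retina, as quoted in the text (photons/s/um^2).
flux = {"UV": 3.81e3, "Blue": 6.64e3}

# Photon collecting area per photoreceptor (um^2), as quoted in the text.
area = {"rod": 0.5, "S-cone": 0.2, "M-cone": 0.2}

# Relative sensitivity to each LED, as quoted in the text (fractions).
sensitivity = {
    "rod": {"UV": 0.25, "Blue": 0.60},
    "S-cone": {"UV": 0.90, "Blue": 0.00},
    "M-cone": {"UV": 0.25, "Blue": 0.60},
}

def isomerization_rate(receptor):
    """Assumed rule: collecting area times the sensitivity-weighted flux sum."""
    weighted_flux = sum(flux[led] * sensitivity[receptor][led] for led in flux)
    return area[receptor] * weighted_flux

rates = {receptor: isomerization_rate(receptor) for receptor in area}
for receptor, rate in rates.items():
    print(f"{receptor}: {rate:.0f} R/s")
```

      Under this assumption the sketch gives about 2.5×10³, 0.7×10³ and 1.0×10³ R/s for rods, S-cones and M-cones, matching the rounded values stated above.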

      In contrast, the maximum light intensity of the in vitro setup was 36 mW/m² as reported in Vlasiuk and Asari (2021). The photon flux on the isolated retina was then estimated to be around 9×10⁴ photons/s/μm² (under the assumption that the white light from a CRT monitor is centered around 500 nm). Assuming the sensitivity of rods, S- and M-cones to be 40, 2 and 40%, respectively, we then obtained 4×10⁴ R/rod/s, 2×10³ R/S-cone/s, and 4×10⁴ R/M-cone/s.

      Thus, the light intensity level was about ten times larger for the in vitro recordings than for the in vivo recordings. The amount of light reaching the retina in the awake condition should also be somewhat smaller than that under anesthesia due to pupillary reflexes. Past studies suggest that the darker the stimulus is, the slower the kinetics and the smaller the response of RGCs in an isolated retina (Wang et al 2011). Thus, the light intensity difference cannot simply account for the higher firing and faster kinetics in the awake condition than ex vivo or in the anesthetized condition.

      We have revised the manuscript accordingly.

      Comparison with other work

      The authors accurately point out that there is not much prior work on retinal outputs in awake animals. The paper, however, minimally describes the work that does exist. The Hong et al. (2018) paper, in particular, should be discussed. There are several differences between the results of that paper and the present paper. These include the fraction of recorded cells that are DS cells, and the maintained firing rates (though this does not appear to be studied systematically in Hong et al.).

      In the discussion section of the revised manuscript, we clarified connections to the existing studies on the retinal activity in vivo. To our knowledge, none of the past studies provided descriptive statistics on the awake RGC response properties (Hong et al., 2018; Schroeder et al., 2020; Sibille et al., 2022). Nevertheless, consistent with our study, we can see high baseline activity in the reported examples from C57BL6 mice (Figure 3C, Schroeder et al. 2020; Figure S7h, Sibille et al. 2022).

      Hong et al (2018), in contrast, reported somewhat different results, as pointed out by the reviewer. Firstly, they found a relatively low baseline activity in RGCs of albino CD1 mice. We think that this is likely due to general impairments of the vision/retina associated with albinism. While equipped with normal electroretinogram signals, CD1 mice showed no optomotor response and a reduced number of rods (Abdeljalil et al 2005; Brown et al 2007). This suggests a certain level of retinal dysfunction in these mice. Secondly, Hong et al (2018) reported a higher fraction of direction-selective RGCs in their recordings (>50% at a DS index threshold of 0.3). This is even higher than one would expect from anatomical and physiological studies ex vivo on BL6 mice (about a third; Sanes and Masland, 2015; Baden et al., 2016; Jouty et al 2013). Besides the effect of albinism, we think that this overrepresentation of DS cells in Hong et al (2018) arose as a consequence of the low baseline activity. As discussed above, the higher the baseline activity, the lower the DS/OS index by definition (Eq.(3) in Methods). Indeed, we found many more cells with high DS/OS index values in our anesthetized data than in the awake data (42-54% vs 17% at an index value threshold of 0.15; Figure 2), even though these recordings were done in the same experimental setup.
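      The dependence of the index on baseline firing can be illustrated with a minimal sketch. Eq. (3) of the Methods is not reproduced in this response, so the code assumes a commonly used normalized vector-sum index (length of the response vector sum divided by the summed firing rates); the tuning curve and baseline values are hypothetical.

```python
import numpy as np

def ds_index(rates_hz, directions_deg):
    """Assumed definition: |sum_k r_k * exp(i*theta_k)| / sum_k r_k."""
    rates_hz = np.asarray(rates_hz, dtype=float)
    theta = np.deg2rad(np.asarray(directions_deg, dtype=float))
    vector_sum = np.sum(rates_hz * np.exp(1j * theta))
    return float(np.abs(vector_sum) / np.sum(rates_hz))

# Eight evenly spaced motion directions and a hypothetical evoked tuning curve.
directions = np.arange(0, 360, 45)
evoked = np.array([20.0, 12.0, 4.0, 1.0, 0.0, 1.0, 4.0, 12.0])

# The same evoked tuning riding on a low or a high baseline rate (Hz).
index_low_baseline = ds_index(evoked + 1.0, directions)
index_high_baseline = ds_index(evoked + 20.0, directions)
print(round(index_low_baseline, 3), round(index_high_baseline, 3))
```

      With evenly spaced directions a constant baseline cancels in the vector sum but still adds to the denominator, so under this assumed definition the same tuning yields a lower index when the baseline is higher.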

      A related issue is that there are a few comparisons of ex vivo RGC responses with behavioral sensitivity. Smeds et al. (2019) is one example. More generally, the long-standing observation that dark-adapted sensitivity approaches limits set by Poisson fluctuations in photon absorption, and that prior RGC measurements are consistent with this result, is hard to explain if the RGCs are firing at high spontaneous rates under these conditions. RGC responses will certainly change with light level, but this merits discussion in the paper.

      As the reviewer pointed out, the retina may employ different coding principles under different light levels. In a scotopic condition, ex vivo studies reported a high tonic firing rate for OFF RGC types (~50 Hz, OFF sustained alpha cells in mice; Smeds et al 2019; ~20 Hz, OFF parasol cells in primates; Ala-Laurila and Rieke, 2014), but a low tonic firing rate for ON cell types (<1 Hz for both ON sustained alpha cells in mice and ON parasol cells in primates). These ON cells were shown to be responsible for light detection by firing against a silent background, hence compatible with the sparse feature detection strategy. In contrast, our recordings were done in a high mesopic / low photopic range where both rods and cones are supposedly active. Unlike the scotopic condition with rod vision, we then found high firing in awake recordings in general, indicating that no visual feature is readily detectable as brief firing events against a silent background. To explore the implications of such firing patterns on visual coding, we took a modelling approach in the revised manuscript. We found that a latency-based temporal code was not preferable in the awake condition (Figure 7); and that a linear decoder worked significantly better with the population responses in the awake condition to capture the presented random fluctuation of the light intensity (Figure 8). While we have not tested any behavioural relevance in our study besides correlation to locomotion/pupil size, it is then possible that the retina may work in different modes under different light intensity regimes (Tikidji-Hamburyan et al 2015).

      We clarified these points in the revised discussion section.

      Sampling bias

      The paper argues that sampling bias is not likely to contribute substantially to the results because of the wide variety of cell types recorded (line 431). This does not seem like a particularly strong argument, especially given the large degree of overlap in the distributions of most quantities across preparations. The argument about many cell types could be made more strongly if the distributions were completely separated, but that is not the case.

      We cannot deny the presence of a sampling bias in our datasets, and as the reviewer pointed out, we made comparisons only at a population level, but not at the level of individual cells or cell-types. However, the anesthetized and awake recordings were done with the same recording setup and techniques, and thus subject to the same sampling bias. Hence, the difference in the RGC response properties between these conditions cannot be explained by the sampling bias per se.

      Sensitivity

      The firing rates in response to 10% contrast sinusoids are quite low, as are the maximal firing rates for high contrast sinusoids. Relatedly, the modulation produced by the noise stimuli, particularly for the array recordings, is weak. This raises concerns about the health of some of the preparations.

      To our knowledge, in vivo contrast responses reported here were comparable to ex vivo data in previous reports (mouse, Jouty et al 2018, Pearson and Kerschensteiner 2015; rat, Jensen 2017, 2019). Likewise, the static nonlinearity and its upper bound for ex vivo responses were comparable between this study and previous reports (Santina et al. 2013; Kerschensteiner et al 2008; Cantrell et al 2010; Trapani et al 2022).

      We also examined batch effects in the response to the noise stimuli. We found some variability across preparations in each recording condition, but not to an extent that would justify discarding any particular dataset as an obvious outlier (Figure 6 – figure supplement 1). While it is difficult to tell the health status of preparations retrospectively, we thus believe that the effects were negligible.

      Efficient coding

      Sparse firing is not a universal property of retinal ganglion cell responses. Primate midget RGCs, for example, have pretty high maintained firing rates as shown in many past studies. Mouse RGCs have also been reported to operate in a mode similar to the high firing rate On cells reported here (Ke et al. 2014). A more balanced discussion of this past work is needed.

      As the reviewer pointed out, some retinal ganglion cells show high firing under certain conditions. In a scotopic condition, for example, OFF cells have high firing rates, while ON cells fire virtually nothing unless a light stimulus is presented (Ke et al 2014; Smeds et al 2019). At the behavioural level, single-photon detection above chance level nevertheless relies on the information from the ON but not the OFF pathway (Smeds et al 2019). Thus, the sparse coding framework still works as a valid strategy here, if not universal.

      This is, however, very different from what we report here. In a high-mesopic/low-photopic light level, we found a general increase of firing across all cell categories in the awake condition, compared to the anesthetized or ex vivo recordings (Figures 3 and 6). While this lowers information transfer rate (bits/spike; Figure 7), we found that the awake responses were more linearly decodable than the responses in the other conditions (Figure 8). We also ran a simulation and showed that a latency-based temporal code is not preferable for the awake responses (Figure 7 – figure supplement 1). These results suggest that the retina in awake condition is in favor of a rate code, though we have not tested all light levels or any behavioural relevance here.

      We clarified these points in the revised manuscript.

      Role of eye movements

      Could eye movements be at least partially responsible for the differences in response properties? Specifically, small fixational eye movements might produce a constantly varying input that could modulate firing.

      As described above (Essential Review item #2), eye movements were rarely observed during the head-fixed awake recordings. Eliminating those events from the analysis did not change our overall conclusions, and thus their contributions should be minimal in this study. It should also be noted that we mainly used full-field stimulation, and thus microsaccades should not substantially affect the amount of light impinging on the retina. We clarified these points in the revised manuscript.

      Reviewer #2 (Public Review):

      The technical achievements presented in the manuscript represent a tour de force, as optical tract recordings in awake mice have only rarely been done before. The substantial number of neurons recorded in both awake and anaesthetized conditions form a precious and worldwide unique dataset. However, since the recordings represent a non-standard approach, it would be, in my view, highly beneficial to show more details about the success of the method. How did the authors post-hoc identify electrode contacts located in the optical tract, what did the spike waveforms look like, what were the metrics of spike sorting quality, etc.

      We added more details about our recording and analysis methods in the revised manuscript. Below are answers to the reviewer’s specific questions:

      • The probe was coated with a fluorescent dye (DiI stain) and its location was verified histologically after the recordings (Figure 1E).

      • Spike waveforms typically had a triphasic shape (e.g., Figure 1F-H) as expected from axonal signals (Barry, 2015).

      • Single-units were identified by clustering in principal component space, followed by manual inspection of spike shape as well as auto- and cross-correlograms. Most units had a minimum interspike interval above 2 ms (93%, awake; 94%, isoflurane; 99%, FMM); and no units had the interspike intervals below 1 ms for a refractory period (e.g., Figure 1I-K), except for 1 (out of 103) for FMM-anesthetized recordings.

      We then selected visually responsive cells (SNR>0.15; see Eq.(1) in Methods) for the analyses.

      The authors go a long way in characterising the functional response properties of the recorded neurons and relating them to previous ex-vivo recordings. Based on the responses they find, the authors claim that they identified "... a new response type [which] likely emerged due to high baseline firing in awake mice". Regarding this claim, how do the authors rule out that it corresponds to any of the previously described cell types? For instance, the very sharp transient or brief modulations by the contrast part of the stimulus might have been missed in previous classifications based on calcium responses (e.g. Baden et al. 2016), where a number of cell types seem to respond equally strongly to grey and white and have an elevated response throughout the sinusoidal modulation of contrast. I acknowledge that the authors touch upon the possibility that the newly described OFF-suppressive ON cells correspond to a known cell type in the discussion, but I would recommend changing the phrasing of the results to avoid potential misunderstandings.

      We agree with the reviewer and revised the manuscript accordingly. Here we have two possibilities. Firstly, as the reviewer pointed out, this kind of response dynamics could have been overlooked previously because of a difference in the recording modality (Ca imaging; Baden et al 2016) or clustering methods (Jouty et al 2019). Secondly, these cells may belong to one of the cell types described in the past ex vivo studies, but exhibited distinct response dynamics in vivo as an emerging property of the awake condition. This is an interesting topic to pursue in future studies.

      The manuscript makes the interesting suggestion that "the retinal output characteristics [...] observed in vivo, [...] provide a completely different view on the retinal code". Given that this conclusion would change the way we should think about and do retinal neuroscience, in my view, the authors should take a few more steps to quantitatively demonstrate the implications of their findings on retinal coding, e.g. how much lower is the information transmitted per spike, how much does a temporal code based on spike timing suffer with the latencies observed in vivo. If the authors could quantify through computational modelling approaches the consequences of the observed differences, they might also be able to revise their title / main message, i.e. that "Awake responses SUGGEST inefficient dense coding in the mouse retina".

      To explore functional implications of our findings, we performed three more analyses as suggested by the reviewer. Specifically,

      1) We showed that the information transmitted per spike was significantly lower in awake condition, while the total information rate was comparable (Figure 7).

      2) We tested the performance of a linear decoder applied on the firing rate in response to full-field noise, and showed that it worked significantly better for the awake population responses (Figure 8).

      3) We simulated RGC responses to a full-field contrast change at different intensities in different conditions, and showed that a latency coding did not work well with awake responses, compared to ex vivo or anesthetized responses (Figure 7 – figure supplement 1).
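      As an illustration of the second analysis, the sketch below fits a linear decoder by ordinary least squares and scores it on held-out data. It is a simplified stand-in, not the authors' model: the simulated population, the instantaneous (no time-lag) readout, the train/test split, and the use of a correlation coefficient as the score are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a full-field noise stimulus and the firing rates of a
# small population that encodes it linearly on top of a baseline, plus noise.
n_time, n_cells = 2000, 20
stimulus = rng.standard_normal(n_time)
gains = rng.uniform(-1.0, 1.0, n_cells)
rates = 10.0 + np.outer(stimulus, gains) + rng.standard_normal((n_time, n_cells))

def add_bias(population_rates):
    """Append a constant column so the decoder can absorb baseline firing."""
    return np.column_stack([population_rates, np.ones(len(population_rates))])

def fit_linear_decoder(population_rates, target):
    """Least-squares weights mapping population rates to the stimulus."""
    weights, *_ = np.linalg.lstsq(add_bias(population_rates), target, rcond=None)
    return weights

def decode(population_rates, weights):
    return add_bias(population_rates) @ weights

# Fit on the first half of the recording and evaluate on the second half.
half = n_time // 2
weights = fit_linear_decoder(rates[:half], stimulus[:half])
prediction = decode(rates[half:], weights)
score = float(np.corrcoef(prediction, stimulus[half:])[0, 1])
print(f"held-out correlation: {score:.2f}")
```

      Comparing such a held-out score across recording conditions is one way to express "more linearly decodable"; a decoder closer to the analysis described would typically also include temporal filters over recent spiking history.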

      These results strengthened our conclusion that awake response dynamics were different from anesthetized or ex vivo responses, all arguing against the sparse efficient coding principles, at least at the light level we examined. We nevertheless kept the title as is because we have not explored the retinal coding properties per se. Our main claim remains focused on the visual response characteristics of retinal outputs in awake mice.

      Reviewer #3 (Public Review):

      The manuscript by Boissonnet, Tripodi, and Asari compares retinal ganglion cell (RGC) light responses in awake mice (recorded in the optic nerve) with those under two forms of anaesthesia and previously attained ex vivo recordings. This is a well motivated study looking at a question that is really critical to the field.

      The presentation is generally clear and compelling. My suggestions are relatively minor and aimed at improving an already very strong article.

      1) More cells in the awake condition would help strengthen the conclusions. Only 51 cells are reported, and mouse RGCs comprise more than 40 different types. The authors are well aware of the possible confound of sampling bias, and the best way to mitigate this issue in this experimental paradigm is simply to record more cells. The anesthesia conditions each have about 100 cells, which is better.

      We made substantially more recordings in the awake condition, reaching 282 cells (in 15 animals) in total in the revised manuscript. This does not yet allow for a full cell-type classification as in the past ex vivo studies. Nevertheless, we did our best to broadly classify visual responses, and showed that the overall conclusions remained the same: awake RGCs had higher baseline firing and faster response kinetics in general. For details, see above our response to the Essential Revision item #1.

      2) It took me longer than it should have (had to look up the previous paper cited) to figure out that the ex vivo comparison data were recorded at 37 °C. This is an important detail since most ex vivo recordings are at 32 °C. The authors should make this clear in the text and perhaps say something in the Discussion about comparisons to the larger body of literature of ex vivo studies at 32 °C.

      We are aware that most ex vivo studies on the retina were performed at 32 °C, which is lower than physiological body temperature (37 °C). However, the temperature of the ocular surface is around 37 °C (Vogel et al 2016), suggesting that the retina should operate at 37 °C in vivo. This is why we decided to perform ex vivo experiments at 37 °C in our previous study (Vlasiuk and Asari, 2021), allowing us to make a fair comparison between the ex vivo and in vivo recordings.

      We clarified the point in the revised manuscript.

      3) Direction and orientation selectivity should be separated in Fig. 2 and not combined into the confusing term "motion sensitive." Motion sensitivity has another meaning in the literature for RGCs that respond preferentially to moving over static stimuli without direction or orientation preference (Kuo et al., 2016; Manookin et al., 2018).

      We agree with the reviewer. In the revised manuscript, we separated the direction and orientation selective cells (Figure 2), and avoided the term “motion sensitive.”

      4) While I am certainly sympathetic to the argument that the RGC spike code is "inefficient" in the sense that it does not conform to efficient coding theory (ETC), I think it's oversimplified to claim that the present data is a key argument against ETC. Plenty of ex vivo data has already shown ETC to be incomplete at best, and misguided at worst, since it includes the implicit assumption that image reconstruction is the retina's objective function (or even that the experimenter has any idea what that objective function is). For example, OFF sustained alpha (OFF delta in guinea pig) RGCs are not quite sparse feature detectors even ex vivo, and they seem to be optimized to transmit contrast with high SNR (Homann and Freed, 2017). In general, the enormous coverage factor of the RGC population seems to make ETC untenable to begin with, as discussed in (Schwartz, 2021) and elsewhere. I realize that there are still people attached to simplistic forms of ETC as a key principle of retinal computation, so I am not asking for the authors to completely remove this angle. Rather, a more nuanced treatment of the issue both in the introduction and the discussion is warranted.

      We totally agree that we are not the first to argue against the efficient coding principles in the retina (Schwartz, 2021). The main argument in this study is that certain aspects of the RGC activity are distinct in an awake condition, such as the baseline firing and response kinetics, and thus we cannot simply translate our knowledge obtained from ex vivo studies to awake animals. To explore the implications for retinal computations, we showed in the revised manuscript that 1) awake responses have a comparable total information transfer rate (in bits per second; Figure 7A) but are less efficient (i.e., fewer bits per spike; Figure 7B); 2) awake responses do not favor a latency-based temporal code (Figure 7 – figure supplement 1); and 3) a linear decoder worked significantly better with awake responses (Figure 8), even though image reconstruction is not necessarily the objective function of the retina. These results point to a need to rethink retinal function in vivo, including the efficient coding theory.
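      The relation between the two efficiency measures in point 1) is simple arithmetic and can be sketched as follows. The numbers are hypothetical and chosen only to show how a comparable information rate combined with a higher firing rate yields fewer bits per spike; they are not values from the manuscript.

```python
def bits_per_spike(info_rate_bits_per_s, firing_rate_hz):
    """Information per spike = information rate divided by mean firing rate."""
    if firing_rate_hz <= 0:
        raise ValueError("firing rate must be positive")
    return info_rate_bits_per_s / firing_rate_hz

# Hypothetical cells with the same information rate (10 bits/s) but
# different mean firing rates.
dense_firing_cell = bits_per_spike(10.0, 50.0)   # high baseline firing
sparse_firing_cell = bits_per_spike(10.0, 5.0)   # low baseline firing
print(dense_firing_cell, sparse_firing_cell)
```

      In this toy case the densely firing cell carries a tenth of the information per spike of the sparsely firing cell while transmitting the same number of bits per second.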

      We thank the reviewer for the suggestion and have revised the manuscript accordingly.

      References

      Homann, J., and Freed, M.A. (2017). A mammalian retinal ganglion cell implements a neuronal computation that maximizes the SNR of its postsynaptic currents. Journal of Neuroscience 37, 1468-1478.

      Kuo, S.P., Schwartz, G.W., and Rieke, F. (2016). Nonlinear Spatiotemporal Integration by Electrical and Chemical Synapses in the Retina. Neuron 90, 320-332.

      Manookin, M.B., Patterson, S.S., and Linehan, C.M. (2018). Neural Mechanisms Mediating Motion Sensitivity in Parasol Ganglion Cells of the Primate Retina. Neuron 97, 1327-1340.e4.

      Schwartz, G.W. (2021). Retinal Computation (Academic Press).

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Berne et al apply state-of-the-art methodology for quantifying animal behavior to identify distinct behavioral components associated with the repeated application of mechanical stimuli. A central strength of this manuscript is the development of a sophisticated system for precisely applying mechanical stimuli and measuring behavior. This is a significant advance over commonly used approaches and has the potential to broadly impact the field. I have some concerns about the methods used to define discrete behaviors and the interpretations drawn from them (see point 2), the opposing phenotypes of memory mutants, and the circuit modeling. However, the overall results provide strong evidence that a small set of behaviors reflect the intensity of response to stimuli, and these combine to reflect an overall complex behavioral response to mechanical stimuli. Overall the manuscript is well written, and clearly communicates results. The level of analysis has the potential to broadly impact many fields examining innate and learned responses to sensory stimuli.

      1) A central strength of this manuscript is the resolution of behavioral analysis. Implicit in this is the potential to use a wealth of genetic analysis and sophisticated genetic tools to dissect the neural basis of these behaviors. These implications would be clearer if the introduction provided more description of this literature.

      This is certainly true: findings from behavior experiments should lead to interesting investigations at the neural circuit level, especially in Drosophila, which has a wealth of genetic tools readily available. We have added a new paragraph at the end of the Introduction section to discuss this, and provide citations to a number of commonly used tools that could be used to identify and characterize the circuit side of mechano-sensation and adaptation in flies.

      2) It is unclear how the 4 discrete behaviors were decided upon, and whether there are rarer behaviors, or subcategories within them (for example, sideways crawl).

      We do list a number of behaviors in the third paragraph of the Introduction, and describe some of these in more detail in the next paragraph, but agree that a clearer justification needs to be given for focusing on the four specific behaviors in the paper. The answer is that these are the only behaviors that larvae perform given the constraints we place on their movement (hard, flat agar gel), and because we avoid overly strong stimuli that would cause more drastic pain responses. This is now noted directly near the end of the 5th paragraph of the Introduction.

      3) From figure 1A it looks like the mechanical transducer remains in the center independently of where the larva is. Could it be possible that subtle differences in mechanical force are detected across the arena and this impacts the response? Does the degree of turning matter?

      While the first paragraph of the Results section notes we use a “customized platform,” and the details and purpose of this are listed later in the second paragraph of Materials and Methods, we think it is warranted to include more details up front, as many readers will likely have the same question. We now clearly state what is customized about the platform and that its purpose is to achieve a spatially uniform vibration stimulus, and point the reader to Materials and Methods for further details.

      4) I am not clear about the application of statistics. For example, 2D states that as a general trend, increasing vibration also increases reversals. I can see this clearly, but is there a reason not to run statistics on these data?

      We agree, it is not sufficient to simply state there is a general trend, when statistics can be readily applied (especially to binary/fractional data like this!). We have performed statistical comparison tests for reverse crawling response probabilities in the data in Figure 2C, which shows fractional behavior usage for a wide range of vibration frequency and acceleration. We show the statistics in two ways. (1) Adjacent graphs are connected with bridging lines that are black (p>0.05) or yellow (p<0.05) (Fisher’s exact test for both), which shows the onset of significant reverse crawling behavior when looking along gamma or f axes. (2) Each of the 29 graphs was tested against the baseline (zero vibration) reverse crawl fraction, and red dots indicate significant reverse crawl use. The graphs and captions for Figure 2C have been updated accordingly.
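
      For readers unfamiliar with the test named above, the following is a minimal sketch of a two-sided Fisher’s exact test on a 2x2 table of counts (reversed / did not reverse, at two stimulus settings). It is an illustration only, not the analysis code used for Figure 2C; the function name and the counts are hypothetical and chosen purely for demonstration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    # The small relative tolerance guards against floating-point ties.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical counts (NOT from the paper): larvae that did / did not reverse
# at a baseline setting and at one vibration setting.
baseline = (4, 96)
vibration = (30, 70)
p = fisher_exact_two_sided(*baseline, *vibration)
print("significant at 0.05" if p < 0.05 else "not significant at 0.05")
```

      The same comparison could be run between adjacent graphs or against the zero-vibration baseline; the 0.05 threshold mirrors the one quoted above.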

      We also did more serious statistics with the data in Figure 5 (habituation model compared to data) and Figure 7 (simple circuit model compared to data), and those are described below with their associated comments.

      5) The importance of vibration behavior in research is discussed but the ecological relevance of these behaviors is not described.

      A very good idea for setting the context better. We have added a new paragraph to the Introduction with 56 references for readers interested in learning more about this side of things. Vibration response is also important for larvae in nature: it helps them communicate and avoid predators.

      6) The results of habituation times in mutants are not clear to me. One might predict dnc and rut would have the same phenotype but they have opposing phenotypes with rut being a super-habituator.

      The dnc and rut mutants both desensitize faster than the CS control larvae (comparing the traces in Fig. 6A to the gray wild type version), which agrees with this prediction, but there are still finer details to sort out: for example, that rut is faster than dnc, or that rut is faster than wild type at both desensitizing and re-sensitizing, whereas dnc is slow to re-sensitize. This would be interesting to piece together, but for now the mutant results highlight the importance of extracting the finer details (and multiple time constants) involved in vibration response. Explaining why the mutants (or other future strains tested) have these specific values is beyond the scope of this paper.

      We have noted the comparisons with dnc and rut more directly in the text now, accompanying the descriptions of Fig. 6A and 6B in the Results section.

      7) I appreciate the application of circuit modeling, but it would seem that this would be strengthened by including what is already known about the biological circuit.

      We were not very clear about describing the purpose of the circuit model – we did not intend the circuit components of the model to directly match the actual neural circuit elements. It is primarily a visualization tool for what appears to be happening based on the empirical results (although the math behind the circuit might suggest some possible real mechanisms, noted in Discussion). In earlier drafts the visualization tool was a water bucket pouring into a second bucket with a hole in the bottom, with water volume analogous to habituation (the math was identical to the capacitor circuit). We have added a sentence at the beginning of the circuit model section to clarify its purpose better.

      That said, we agree it is important to discuss the context of the real neural circuit. This was in the Introduction already, but not emphasized or introduced very well. This section now has its own paragraph, which we have expanded with additional references (paragraph starting with “Some aspects of the neural circuitry…”).

      We have also substantially edited the Results section about the circuit model in response to other comments below, and it should be more focused and clearer now.

      Reviewer #2 (Public Review):

      Berne et al. establish the responses of Drosophila larvae to mechanical vibrations as a novel paradigm to study habituation. The authors first comprehensively quantify the different types of locomotor responses to vibrations and find that larvae respond to faster and stronger vibrations with more avoidance-type behaviors, like pauses, turns, and reversals. The authors then combine genetic and computational methods to characterize the strong de-sensitization of avoidance responses to vibrations. De-sensitization of reversals follows a simple, exponential decay with a single time constant. By contrast, re-sensitization dynamics are more complex and strongly accelerate after repeated exposure to a vibration stimulus. The authors then test mutants for genes involved in learning and memory (rut, dnc, cam) and find altered desensitization and re-sensitization dynamics, suggesting that these genes mediate this behavior. Finally, a simple and intuitive electrical circuit model is used to explain these complex dynamics results. Overall, the results are interesting and they successfully combine behavioral characterization, genetic manipulations, and computational modeling to explain the behavior.

      The analyses are all sound and support most of the conclusions but additional control experiments and analyses are required.

      1) To convincingly show that the computational models capture the key aspects of the behavior and therefore provide insight into the underlying phenomenon, model predictions and behavioral data need to be compared systematically and quantitatively. This is not sufficiently done for the electrical circuit model, and the analyses shown in Fig. 7C need to be extended. The model should be fitted to the data and the match between model and data should be A) quantified using a suitable measure of goodness-of-fit and B) illustrated by overlaying behavioral data and model predictions.

      We agree, and thank the referee for pointing this out. The circuit model was intended primarily as a visualization tool, but it was not fair of us to say that it correctly predicts anything real without being more precise and quantitative, including using significance metrics. We also feel that Fig. 7C was not a very compelling or interesting demonstration. We have replaced 7C with a new panel that shows empirical reverse crawl probability overlaid with the circuit model’s prediction of reverse crawl behavior (where FREV ~ exp(-Q2)). The peak values match very closely, although the overall shape does not, due to the simplicity of the model. This is discussed fully in the Results text and in a redone Fig. 7 caption.
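
      To make the “leaky bucket” picture and the readout FREV ~ exp(-Q2) concrete, here is a minimal sketch of a single leaky integrator: a quantity Q fills while the stimulus is on and leaks away continuously. This is a simplification for illustration and is not the two-capacitor circuit in the manuscript; the function name, fill rate, leak time constant, and pulse timings are all assumed values, not fitted parameters.

```python
import math


def simulate_leaky_bucket(pulses, t_end=60.0, dt=0.1, fill_rate=0.1, tau_leak=20.0):
    """Forward-Euler simulation of dQ/dt = fill_rate * stimulus(t) - Q / tau_leak.

    pulses is a list of (start, stop) times in seconds during which the
    stimulus is on. Returns (times, q_values, f_rev_values), where
    f_rev = exp(-Q) stands in for the reverse-crawl tendency.
    """
    q = 0.0
    times, q_values, f_rev_values = [], [], []
    for step in range(int(round(t_end / dt)) + 1):
        t = step * dt
        # Record the state at time t before advancing it.
        times.append(t)
        q_values.append(q)
        f_rev_values.append(math.exp(-q))
        stimulus_on = any(start <= t < stop for start, stop in pulses)
        inflow = fill_rate if stimulus_on else 0.0
        q += (inflow - q / tau_leak) * dt
    return times, q_values, f_rev_values


# Two assumed 10 s vibration pulses separated by a 20 s gap.
times, q_values, f_rev = simulate_leaky_bucket([(10.0, 20.0), (40.0, 50.0)])
for t_probe in (10, 20, 40, 50):
    i = round(t_probe / 0.1)
    print(f"t = {times[i]:5.1f} s   Q = {q_values[i]:.3f}   F_REV ~ {f_rev[i]:.3f}")
```

      In this toy version Q has not fully leaked away when the second pulse starts, so the response at the second onset is already reduced. The pulse-number dependence of re-sensitization described in the manuscript is not captured by a single fixed leak rate.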

      Moreover, the contribution of individual circuit elements should be quantified, for instance by removing key elements from the model like the second capacitor. If a good quantitative fit is for some reason hard to obtain, then more effort should be spent to demonstrate a good qualitative agreement between model and data.

      We have shown what we think is the bare minimum circuit model that can include the accumulation and decay of a substance (the charge Q2 standing in for “habituation”). We could have built a more complicated circuit and essentially forced it to have the same time constants as we extracted from data, but felt that would lose sight of its appeal as a visualization tool and qualitative idea. We could not remove C2, for example, because the “output” of the circuit model itself is the charge on that capacitor.

      In response to further comments below we have overhauled and simplified the section about the circuit model, and hope this also helps alleviate any concerns.

      The same goes for the phenomenological model in Fig. 5. Predictions of model variants with a constant re-sensitization time constant and a time constant that changes with pulse number should be shown and their fit to the data should be quantified.

      Absolutely. We have added two other versions of the model to Fig. 5E (one with only desensitization and the other that doesn’t have the time constant changing with pulse number) and performed significance tests on the peak values for each pulse response. The model with all three aspects of habituation performs the best. Fig. 5E has been made larger to better see the traces, we have added visual cues and a legend for the significance tests, and the caption has been expanded accordingly.

      2) The Markov model in Fig. 3 is used to state that habituation is a one-way process from reversals to other behaviors, with only rare transitions back to reversals. However, the low transition rates to reversals (Fig. 3) seem at odds with the fast re-sensitization after repeated stimulation (Fig. 5). This should be explained and both results should be linked.

      This is a really good observation, and fortunately does have an explanation. The assigned behaviors in Fig. 3 are what we observe during the first 3 seconds after vibration onset. Habituation sets in as the stimulus stays on, then re-sensitization (even if not complete) occurs while the stimulus is off. Then when the stimulus turns on again, we assign the next behavior. An individual with a strong (reversal) response will most often (85% of the time) reverse again the next time the stimulus turns on. We would not classify that as a transition back to reversal, but as a repeat of the reversal behavior following de-sensitization and re-sensitization. The 15% of individuals that did not reverse the second time will only very rarely (< 2%) reverse the third time. The re-sensitization process in fact explains why strong response behaviors so often repeat for the next vibration pulse response.

      We have expanded a paragraph in the Results section to add text similar to what we have written here to clear up this point. It’s the last paragraph in the “Re-sensitization rates increase…” subsection.
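
      The arithmetic behind this explanation can be sketched as a two-state chain (reverse / other behavior). This is only an illustration: it assumes the 85% repeat probability and the 2% upper bound on returning to reversal stay fixed from pulse to pulse, which is a simplification and not a claim made in the manuscript. The function and parameter names are hypothetical.

```python
def reversal_fraction(n_pulses, p_repeat=0.85, p_return=0.02):
    """Fraction of initially reversing larvae that reverse on each pulse.

    p_repeat: probability that a larva that reversed reverses again next pulse.
    p_return: probability that a larva that did not reverse goes back to
              reversing next pulse (0.02 is the quoted upper bound).
    Both are held constant across pulses, which is a simplifying assumption.
    """
    fractions = [1.0]  # pulse 1: start from larvae that all reversed
    for _ in range(n_pulses - 1):
        reversing = fractions[-1]
        fractions.append(reversing * p_repeat + (1.0 - reversing) * p_return)
    return fractions


for pulse, fraction in enumerate(reversal_fraction(5), start=1):
    print(f"pulse {pulse}: {fraction:.4f}")
```

      Under these assumptions almost all reversals on a given pulse come from larvae that also reversed on the previous pulse, which is why repeats dominate while transitions back to reversal remain rare.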

      3) Based on altered de-sensitization and re-sensitization dynamics in mutants, the authors claim that three different genes - rut, dnc, cam - are involved in the molecular pathway that mediates habituation of larval locomotor responses to vibrations. This is interesting and deserves further study. However, it is unclear whether the observed effects are specific to the genes that were altered or whether the effects stem from differences in the genetic background across the mutants. This could be resolved in two ways: Ideally, with rescue experiments; if this is not feasible, then data from different wild-type strains could be used to show that the de-sensitization and re-sensitization dynamics are similar across wild types and somewhat robust to genetic background.

      Additional control data with other wild type strains was not doable due to personnel issues noted in our resubmission letter, and also time constraints (for example, each trace like the one in Fig. 5A requires 1000 animals to construct – we suspect that the required number of larva-hours to determine habituation parameters is a large part of why other researchers have not observed these habituation characteristics in larvae before). We do acknowledge this limitation directly in the manuscript now, and highlight why it would be important for further experiments like these to be carried out in the future. A new paragraph in the “Conclusions” subsection of Discussion discusses this. We now state directly that the mutant results are there to highlight the importance of characterizing multiple time constants and other dependencies when determining anything about habituation. The fact that habituation parameters are not the same as this particular CS wild type is suggestive, but given the lack of additional controls it would not be fair to make specific statements about any of the mutants at this stage.

    1. Author Response

      Reviewer #1 (Public Review):

      1) Comment: To determine the effect of diseased monocytes on retinal health, light-injured mouse retinas were injected with monocytes isolated from AMD patients (Figure 1 - figure supplement 1). This resulted in a reduction in photoreceptor number and ERG b-wave amplitude. However, the light-injured control eye was injected with PBS only, so no cells were present. The reasoning for using this control was not provided. The appropriate injection control would include monocytes isolated from non-AMD patients. This control should be performed side-by-side with cells from AMD patients.

      We thank the reviewer for this important comment. The purpose of the current study was to identify the macrophage subtype that may be associated with cell death in aAMD. We have previously reported that macrophages from AMD patients show a different phenotype from those of healthy controls in the rodent model of laser-induced CNV (Hagbi-Levi S et al, 2016). Per the reviewer’s comment, we have performed additional experiments to assess the effect of monocytes from healthy controls in the photic retinal injury model. The results showed that monocytes from AMD patients and from healthy controls have different effects on the retina in this rodent model of aAMD. Interestingly, we found that monocytes from healthy controls were more neurotoxic to photoreceptors than monocytes from AMD patients. These results are included in the revised ms. as Figure 1- figure supplement 1H. A possible explanation for these findings is discussed in lines 179-190 of the revised manuscript. This finding reinforces the idea that the use of monocytes from AMD patients in the experiments is required to obtain a comprehensive understanding of their involvement in the progression of the disease.

      2) Comment: The authors hypothesize, from the experiments presented in Figure 1 - figure supplement 1, that the injected monocytes generated macrophages in the retina, which were responsible for the observed neurotoxicity (Lines 143-145). However, no direct evidence was presented. This idea should be tested in vivo. This could be done by injecting tracer-labeled human AMD-derived monocytes into light-injured mouse retinas. If the authors' hypothesis is true, collected retinas should contain tracer-labeled cells that express macrophage markers. Tracer-labeled M2a macrophage cells should be present since subsequent experiments identify this subclass as being associated with retinal cell death.

      Thank you for this important comment. To address the reviewer’s comment, retinal sections from mice exposed to photic retinal injury and injected with Dio-tracer-labelled monocytes were stained for two M2a macrophage markers, CD206 (mannose receptor) and VEGF (Kadomoto S et al, 2022; Jayasingam SD et al, 2019). Interestingly, we found co-localization of Dio-tracer staining (representing the injected human macrophages) with the CD206 and VEGF markers in monocytes located in different retinal layers, but not in monocytes remaining in the vitreous cavity. These data indicate that M2a markers are expressed during the polarization of monocytes into the M2a phenotype, which is maintained only upon entry into the retinal tissue. These results are included in Figure 1- figure supplement 1K-S and discussed in the revised manuscript in lines 179-182.

      3) Comment: Photoreceptor number and b-wave amplitudes were measured in light-injured retinas injected with one of four macrophage cell types generated from human AMD-derived monocytes. The authors conclude that only injection of M2a cells reduced photoreceptor number and b-wave amplitudes (Figure 1C, E). This may be true, but it is difficult for the reader to make a conclusion (especially in Fig. 1E) due to the large error bars and five different traces overlapping each other. To make these results easier to interpret, graph control cells with only one experimental sample (cell type) at a time.

      Thank you for this comment. Per the reviewer’s comment, the graphs were modified in the revised ms. (Figure 1, panels H-K).

      4) Comment: Most injected macrophages were located in the vitreous. In the case of M2a cells, the authors note that "several of the cells migrated across the retinal layers reaching the subretinal space" (Lines 167,168). One possible explanation for why M0, M1, and M2c macrophages did not induce retinal degeneration is that they did not migrate to the subretinal space and around the optic nerve head. Supplementary figures should be added to demonstrate that this is not the case.

      Thank you for this comment. To address the reviewer’s comment, we compared the migration patterns of the different macrophage phenotypes following intravitreal injection in mice exposed to photic injury. Our results indicated that M0, M1 and M2c macrophages, like M2a macrophages, migrated to the subretinal space and around the optic nerve. Thus, the neurotoxic effect of M2a macrophages is not explained by their capacity to infiltrate the retinal tissues. These results are included in Figure 1- figure supplement 2 E-H of the revised manuscript. They are supported by our ex-vivo experiments, which showed that co-culture of M2a macrophages with retinal explants was associated with increased photoreceptor cell death compared to M1 macrophages. The results are presented and discussed in the revised manuscript in lines 200-203.

      5) Comment: Figure 1 - figure supplement 2: Panel A, B cells were stained with CD206 to demonstrate the presence of M2a macrophages (panel B). The authors conclude that panel A contains M1 and panel B contains M2a cells. The lack of CD206 expression illustrates that panel A cells are not M2a macrophages but do not demonstrate they are M1 macrophages. A control using an M1 cell marker is necessary to show that panel A cells are M1 and M1 cells are not detected in M2a cultures.

      Thank you for this comment. We have validated the phenotype of each macrophage subtype by qPCR (Figure 1 panel A). To further address the reviewer’s comment, we have performed additional immunocytochemistry for M1 macrophages using an anti-CD80 antibody, which is used as an M1 macrophage marker (Bertani FR et al, 2017). The staining results confirmed the identity of the M1 macrophages. These new results are included in Figure 1- figure supplement 2A, and are discussed in lines 168-170.

      6) Comment: Ex vivo, apoptotic photoreceptor and RPE cells are observed when cultured with M2a macrophages (Figure 2). Do injected M2a cells also induce apoptosis of RPE cells in vivo? This is important to establish that retinal explants are a good model for in vivo experiments.

      Thank you for this comment. To address the reviewer’s comment, we assessed RPE apoptosis (using TUNEL, Caspase 3 staining and the RPE65 marker) after M2a cell delivery in the in-vivo photic injury model. We could not detect an apoptotic signal in the RPE layer 7 days after photic injury and therefore could not evaluate the effect of M2a macrophages on the RPE cells in-vivo (see Author response image 1). One possible explanation is that RPE cells that have undergone apoptosis are rapidly removed from the damaged tissue and, unlike photoreceptors, are no longer detectable. Furthermore, a study that investigated the impact of bright light on RPE cells in-vivo showed that although RPE cells underwent structural and chemical modifications after photic injury, no TUNEL signal was detected because RPE cells die by necrosis rather than apoptosis (Jaadane I et al, 2017). Other studies have confirmed that blue light induces RPE necrosis (Song W et al, 2022; Mohamed A et al, 2022). Taken together, it seems that the ex-vivo retinal explant and the in-vivo photic injury model both simulate the mechanism of retinal cell death. However, the ex-vivo model allows the direct impact of M2a macrophages on the retina to be established in a non-inflammatory context.

      Author response image 1.

      7) Comment: Reactive oxygen species (ROS) production was measured to determine if M2a cell-mediated neurotoxicity was due to oxidative stress. It is concluded that a ROS increase is partly responsible (Line 218). The data do not support this conclusion. ROS was detected in cultured M2a macrophages. More importantly, however, there was no increase in oxidative damage in vivo. The in vivo and cell culture results contradict each other so no conclusion can be made. The lack of in vivo confirmation weakens the argument that ROS drives M2a neurotoxicity. Text suggesting a role for ROS in neurotoxicity should be appropriately edited (Lines including 218, 244, 401,406,481).

      Thank you for this comment. The manuscript was revised according to the reviewer’s suggestion (Lines 250-256).

      8) Comment: The authors ask if the photoreceptor cell death is cytokine-mediated. Multiple cytokines were enriched in M2a-conditioned media. Of particular interest were CCR1 ligands MPIF1 and MCP4. The implication is that these two ligands mediate the M2a macrophages to photoreceptor cell death through CCR1. However, there is no attempt to show that either MPIF1 or MCP4 are present in vivo, or are sufficient to induce the retinal response observed. This could be demonstrated by injection of MPIF1 or MCP4. Evidence that either ligand phenocopies M2a macrophage injection would be direct evidence that CCR1 ligands activate the retinal response. Furthermore, co-injection with BX174 should block the effect of these ligands if they work through CCR1.

      Thank you for this comment. The identification of CCR1 ligand expression in M2a-polarized macrophages directed our decision to study CCR1 in the context of atrophic AMD. We do not claim that these specific CCR1 ligands are sufficient to activate CCR1 and cause retinal injury. The mechanism is likely more complex. Yet, to address the reviewer’s comment, we have performed the suggested experiments. Mice were exposed to photic injury and immediately injected in one eye with MPIF1, MCP-4, or a combination of both, and in the other eye with PBS as vehicle. Intravitreal cytokine delivery was repeated two days later (in keeping with the half-life of these cytokines) and ERGs were recorded two days after the last injection. Injection of cytokines at a concentration of 300 ng per eye did not exacerbate photoreceptor death. The same experiment was then repeated with two higher concentrations of cytokine, 1.2 µg/eye and 2 µg/eye, but no differences were observed between the cytokine-treated eyes and the vehicle-treated eyes. Based on previous studies reporting the physiological concentrations of different cytokines in the eyes of healthy and diseased individuals, and on experiments in which different cytokines were injected into rodent eyes (Estevao C et al, 2021; Zeng Y et al, 2019; Roybal CN et al, 2018; Mugisho OO et al, 2018), the cytokine concentrations used in our experiment are in the range in which an effect on the retina is expected.

      It is likely that a synergistic effect of M2a-secreted proteins in a particular microenvironment is necessary to increase the level of retinal damage (Bartee E et al, 2013). It is also likely that cytokines are upregulated in the photic retinal injury model, which may mask the effect of additional exogenous cytokines. A comprehensive understanding of the complex interactions of these cytokines during retinal degeneration is beyond the scope of the current manuscript, which is not focused on identifying ligand-induced CCR1 activation and its consequences. Additionally, we suggest that, owing to cytokine redundancy (Nicola NA, 1994), demonstrating that MPIF1 or MCP-4 can increase photoreceptor death is not required to prove CCR1 receptor involvement.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary of the major findings -

      1) The authors used saturation mutagenesis and directed evolution to mutate the highly conserved fusion loop (98 DRGWGNGCGLFGK 110) of the Envelope (E) glycoprotein of Dengue virus (DENV). They created 2 libraries with parallel mutations at amino acids 101, 103, 105-107, and 101-105 respectively. The in vitro transcribed RNA from the two plasmid libraries was electroporated separately into Vero and C6/36 cells and passaged thrice in each of these cells. They successfully recovered a variant N103S/G106L from Library 1 in C6/36 cells, which represented 95% of the sequence population and contained another mutation in E outside the fusion loop (T171A). Library 2 was unsuccessful in either cell type.

      2) The fusion loop mutant virus called D2-FL (N103S/G106L) was created through reverse genetics. Another variant called D2-FLM was also created, which in addition to the fusion loop mutations, also contains a previously published, evolved, and optimized prM-furin cleavage sequence that results in a mature version of the virus (with lower prM content). Both D2-FL and D2-FLM viruses grew comparably to wild type virus in mosquito (C6/36) cells but their infectious titers were 2-2.5 log lower than wild type virus when grown in mammalian (Vero) cells. These viruses were not compromised in thermostability, and the mechanism for attenuation in Vero cells remains unknown.

      4) Next, the authors probed the neutralization of these viruses using a panel of monoclonal antibodies (mAbs) against fusion loop and domain I, II and III of E protein, and against prM protein. As intended, neutralization by fusion loop mAbs was reduced or impaired for both D2-FL and D2-FLM, compared to wild type DENV2. D2-FLM virus was equivalent to wild type with respect to neutralization by domain I, II, and III antibodies tested (except domain II-C10 mAb) suggesting an intact global antigenic landscape of the mutant virion. As expected, D2-FLM was also resistant to neutralization by prM mAbs (D2-FL was not tested in this batch of experiments).

      5) Finally, the authors evaluated neutralization in the context of polyclonal serum from convalescent humans (n=6) and experimentally infected non-human primates (n=9) at different time points (27 total samples). Homotypic sera (DENV2) neutralized D2-FL, D2-FLM, and wild type DENV similarly, suggesting that the contribution of fusion loop and prM epitopes is insignificant in a serotype-specific neutralization response. However, heterotypic sera (DENV4) neutralized D2-FL and D2-FLM less potently than wild type DENV2, especially at later time points, demonstrating the contribution of fusion loop- and prM-specific antibodies to heterotypic neutralization.

      Impact of the study-

      1) The engineered D2-FL and D2-FLM viruses are valuable reagents to probe antibodies targeting the fusion loop and prM in the overall polyclonal response to DENV.

      2) Though more work is needed, these viruses can facilitate the design of a new generation of DENV vaccine that does not elicit fusion loop- and prM-specific antibodies, which are often poorly neutralizing and lead to antibody-dependent enhancement effect (ADE).

      3) This work can be extended to other members of the flavivirus family.

      4) A broader impact of their work is a reminder that conserved amino acids may not always be critical for function and therefore should not be immediately dismissed in substitution/mutagenesis/protein design efforts.

      Evaluating this study in the context of prior literature -

      The authors write "Although the extreme conservation and critical role in entry have led to it being traditionally considered impossible to change the fusion loop, we successfully tested the hypothesis that massively parallel directed evolution could produce viable DENV fusion-loop mutants that were still capable of fusion and entry, while altering the antigenic footprint."

      ".....Previously, a single study on WNV successfully generated a viable virus with a single mutation at the fusion loop, although it severely attenuated neurovirulence. Otherwise, it has not been generated in DENV or other mosquito-borne flaviviruses"

      The above claims are a bit overstated. In the context of other flaviviruses:

      • A previous study applied a similar saturation mutagenesis approach to the full length E protein of Zika virus and found that while the conserved fusion loop was mutationally constrained, some mutations, including at amino acid residue 106 were tolerated (PMID 31511387).

      • The Japanese encephalitis virus (JEV) SA14-14-2 live vaccine strain contains a L107F mutation in the fusion loop (in addition to other changes elsewhere in the genome) relative to the parental JEV SA14 strain (PMID: 25855730).

      • For tickborne encephalitis virus (TBEV-DENV4 chimera), H104G/L107F double mutant has been described (PMID: 8331735)

      There have also been previous examples of functionally tolerated mutations within the DENV fusion loop:

      • Goncalvez et al., isolated an escape variant of DENV 2 using chimpanzee Fab 1A5, with a mutation in the fusion loop G106V (PMID: 15542644). G106 is also mutated in D2-FL clone (N103S/G106L) described in the current study.

      • In the context of single-round infectious DENV, mutation at site 102 within the fusion loop has been shown to retain infectivity (PMID 31820734).

      We thank the reviewer for these comments. We have adjusted the text above to better reflect and credit the prior literature. The text in the Discussion section is modified as follows.

      “Previously reported mutations in the fusion loop are mainly derived from experimental evolution using FL-Abs to select for escape mutants or from deep mutational scanning (DMS) of the Env protein for Ab epitope mapping. Mutations in the FL epitope were observed in DENV2-NGC-V2 (G106V)39, the attenuated JEV vaccine strain SA14-14-2 (L107F)40, and attenuated WNV-NY99 (L107F)41. Most of these mutations, including the double mutation reported here, lead to attenuation of the virus, although a recent DMS study showed that Zika-G106A has no observable impact on viral fitness42. Interestingly, we also recovered the mutation G106L, suggesting that positions 106 and 107 might be the most mutation-tolerant positions in the mosquito-borne flavivirus FL. On the other hand, tick-borne flaviviruses as well as vector-only flaviviruses show a more diverse FL composition. The inflexibility of the mosquito-borne flavivirus FL might be due to the evolutionary constraint imposed by switching between mosquito and vertebrate hosts.”

      Appraisal of the results -

      The data largely support the conclusions, but some improvements and extensions can benefit the work.

      1) Line 92-93: "This major variant comprised ~95% of the population, while the next most populous variant comprised only 0.25% (Figure 1C)".

      What is the sequence of the next most abundant variant?

      The sequence of the next most abundant variant has been added to the text.

      2) Lines 94-95: "Residues W101, C105, and L107 were preserved in our final sequence, supporting the structural importance of these residues." L107F is viable in other flaviviruses.

      We acknowledge that the L107F mutation has been described in other flaviviruses, including the tick-borne flaviviruses DTV and POWV. This mutation in JEV is associated with viral attenuation. The sentence refers to the fact that, in our libraries, we did not recover variants with mutations at these positions, in contrast to positions N103 and G106, which are mutated in D2-FL, indicating less mutational tolerance. However, we want to keep the focus of this manuscript on engineering a viable DENV that is antigenically different in the FL epitope, rather than on which residues are more tolerant of mutation.

      3) Figure 2c: The FLM sample in the western blot shows hardly any E protein, making E/prM quantitation unreliable.

      The samples used in Figure 2C derive from the growth curve endpoint (Figure 2A), in which there is a 1-log difference in viral titer between D2 and D2-FLM. Equivalent volumes of viral supernatant were loaded in the gel, explaining the reduced intensity of the E band in D2-FLM. The higher exposure on the right shows the E band more clearly for D2-FLM. The Western blot assay comparing prM/E ratio as a measure of maturation state was described and validated in our previous study (Tse et al. 2022, mBio). The methods and figure legend have been updated to include greater detail. The polyclonal E antibody was specifically chosen for this study as our previously used monoclonal antibody targeted the fusion loop. The polyclonal antibody was raised against a fragment of E (AA 1-495) and should be minimally affected by the fusion loop mutations.

      4) Lines 149 -151: "Importantly, D2-FL and D2-FLM were resistant to antibodies targeting the fusion loop. While neutralization by 1M7 is reduced by ~2-logs, no neutralization was observed for 1N5, 1L6, and 4G2 for either variant (Figure 3 A)".

      a) Partial neutralization was observed for 1N5, for D2-FL.

      The text has been updated to more accurately describe the 1N5 neutralization data.

      b) Do these mAbs cover the full spectrum of fusion loop antibodies identified thus far in the field?

      We did not test every known fusion loop antibody that has been described, instead focusing on 1M7, 1N5, 1L6, and 4G2, which were previously described by Smith et al and Crill et al. We also modified the text in the Discussion to reflect the possibility that other FL-Abs are not affected by our mutations.

      “We have tested a panel of FL-Ab; however, we cannot exclude the possibility that other FL-Abs may not be affected by N103S and G106L. However, we have shown that saturation mutagenesis could generate mutants with multiple amino acid changes, and we are currently using D2-FLM as backbone to iteratively evolve additional mutations in FL to further deviate the FL antigenic epitope.”

      c) Are the epitopes known for these mAbs? It would be useful to discuss how the epitope of 1M7 differs from the other mAbs? What are the critical residues?

      Critical residues for these antibodies have been described. They are as follows: 1M7: W101R, W101C, G111R; 1N5: W101R, L107P, L107R, G111R; 1L6: G100A, W101A, F108A; 4G2: G104H, G106Q, L107K. The critical residues for 1M7 are slightly different from the others, perhaps explaining the residual binding to D2-FL. Note that the critical residues identified previously for 1M7 and 1N5 do not overlap with the D2-FLM mutations, suggesting that the FL mutations have an extended effect on the antigenic FL epitope.

      d) Maybe the D2-FL mutant can be further evolved with selection pressure with fusion loop mAbs 1M7 +/-1N5 and/or other fusion loop mAbs.

      We agree that it may be possible to further evolve D2-FL using antibody selection, although we have not yet performed these experiments. We are currently performing iterative saturation mutagenesis and directed evolution to further evolve away from the natural FL.

      5) It would have been useful to include D2-M for comparison (with evolved furin cleavage sequence but no fusion loop mutations).

      Neutralization data for some of the mAbs against D2-M can be found in our previous study (Tse et al. 2022 mBio), in which no difference in neutralization was observed compared to DV2 wildtype. Given the limited resources of the anti-DENV NHP and human serum, we did not add D2-M for comparison. Although some insight can be deduced from the D2-FL vs D2-FLM comparison, we agree future studies that are designed to delineate CR-Ab population between prM, FL and other CR-epitopes should include D2-M for comparison.

      6) Data for polyclonal serum can be better discussed. Table 1 is not discussed much in the text. For the R1160-90dpi-DENV4 sample, D2-FL and D2-FLM are neutralized better than wild type DENV2? The authors' interpretation in lines 181-182 is inconsistent with the data presented in Figure 3C, which suggests that over time, there is INCREASED (not waning) dependence on FL- and prM-specific antibodies for heterotypic neutralization.

      We remade Table 1 to show dilution factors instead of dilution factor⁻¹ of FRNT50.

      In general, our human convalescent sera from heterotypic infection (DENV1, 3 and 4) showed no to low neutralization against our DENV2. FRNT50s were between 1:40 and 1:200. Given the weak potency of the antiserum, it is difficult to compare the FRNT50s between DV2-WT and D2-FLM.

      Similarly, in a different NHP cohort (2nd NHP cohort shown in Table 1), only one DENV4-infected NHP (R1160) showed a low heterotypic titer against DENV2. The detectable FRNT50s were between 1:50 and 1:90. The value was extrapolated based on a single data point (1:40) which was above 50% neutralization. Given that the Hill slopes of all the neutralization curves were below 0.5, the FRNT50 values are not reliable.

      In conclusion, we do not think the sera from Table 1 are potent enough to show differences between the viruses. The intention in showing the negative data in Table 1 is to highlight the difference in serum heterogeneity in DENV-infected patients and experimentally infected NHPs.

      As the reviewer pointed out, the dependence on FL-Abs increased at later time points (the difference between DV2 and D2-FL at 20 dpi vs 60 dpi vs 90 dpi), suggesting that non-FL CR-Abs are waning but prM- and FL-Abs are not. We rewrote the sentence as follows:

      “These data suggest that after a single infection, many of the CR Ab responses target prM and the FL, and the reliance on these Abs for heterotypic neutralization increases over time (Figure 3C).”

      Suggestions for further experiments-

      1) It would be interesting to see the phenotype of single mutants N103S and G106L, relative to double mutant N103S/G106L (D2-FL).

      2) The fusion capability of these viruses can be gauged using liposome fusion assay under different pH conditions and different lipids.

      3) Correlative antibody binding vs neutralization data would be useful.

      We thank the reviewer for the suggestions; we agree these would be of interest and, indeed, these studies are currently underway. The single mutants were present in the initial plasmid library but did not enrich after viral production and passage. Two possible explanations can be drawn: 1) the stochastic nature of directed evolution prevented a single mutant with similar fitness from being enriched; 2) the two mutations are compensatory to each other and together make a functional mutant. The second hypothesis highlights the difference between saturation mutagenesis (this study) and DMS (previous studies).

      Fusion capability is indeed very interesting; however, whether the wild-type FL and the mutated FL differ mechanistically in supporting fusion is not the focus of this study. Instead, we are currently working on adapting D2-FLM to mammalian cells. If successful, differences in fusion mechanism between the Vero-adapted virus and D2-FLM in different lipid environments (insect vs mammalian) would be of interest.

      We are currently developing a whole-virus ELISA; we avoid using rE monomer for this study as it might miss conformational Abs.

      Reviewer #2 (Public Review):

      Antibody-dependent enhancement (ADE) of Dengue is largely driven by cross-reactive antibodies that target the DENV fusion loop or pre-membrane protein. Screening polyclonal sera for antibodies that bind to these cross-reactive epitopes could increase the successful implementation of a safe DENV vaccine that does not lead to ADE. However, there are few reliable tools to rapidly assess the polyclonal sera for epitope targets and ADE potential. Here the authors develop a live viral tool to rapidly screen polyclonal sera for binding to fusion loop and pre-membrane epitopes. The authors performed a deep mutational scan for viable viruses with mutations in the fusion loop (FL). The authors identified two mutations that are functionally tolerated in insect C6/36 cells but lead to defective replication in mammalian Vero cells. These mutant viruses, D2-FL and D2-FLM, were tested for epitope presentation with a panel of monoclonal antibodies and polyclonal sera. The D2-FL and D2-FLM viruses were not neutralized by FL-specific monoclonal antibodies, demonstrating that the FL epitope has been ablated. However, neutralization data with polyclonal sera is contradictory to the claim that cross-reactive antibody responses targeting the pre-membrane and the FL epitopes wane over time.

      Overall, the central conclusion that the engineered viruses can predict epitopes targeted by antibodies is supported by the data and the D2-FL and D2-FLM viruses represent a valuable tool to the DENV research community.

      Reviewer #1 (Recommendations For The Authors):

      1) Line 51-52: "Currently, there is a single approved DENV vaccine, Dengvaxia." Line 56-57: "Other DENV vaccines have been tested or are currently undergoing clinical trial, but thus far none have been approved for use."

      It should be specified for the global audience that this applies to the United States. Takeda's DENV vaccine, QDENGA, is approved in Indonesia, the European Union, and Brazil.

      The text has been modified to include this information.

      2) Line 62-63: - "The core fusion loop-motif DRGWGNGCGLFGK is highly conserved..." Lines 78-80: - "We generated two different saturation mutagenesis libraries, each with 5 randomized amino acids: DRGXGXGXXXFGK (Library 1) and DRGXXXXXGLFGK (Library 2)."

      It may be useful for the readers if the amino acid numbers are stated. The core fusion loop motif DRGWGNGCGLFGK (Eaa98-110) is highly conserved. We generated two different saturation mutagenesis libraries, each with 5 randomized amino acids: DRGXGXGXXXFGK (Library 1; Xaa 101,103, 105-7) and DRGXXXXXGLFGK (Library 2; Xaa 101-105).

      This information has been added to the text.
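
      The residue numbering suggested above can be checked mechanically. The following is a minimal sketch, assuming the motif spans E residues 98-110 as stated by the reviewer; the helper names are hypothetical and are not taken from the manuscript or its code.

```python
# Illustrative sketch only: function and variable names are hypothetical.
WILD_TYPE = "DRGWGNGCGLFGK"  # core fusion loop motif, E residues 98-110
FIRST_RESIDUE = 98

LIBRARIES = {
    "Library 1": "DRGXGXGXXXFGK",
    "Library 2": "DRGXXXXXGLFGK",
}


def randomized_positions(pattern, first_residue=FIRST_RESIDUE):
    """Return the E-protein residue numbers marked X in a library pattern."""
    return [first_residue + i for i, aa in enumerate(pattern) if aa == "X"]


def apply_mutations(sequence, mutations, first_residue=FIRST_RESIDUE):
    """Apply substitutions written like 'N103S' to the motif sequence."""
    residues = list(sequence)
    for mutation in mutations:
        wild_type, position, new = mutation[0], int(mutation[1:-1]), mutation[-1]
        index = position - first_residue
        if residues[index] != wild_type:
            raise ValueError(f"{mutation}: expected {wild_type} at {position}")
        residues[index] = new
    return "".join(residues)


def compatible(pattern, sequence):
    """True if the sequence matches the pattern at every non-randomized site."""
    return all(p == "X" or p == s for p, s in zip(pattern, sequence))


d2_fl = apply_mutations(WILD_TYPE, ["N103S", "G106L"])
for name, pattern in LIBRARIES.items():
    print(name, randomized_positions(pattern), compatible(pattern, d2_fl))
```

      With the patterns as quoted here, the N103S/G106L motif matches only the Library 1 design, because position 106 is held fixed in Library 2.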

      3) Line 91-92: "Bulk Sanger sequencing revealed an additional Env-91 T171A mutation outside of the fusion-loop region."

      It looks like the mutation T171A is in domain I of the E protein and does not seem to interface with the fusion loop. Is that why it wasn't pursued further?

      The T171A mutation in E was included in the infectious clone for D2-FL and D2-FLM. The text has been modified to clarify this inclusion.

      4) Lines 82-85: "Saturation mutagenesis plasmid libraries were used to produce viral libraries in either C6/36 (Aedes albopictus mosquito) or Vero 81 (African green monkey) cells and passaged three times in their respective cell types."

      a) What was the size of the libraries? How does one make sure that the experimental library actually has all the amino acid combinations that were intended?

      Each library has 5 randomized amino acids, so there are 20^5 = 3.2 million combinations. In these experiments, sequencing of the plasmid libraries revealed about 2 million unique amino acid sequences, or approximately 62.5% library coverage. The actual plasmid diversity is expected to be higher than 2 million as our deep sequencing has limited coverage.
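
      The arithmetic behind these figures can be reproduced in a few lines. This is a small sketch using only the numbers quoted in the response; the variable names are illustrative, and the observed count is the approximate value reported above rather than raw sequencing data.

```python
# Library size for 5 fully randomized positions, 20 amino acids each.
AMINO_ACIDS = 20
RANDOMIZED_POSITIONS = 5

theoretical_variants = AMINO_ACIDS ** RANDOMIZED_POSITIONS

# Approximate number of unique amino acid sequences seen by deep sequencing,
# as quoted in the response (a lower bound on the true plasmid diversity).
observed_unique = 2_000_000

coverage = observed_unique / theoretical_variants
print(f"{theoretical_variants:,} possible variants; observed coverage {coverage:.1%}")
```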

      b) The wild type sequence was excluded from the libraries, correct?

      The wild-type sequence was not specifically excluded from the libraries, as there is no easy method to do so. Wild-type sequence was detected in the plasmid libraries but was not selected in the C6/36 library. However, in the Vero library, we recovered WT virus.

      5) Table 1: - Please include in the table description, what the colors indicate.

      We remade Table 1 to show dilution factors instead of dilution factor⁻¹ of FRNT50 and removed the unnecessary color code. We also added all relevant information in the table legend.

      6) Lines 246-248: "Previously, a single study on WNV successfully generated a viable virus with a single mutation at the fusion loop, although it severely attenuated neurovirulence."

      It may be worthwhile to mention the WNV mutation (L107F) as some readers may be curious about where this mutation is relative to the ones described in this study.

      This information has been added to the text. We also included the previously described FL mutations in flaviviruses in the text.

      Reviewer #2 (Recommendations For The Authors):

      Major Critique:

      • There is a disconnect between Fig 2A and 2C. FL and FLM viruses have much lower levels of prM-E expression in the viral supernatants based on the western blot in 2C. Why isn't E being detected in the Western? Is the particle-to-pfu ratio skewed in the mutant viruses? Is it possible that the polyclonal is targeting the cross-reactive prM and FL epitopes, and if so would using a monoclonal antibody targeting a known DIII-epitope (2D22) yield a different western result? Also, the legend and methods for Fig 2C are not clear. What is actually being tested in the Western blot? Were equivalent volumes of the different viral preps used?

      The samples used in Figure 2C derive from the growth curve endpoint (Figure 2A), in which there is a 1-log difference in viral titer between D2 and D2-FLM. Equivalent volumes of viral supernatant were loaded in the gel, explaining the reduced intensity of the E band in D2-FLM. The higher exposure on the right shows the E band more clearly for D2-FLM. The Western blot assay comparing prM/E ratio as a measure of maturation state was described and validated in our previous study (Tse et al. 2022, mBio) and the methods have been updated to include greater detail. The polyclonal E antibody was specifically chosen for this study as our previously used monoclonal antibody targeted the fusion loop. The polyclonal antibody was raised against a fragment of E (AA 1-495) and should not be affected by the fusion loop mutations. 2D22 is a conformational antibody and does not work in western blot.

      • Table 1: The data within Table 1 is ignored in the text, and some of this data contradicts the central conclusions of the manuscript.

      o A.) Some of the convalescent data contradicts the hypothesis. DS0275 had an equivalent neut between DV2 and D2-FLM, DS1660, and R1160 (90) had better neut against the D2-FLM than DV2. Discussion of these samples is warranted.

      o C.) The description in the legend does not adequately describe the table. What do the colors represent? What are the numerical values being displayed? What is in parentheses, (I assume the challenge strain)? The limit of detection is reported as 1:40; 0.25. 1:40 is 0.025 which matches most of the data? There is inadequate description of these experiments in the materials and methods.

      We remade Table 1 to show dilution factors instead of dilution factor⁻¹ of FRNT50 and removed the unnecessary color code. We also added discussion of Table 1 and clarified the difference between the three cohorts of serum in the text with the corresponding references.

      In general, our human convalescent sera from heterotypic infection (DENV1, 3 and 4) showed no to low neutralization against our DENV2. FRNT50s were between 1:40 and 1:200. Given the weak potency of the antiserum, it is difficult to compare the FRNT50s between DV2-WT and D2-FLM.

      Similarly, in a different NHP cohort (2nd NHP cohort shown in Table 1), only one DENV4-infected NHP (R1160) showed a low heterotypic titer against DENV2. The detectable FRNT50s were between 1:50 and 1:90. The value was extrapolated based on a single data point (1:40) which was above 50% neutralization. Given that the Hill slopes of all the neutralization curves were below 0.5, the FRNT50 values are not reliable.

      In conclusion, we do not think the sera from Table 1 are potent enough to show differences between the viruses. The intention in showing the negative data in Table 1 is to highlight the difference in serum heterogeneity in DENV-infected patients and experimentally infected NHPs.
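
      To illustrate why an FRNT50 extrapolated from a single data point is fragile when the Hill slope is shallow, the sketch below assumes a standard Hill-type neutralization curve, f(d) = 1 / (1 + (d / FRNT50)^h), where d is the dilution factor and h the Hill slope. This parameterization, the function name, and the input values are assumptions made for illustration; they are not the authors' fitting procedure or data.

```python
def extrapolate_frnt50(dilution, fraction_neutralized, hill_slope):
    """Solve f = 1 / (1 + (d / FRNT50)**h) for FRNT50 from one data point.

    dilution is the dilution factor (40 for a 1:40 dilution); the result is
    returned on the same scale. All inputs here are hypothetical.
    """
    if not 0.0 < fraction_neutralized < 1.0:
        raise ValueError("fraction_neutralized must be strictly between 0 and 1")
    if hill_slope <= 0:
        raise ValueError("hill_slope must be positive")
    odds = fraction_neutralized / (1.0 - fraction_neutralized)
    return dilution * odds ** (1.0 / hill_slope)


# Same two hypothetical readings at 1:40 (55% and 60% neutralization),
# extrapolated under a steep and a shallow Hill slope.
for slope in (1.0, 0.5):
    low = extrapolate_frnt50(40, 0.55, slope)
    high = extrapolate_frnt50(40, 0.60, slope)
    print(f"Hill slope {slope}: FRNT50 between 1:{low:.0f} and 1:{high:.0f}")
```

      Under these assumptions, a five-point difference in measured neutralization at 1:40 shifts the estimate from roughly 1:60 to 1:90 when the slope is 0.5, compared with roughly 1:49 to 1:60 when the slope is 1, so small measurement noise is amplified in the extrapolated titer.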

      Minor critique:

      Figure 1C: Legend is not clear for this panel. What is on the x-axis of the bubble plots? Are these mutations across the entire viral genome or is this just the prM-E sequence?

      The X-axis is a scatter of all of the sequences contained in the library, similar to graphs used for plotting CRISPR screen results. These represent individual sequences from the saturation mutagenesis libraries in the fusion loop of E as described in Figure 1B.

      The wording in Lines 92-94 is not clear. It looks like the T171A mutation was present in 95% of the sequences (Line 92). Yet this sequence was not incorporated into the variant virus. What is the rationale for omitting this mutation in downstream variant virus generation?

      The 95% in Line 92 refers to the variant containing N103S/G106L mutations as seen in Figure 1C. The high-throughput sequencing approach did not include residue 171, so the presence of the T171A mutation in combination with fusion loop mutations cannot be determined. However, the T171A mutation in E was included in the infectious clone for D2-FL and D2-FLM. The text has been modified to clarify this inclusion.

      The authors discuss the potential of the D2-FL or D2-FLM virus as a potential vaccine platform in the abstract, introduction, and conclusion. This is a good idea, but the authors provide no evidence of feasibility in this manuscript.

      The ultimate goal of engineering a viable DENV with a distinct FL antigenic epitope is its use as a live attenuated vaccine. As this is the rationale for the study, we introduce the concept throughout the manuscript. The current study demonstrated the possibility of introducing a novel fusion loop motif into DENV and provided evidence of the favorable antigenic properties of D2-FLM. We agree with the reviewer that definitive work in animals to show vaccine efficacy needs to be done, and this work is currently underway. To avoid misleading our audience, we have toned down the emphasis on vaccine use in the text.

      Line 150-153: Figure 3A demonstrates that the FL-specific antibodies broadly do not neutralize the mutant viruses. However, the conclusions are overstated in the text. 1N5 neutralizes the D2-FL variant.

      The text has been updated to more accurately describe the 1N5 neutralization data.

      Lines 175-182: The authors make a lot of assumptions about the target of the polyclonal target without any evidence.

      These lines reference studies that showed greater enhancement by antibodies targeting the fusion loop and prM as compared to other cross-reacting antibodies. The assumption that both our manuscript and others have drawn is that Abs that are cross-reactive and weakly neutralizing are more prone to cause ADE. As discussed, other groups have attempted to mutate the FL in recombinant E protein to achieve a similar goal: removing the fusion loop epitope to reduce ADE. We have rewritten the sentence as follows:

      “As FL and prM targeting Abs are the major species demonstrated to cause ADE in vitro, we and others hypothesized these Abs are responsible for ADE-driven negative outcomes after primary infection and vaccination.10–12,32 We propose that genetic ablation of the FL and prM epitopes in vaccine strains will minimize the production of these subclasses of Abs responsible for undesirable vaccine responses. Indeed, covalently locked E-dimers and E-dimers with FL mutations have been engineered as potential subunit vaccines that reduce the availability of the FL, thereby reducing the production of FL Abs.33–36”

    1. Author Response

      We thank all three reviewers for their detailed reviews, and generally agree with their feedback. To accompany the reviewed preprint of this manuscript, we wished to respond to comments from the reviewers so that they (and the public) will know what we are planning to incorporate in the revised manuscript we are currently preparing. If there are any comments on our plans in the meantime, please let us know.

      • Reviewer 1, on concerns regarding identification of ontogenetic stage and comparison of taxa from different ontogenetic stages: It is fair to say that enantiornithine ontogeny is still poorly understood, though we believe all current evidence points to each specimen used in this study being adequately mature for comparison to the extant birds used in the study. Stages of skeletal fusion are the standard method of assessing enantiornithine ontogeny (Hu and O'Connor 2017), and our comparison of histological work (Atterholt, Poust et al. 2021) to skeletal stages in Table S4 suggests a transition from juvenile to subadult in stage 0 or 1 and from subadult to adult within stage 3. Thus, the specimens we quantitatively examine in this study, all at stages 2 or 3 (Figure S10), are advanced subadults or adults. It is well-known that many living animals considered “adults” would be considered subadults or even juveniles to a palaeontologist (Hone, Farke et al. 2016). So, even if some individuals in this study are not fully skeletally mature, they should have obtained the morphology which they would possess for most of their lives and thus the morphology which undergoes selective pressure. We will add this context to the “Bohaiornithid Ontogeny” section and thank the reviewer for seeking more detail for this point.

      • Reviewer 2, on need of a context figure: We have an artistic life reconstruction of a bohaiornithid in preparation, and can include that in the revised manuscript as a figure.

      • Reviewer 2, on raptor claw categories: We explain these categories in-depth in a previous work (Miller, Pittman et al. 2023). However, we will now add a short summary of that explanation to this work so that this manuscript will become self-contained in this regard. In short, the “large raptor” category includes extant birds with records of regularly taking prey which cannot be encircled with the pes, while birds in the “small raptor” category have no such records. As Reviewer 2 points out, this does often follow phylogenetic lines, but not always. For example, most owls specialise in taking small prey, but the great horned owl Bubo virginianus regularly takes mammals and birds larger than its pes (Artuso, Houston et al. 2020); conversely, we can only find reports of the common black hawk Buteogallus anthracinus taking prey small enough for the pes to encircle (Schnell 2020) despite other accipiters frequently taking large prey. In both cases these taxa plot in PCA nearer to other large or small raptors (respectively) than to their phylogenetic relatives.

      • Reviewer 3, on teeth vs beaks: We are not aware of any foods which are exclusive to toothed or beaked animals. There are some aspects of extant bird biology that may affect the way a certain diet may need to be adapted to which we do comment on, e.g. discussion of alternatives to the crop and ventriculus for processing plant matter in the Bohaiornithid Ecology and Evolution section. For functional studies, e.g. FEA, we have included the rhamphotheca in toothless models, which serves the same role as teeth: to be a feeding surface. It should not matter, in theory, if the feeding surface is hard or soft, as mechanical failure occurs in high stress/strain states regardless of the medium. If having teeth necessarily increases or decreases overall stress/strain relative to a beak (and from our work this does not appear to be the case), this would in turn necessarily limit dietary options. So, all models in our work should be directly comparable.

      As an additional note on this topic, we address tooth shape in bohaiornithids at the end of the Bohaiornithid Ecology and Evolution section. We specifically note that their tooth shape is likely controlled by phylogeny in the current version, though we will add a note in the upcoming version that the morphospace of bohaiornithid teeth overlaps that of many other clades with purportedly diverse diets, which is consistent with a hypothesis of diverse diets within the clade.

      • Reviewer 3, on cranial kinesis: Our FE models should be unaffected by cranial kinesis, as these are two-dimensional and model the akinetic lower jaw only. Some mediolateral kinesis may be relevant in the mandible in the form of “wishboning” in different taxa, but its prevalence in extant birds is currently unknown. The preservation of enantiornithines (two-dimensionally and typically in lateral view) limits the ability to capture any mediolateral function regardless.

      Our models of mechanical advantage do not account for any cranial kinesis. This is a necessary simplification. The nature of cranial kinesis in extant birds, and the role that it plays in feeding, is poorly understood. Cranial kinesis will increase gape, but we don’t yet know how/if it affects jaw closing force and speed (moreover, given the variation in quadrate and hinge morphology present in extant birds, this is also something that is likely to be highly diverse). We have therefore modelled the extant birds’ jaw closing systems as having one, akinetic out lever (the jaw joint to the bite point), to match the situation in our fossil taxa. This is a common simplification that has been used previously with success (Corbin, Lowenberger et al. 2015, Olsen 2017). However, we acknowledge that this simplification may introduce some error. Unfortunately, until the mechanics of cranial kinesis – and the variation in the anatomy and performance of kinetic structures in extant birds – are better understood, we cannot determine exactly what that error looks like. We therefore have greater confidence in the inter-species comparability of this conservative, akinetic approach (in other words, we may not be making assumptions that are 100% accurate, but we are at least making the same assumption across all taxa, so it should be comparable in its error). We will add a section in the Mechanical Advantage and Functional Indices discussion calling for further research into the mechanics of cranial kinesis so future mechanical advantage work in birds can take this matter into account.
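
      For readers unfamiliar with the lever model, the akinetic simplification described above can be sketched as a ratio of two lever lengths measured from a single fixed jaw joint. The out-lever definition (jaw joint to bite point) follows the text; the in-lever definition (jaw joint to muscle insertion), the function names, and the landmark coordinates are assumptions made for illustration and are not the authors' measurements or their exact functional indices.

```python
import math


def distance(a, b):
    """Euclidean distance between two 2D landmarks given as (x, y) tuples."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def mechanical_advantage(jaw_joint, muscle_insertion, bite_point):
    """In-lever / out-lever for a jaw treated as one rigid (akinetic) lever.

    The out-lever runs from the jaw joint to the bite point, as in the text.
    The in-lever (jaw joint to muscle insertion) is an assumed definition.
    """
    in_lever = distance(jaw_joint, muscle_insertion)
    out_lever = distance(jaw_joint, bite_point)
    if out_lever == 0:
        raise ValueError("bite point coincides with the jaw joint")
    return in_lever / out_lever


# Hypothetical landmarks in millimetres on a lateral-view mandible.
joint = (0.0, 0.0)
insertion = (8.0, 6.0)
tip_bite = (40.0, 0.0)
mid_bite = (20.0, 0.0)

print("MA at jaw tip:", mechanical_advantage(joint, insertion, tip_bite))
print("MA at mid-jaw:", mechanical_advantage(joint, insertion, mid_bite))
```

      Because every taxon is reduced to the same single-lever geometry, any error from ignoring kinesis enters each ratio in the same way, which is the comparability argument made above.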

      • Reviewer 3, on skull reconstruction: This issue is partly addressed in the Bohaiornithid Skull Reconstruction section, though we agree that adding more mentions of it in the MA and FEA Discussion sections and the Bohaiornithid Ecology and Evolution sections will benefit the manuscript. Most notably Shenqiornis and Sulcavis have similar ecological interpretations, but much of the Shenqiornis skull reconstruction uses Sulcavis bones. Longusunguis is the only other taxon which takes more than two bones from a different taxon, and in this case all but the quadrate are not used in any quantitative measurements. We have ensured that the skull reconstructions presented in Figure 2 show what portions of the skull come from what specimen so that as new material is discovered and phylogenetic relationships are updated it will be clear to future readers which parts of reconstructions will need to be updated.

      • Reviewer 3, on data availability: All data including FEA models and raw measurement data are included in the same repository as the scripts, which we will make clear in the manuscript. Good catch on the data link being dead; we will publish it now.

      As a final note, it was brought to our attention by another colleague that the original manuscript’s ancestral state reconstruction lacked an outgroup. An updated reconstruction using Sapeornis as an outgroup will be included in the revised manuscript. The addition of the outgroup does not change any conclusions of the manuscript.

      We once again thank our reviewers for their valuable feedback and will submit a revised version of this manuscript for publication shortly. Please let us know if you have any additional comments after reading our response that we can take onboard in our revision.

      References

      Artuso, C., C. S. Houston, D. G. Smith and C. Rohner (2020). Great Horned Owl (Bubo virginianus), version 1.0. Birds of the World. A. F. Poole. Ithaca, NY, USA, Cornell Lab of Ornithology.

      Atterholt, J., A. W. Poust, G. M. Erickson and J. K. O'Connor (2021). "Intraskeletal osteohistovariability reveals complex growth strategies in a Late Cretaceous enantiornithine." Frontiers in Earth Science 9: 640220.

      Corbin, C. E., L. K. Lowenberger and B. L. Gray (2015). "Linkage and trade‐off in trophic morphology and behavioural performance of birds." Functional Ecology 29(6): 808-815.

      Hone, D. W. E., A. A. Farke and M. J. Wedel (2016). "Ontogeny and the fossil record: what, if anything, is an adult dinosaur?" Biology Letters 12(2): 20150947.

      Hu, H. and J. K. O'Connor (2017). "First species of Enantiornithes from Sihedang elucidates skeletal development in Early Cretaceous enantiornithines." Journal of Systematic Palaeontology 15(11): 909-926.

      Miller, C. V., M. Pittman, X. Wang, X. Zheng and J. A. Bright (2023). "Quantitative investigation of Mesozoic toothed birds (Pengornithidae) diet reveals earliest evidence of macrocarnivory in birds." iScience 26(3): 106211.

      Olsen, A. M. (2017). "Feeding ecology is the primary driver of beak shape diversification in waterfowl." Functional Ecology 31(10): 1985-1995.

      Schnell, J. H. (2020). Common Black Hawk (Buteogallus anthracinus), version 1.0. Birds of the World. A. F. Poole and F. B. Gill. Ithaca, NY, USA, Cornell Lab of Ornithology.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper addresses the question of Prdm9-dependent hotspots and Prdm9 alleles evolution. Two properties underlie this question: the erosion of hotspots by biased gene conversion and the high mutation rate of the Prdm9 zinc finger domain. Here the authors include an additional recently observed property of Prdm9: its role in DSB repair, by enhancing DSB repair efficiency when binding on both homologs (symmetric sites). The status of symmetric binding depends on Prdm9 level and affinity, possibly other factors. The authors present a model for simulating Prdm9 and hotspots co-evolution based on several assumptions (Number of DSB independent of Prdm9, two types of hotspots, strong or weak; hotspots compete; at least one symmetric DSB is required on the smallest autosome). Although the in vivo context is obviously more complex, these assumptions are reasonable (except for the number of Prdm9 bound sites) as they qualitatively recapitulate or get close to what is known about the requirement for fertility. The model leads to several important conclusions and predictions that Prdm9 limits the number of sites used since such conditions are predicted to allow for a weaker contribution of asymmetric sites.

      The presentation of the model is clear, but the results are difficult to follow and require many readings to follow the text and the associated figures.

      We edited the results section to make the progression of the argument clearer (as detailed below).

      A few specific points also require clarification:

      Competition: It seems that in the context defined Prdm9 is limiting (since most Prdm9 can be bound to all weak sites); in addition, it is not clear how the competition for DSB activity between Prdm9 sites is taken into account.

      We now clarify throughout the text that we have assumed conditions under which PRDM9 is limiting (as detailed below). We state in the Model that we assume “all PRDM9 bound sites are equally likely to experience a DSB”.

      The number of Prdm9-bound sites in vivo is not known, thus several values must be tested.

      We have run additional simulations (when considering strong and weak hotspots, k_1=5 or 50, and when considering large and small population sizes, N= 10^3 or 10^6), using P_T = 500, 1000 and 2500. The results of these simulations are included and discussed in Appendix 4.

      It would be interesting to discuss the model prediction in the context of several observations published on hybrids with variable Prdm9 gene dosage.

      We now include a section in the Discussion, entitled “PRDM9-mediated hybrid sterility”, which discusses the reported gene dosage effects in mice.

      Reviewer #2 (Public Review):

      In mammalian genomes (with some exceptions), the location of recombination hotspots is driven by the PRDM9 zinc-finger protein that recognizes some specific DNA motifs and recruits the machinery inducing double-strand breaks (DSBs) initiating recombination. As DSBs are repaired with the homologous chromosome, "hot motifs" can be rapidly eroded through gene conversion occurring during the repair. This led to the "hotspot paradox" question and to the development of red queen models of hotspot evolution where the lack of enough DSB motifs can select for new PRDM9 alleles recognizing new sets of motifs, which in turn are eroded. However, this model fails to explain some observations, in particular, that the number of DSB seems not limited by PRDM9 sites. Recent findings also showed that PRDM9 played a central role in the symmetrical binding of homologous chromosomes.

      In this study, the author incorporated this new finding (and more realistic assumptions compared to previous models) in a model of hotspot evolution. Their main result is that it affects the evolution dynamics and in particular the causes of selection on new PRDM9 alleles. Instead of selection pressure to increase the number of DSB targets, they showed that selection likely occurred instead to limit the number of hotspots to the hottest and symmetrical ones. These results are important as they changed our view and understanding of the evolution of mammalian hotspots and should have general implications for the study of recombination. The article focuses on complex mechanisms and can appear rather specific and technical. However, it nicely exemplifies the importance of taking molecular mechanisms into account to model genome evolution.

      Overall, the model is sound with no apparent flaw and should be an important contribution to the field. The model is rather complex but the authors focused on a few key parameters while fixing others based on empirical knowledge. This allows for highlighting the novelty of the results without being lost within too many scenarios and hypotheses. However, two main issues should be addressed but they mostly concern the way the model and the results are presented and do not. First, partly due to the complexity of the mechanisms, the core of the manuscript is rather difficult to follow and would deserve a more careful and explicit presentation to guide the reader, as detailed below. Second, the implications of the model and the practical and testable predictions it makes could be developed more, in particular, to compare with previous models. The main comments are listed below.

      1) The introduction reads very well and clearly explains complex mechanisms. It is a bit long and could be reduced a bit.

      Following this suggestion, we have reduced the length of the Introduction.

      2) It is quite helpful to analyze the model step by step. However, the objective of each step is not clearly explained, and it is left to the reader to understand where the authors want to go. At first read, it is not clear whether the authors present an analysis of the model or simulation results and why they do that. So, the results part deserves rewriting and re-organization to guide the reader.

      • In the first two parts (Fitness with one heat and two heats) it should be stated more explicitly that they correspond to an analysis of the fitness landscapes generated by the molecular mechanisms rather than to results on the evolutionary dynamics

      • The part "Dynamics of the two-heat model" corresponds to simulations and it is only at this point that mutation on PRDM9 is introduced.

      • In the present form, the presentation of the results describes many mechanisms (which is fine). However, as the model is complex, stressing the main conclusion for each part could be useful, as could making a clear link between the different steps of the reasoning.

      We have rewritten the results sections to include more signposting and to make clearer the intentions behind each step taken.

      3) The choice of key parameters is well justified with a detailed review of the literature and it is well justified to fix most of them to focus on the key unknown (or not well-known) ones. However, in a few cases, additional simulations or at least better justification would be welcome, in particular on the mutation dynamics of PRDM9.

      Thank you for your suggestion. We have now added an additional appendix (Appendix 5), which investigates the dynamics of our model when newly arising PRDM9 alleles are initiated with hotspot numbers set near values that would be reasonable for perfect matches to motifs with 10 or 11 non-degenerate bases. We show that this sometimes affects the dynamics (compared to the case in the main text), but when it does, the differences can be readily understood using the same kind of reasoning developed in the main text.

      4) The model clearly gives new insights into the evolution of recombination hotspots and appears to explain some results better. However, it is not clear which predictions of the model could be properly tested with data, in particular against previous models. Some predictions are proposed but remain mainly qualitative. For example, can one quantify that this model predicts a more skewed distribution of hotspots compared to previous red-queen models? How good is the model at predicting the number of PRDM9 alleles in human and mouse, for example? Only the diversity at PRDM9 is given; it may be interesting to also give the number of alleles to compare to observations. The discussion on this remains a bit vague. Finally, are there additional predictions of the model that could be used to test it?

      In previous Red Queen models, the specific distribution of heats was not important: fitness was determined by the sum of the heats of all available binding sites. Accordingly, these models do not predict a specific distribution, only that PRDM9 alleles that bind more overall would be favored. Our model thus provides the first theoretical framework under which there is an explicit benefit to localizing PRDM9 to smaller numbers of loci, a premise consistent with the use of hotspots, i.e., the use of only a small proportion of the genome for recombination.

      We chose the two-heat model as a reasonable first approximation to the true distribution. If we were to consider a more realistic binding distribution (or similarly, if we relaxed our assumption about most PRDM9 molecules being bound), the quantitative conclusions would likely be affected. Accordingly, while our simplified model provides robust insights into the dynamics of PRDM9 evolution, quantities such as the predicted levels of diversity in our model may be off and cannot be readily compared to what is observed in human and mouse populations. We now better clarify the scope of our results, and what may be done to extend it, in the Discussion.

      5) The Penrose stair metaphor is appealing but it seems to be dependent on the definition of hotspot, so not to represent a real biological process. Related to metaphors, it is also not very clear whether the authors suggest abandoning the red-queen metaphor for the benefit of the Penrose stair one. Actually, we can still consider that it is a red-queen dynamics but with a different underlying driver.

      We have expanded our discussion of the difference between these two analogies in the discussion section “Does the decay of hotspots by GC lead to more or fewer hotspots?” to clarify that the Penrose stairs model is a specific kind of Red Queen model. However, precisely because a hotspot has a somewhat arbitrary definition, we can imagine the Red Queen running in either direction, towards fewer or more hotspots, depending on our perspective on the Penrose stairs.

    1. Author Response

      Reviewer #2 (Public Review):

      Please note that I am not a structural biologist and cannot critically evaluate the details of figures 1 to 3; my review focuses on the cell biology experiments in figures 4 and 5.

      Paine and colleagues investigated structural requirements for the interaction between the ESCRT-III subunit IST1 and the protease CAPN7. This is a continuation of previous work by the same group (Wenzel et al., eLife 2022), which showed that Capn7 is recruited to the midbody by Ist1 and that Capn7 promotes both normal abscission and NoCut abscission checkpoint function. In this article, the structural determinants of the Ist1-Capn7 interaction are characterised in more detail, focusing on the structure of Capn7 MIT domains and their binding to Ist1. Notably, point mutations in Capn7 MIT domains known to mediate binding to Ist1 and midbody recruitment are shown here to be required for abscission functions, as expected from the authors' previous paper. Furthermore, the report shows that a Capn7 point mutant lacking proteolytic activity behaves as a loss-of-function in abscission assays, despite showing normal midbody localisation. These are important results that will help in future studies to understand how the Capn7 protease regulates abscission mechanistically.

      The report is clearly written and the results support the main conclusions. Some technical limitations and alternative interpretations of the data should be discussed in the text, as outlined below.

      1) It is not always clearly stated how the results presented in this report relate to those in the Wenzel paper. For example, the finding that Ist1 recruits Capn7 to midbodies (p. 6 and figure 4) was first shown in the Wenzel paper. The novelty here is not that Capn7 MIT mutants fail to localise to midbodies, but that they phenocopy the previously described knockdown of Capn7, failing to support normal abscission and NoCut function (fig. 5). This supports and extends the findings of Wenzel et al. It is important to make this explicit and explain the conceptual advances shown here more clearly.

      We take the reviewer’s point and we have now clarified this issue in the text (e.g., page 7, lines 4-5).

      2) The NoCut checkpoint can be triggered by chromatin bridges, DNA replication stress, and nuclear basket defects, but only basket defects are tested here. Therefore, it is not clear if NoCut is still functional in Capn7-defective cells after replication stress and/or with chromatin bridges. Ideally, this should be tested experimentally, or alternatively discussed in the text, especially since the molecular details of how NoCut is engaged under different conditions remain unclear. For example, "abscission checkpoint bodies" proposed to control abscission timing form in response to nuclear basket defects and aphidicolin treatment, but not in the presence of chromatin bridges (Strohacker et al., eLife 2021).

      We appreciate the reviewer’s excellent suggestion. We have now performed the requested experiments and added a new figure showing that CAPN7 is also required to maintain the NoCut checkpoint when it is triggered by DNA bridges (new Figure 6A) or by replication stress (new Figure 6B).

      3) The current data suggest that Capn7 is a regulator of abscission timing, but in my opinion do not quite establish this, for two main reasons. First, abscission timing is not directly measured in this study. Time-lapse imaging would be required to rule out alternative interpretations of the data in figure 5. For example, a delay in an earlier cell cycle stage could in principle lead to a decrease in the overall fraction of midbody-stage cells. Second, the absence of the midbody is not necessarily a marker of complete abscission. Indeed, midbody disassembly is associated with the completion of abscission in unchallenged HeLa cells, but not in cells with chromatin bridges (Steigemann et al, Cell 2009). Midbodies remain a useful marker for pre-abscission cells, but the absence of midbodies should not be immediately interpreted as completion of abscission without further assays. Formally, a direct measurement of abscission timing would require imaging of the plasma membrane, for example using time-lapse phase-contrast microscopy (Fremont et al., 2016 Nat Comm). These limitations should be mentioned in the text.

      We note that midbody numbers are not our only measure of abscission delay/failure - we also measure the numbers of multinucleate cells and sum the two. Nevertheless, we understand the reviewer’s point and have therefore noted that we are using increased frequencies of cells with midbody connections and multiple nuclei as surrogate markers for abscission defects and NoCut-induced abscission delays (page 7, lines 13-14 and line 17).

      4) IST1 plays a role in nuclear envelope sealing by recruiting the co-factor Spastin (Vietri et al., Nature 2015), a known IST1 co-factor also confirmed in the previous interactome screen (Wenzel et al. 2022). CAPN7 could have a role in maintaining nuclear integrity upon the KD of Nup153 and Nup50 (Mackay et al. 2010) instead of/in addition to its proposed role in delaying abscission as part of the NoCut checkpoint at the midbody. I don't think the authors can differentiate between these two possibilities, and it would be interesting to consider their possible implications on how the "NoCut" checkpoint is triggered.

      The reviewer again makes good points, and we agree that in addition to participating in abscission, CAPN7 may be involved in closure of the nuclear envelope and that nuclear envelope closure may, in turn, be linked to satisfaction of the NoCut checkpoint. This involvement would nicely explain our observations that both SPAST and CAPN7 participate in both NoCut and abscission. We are in an unusual situation, however, because other colleagues in our field have told us in private communications that they observe that CAPN7 does, in fact, participate in nuclear envelope closure. We find that observation interesting and exciting but it is their discovery, not ours, and we have therefore refrained from doing analogous experiments ourselves. As a compromise, we have added the following text to the penultimate section of our paper (page 8, lines 34-35 through page 9, lines 1-11):

      “Our discovery that both CAPN7 and SPAST participate in the competing processes of cytokinetic abscission and NoCut delay of abscission may appear counterintuitive, but we envision that the MIT proteins could participate in both processes if they change substrate specificities or activities when participating in NoCut vs. abscission; for example, via different sites of action, post-translational modifications, and/or binding partners. We note that, in addition to its well documented function in clearing spindle microtubules to allow efficient abscission (Yang et al., 2008), SPAST is also required for ESCRT-dependent closure of the nuclear envelope (NE) (Vietri et al., 2015). The relationship between NE closure and NoCut signaling is not yet well understood, and it is therefore conceivable that nuclear membrane integrity is required to allow mitotic errors to sustain NoCut signaling. It will therefore be of interest to determine whether or not CAPN7, in addition to its midbody abscission functions, also participates in nuclear envelope closure and, if so, whether that activity is connected to its NoCut functions.”

      We think that this additional text explains what we (and the reviewer) consider to be an attractive model, but leaves open the question of CAPN7 involvement in nuclear envelope closure to be resolved by our colleagues.

      5) Figure 5 should include images of representative cells, highlighting midbody-positive and multinucleated cells. Without images, it is not possible to evaluate the quality of these data.

      We appreciate this suggestion and have now added images showing midbody-positive and multinucleated cells from the quantified datasets to allow assessment of our data quality (new Figures 5B and 5D).

    1. Author Response

      Reviewer #1 (Public Review):

      Iskusnykh et al. present an elegant and thorough analysis of the role of transcription factor Lmx1a as a master regulator of the cortical hem, which is a secondary organizer in the brain. The authors report that loss of Lmx1a in the hem alters expression levels of Wnts, that Lmx1a is critical for hem progenitors to exit the cell cycle properly, and that Lmx1a loss leads to defects in CR cell differentiation and migration. Furthermore, the authors show that hem-like fate can be induced by overexpressing Lmx1a. This is a fundamental role for a transcription factor that was long used as a hem marker but was never examined for its function in the hem. This study has broader implications for how secondary organizers are created in the embryo and would be of great interest to a wide readership. The conclusions are broadly well supported by the data, though there are a few points of interpretation that need to be addressed.

      We appreciate the positive comments and insightful suggestions of Reviewer 1. Please see our response to specific comments below. New text in the revised paper is blue (see our marked up copy of the paper, submitted as related manuscript file). Please note that since we reformatted the paper (re-submitted figures separately rather than embedded them into the text), line numbers changed relative to the original submission.

      (1) Figure 3A shows staining intensity in WT and Lmx1a-/- whereas the quantification has Lmx1a+/-. Both genotypes are relevant, -/- and +/-, to test whether the loss of 1 copy of Lmx1a results in a partial diminution of Wnt3a levels. Likewise, it is necessary to examine Wnt3a expression levels in the Wnt3a+/- embryo. Together, these could explain why the Lmx1a+/-; Wnt3a+/- double heterozygote has a DG phenotype, otherwise, it remains an unexplained though interesting observation.

      In the original paper, the label in the Wnt3a quantification panel (Fig. 3C) contained a typographical error. The label should read “Lmx1a-/-”, not Lmx1a+/-. (Originally, we did not analyze Lmx1a expression in Lmx1a+/- embryos; we analyzed only wt and Lmx1a-/- embryos.) We apologize for this error and have corrected the label in the revised manuscript (Fig. 3C).

      Based on the above comment, in the revised manuscript, we analyzed the expression of Wnt3a in Lmx1a and Wnt3a single and double heterozygotes, in addition to wt and Lmx1a-/- embryos. To address a comment of Reviewer 2 about a “limited robustness of quantification of in situ hybridization signal”, we isolated CH by LCM and analyzed Lmx1a expression by qRT-PCR (Fig. 3D, E). Interestingly, we found that loss of one copy of either Wnt3a or Lmx1a does not significantly downregulate Wnt3a expression, but loss of one copy of Lmx1a on the Wnt3a+/- background (Lmx1a+/-;Wnt3a+/- mice) reduces Wnt3a expression, providing additional evidence that Lmx1a regulates expression of Wnt3a and explaining the appearance of the DG phenotype only in the double (but not single-gene) heterozygotes. These data are now described in the Results section (page 12, lines 255-260 and Fig. 3D, E). All of our Wnt3a expression data are now properly presented.

      (2) Line 309: "to test Wnt3a as a downstream mediator of Lmx1a function in CH/DG development, we performed an analysis of Lmx1a/Wnt3a double heterozygotes rather than Wnt3a overexpression rescue experiments in Lmx1a -/- mice." The authors' reasoning is unclear. The double het experiments do not go on to show that one gene acts via the other. It's entirely possible the two act via parallel pathways. However, since Lmx1a does indeed regulate Wnt3a levels, this is a good argument for suggesting it acts via Wnt3a, even without the overexpression rescue. The authors could reorganize the data and rephrase the definitive "acts via" statement (also in the heading of this section, line 289, and discussion, line 553) to better fit the data.

      Thank you for this comment. We reorganized/improved our reasoning as requested. Now we state that we performed an analysis of Lmx1a/Wnt3a double heterozygotes to test “whether Lmx1a and Wnt3a co-regulate hippocampal development” (rather than to test Wnt3a as a downstream mediator of Lmx1a function, as it was stated before) (page 12, lines 271-272). As correctly suggested by the Reviewer, we now conclude that “Although these double heterozygote experiments alone do not necessarily show that one gene acts via the other, as two genes may act via parallel pathways, reduced expression of Wnt3a in Lmx1a-/- embryos and downregulation of Wnt3a expression in Lmx1a+/-;Wnt3a+/- embryos relative to Wnt3a+/- embryos show that Lmx1a acts upstream of Wnt3a, thus, suggesting that Lmx1a promotes DG development, at least partially, by modulating expression of Wnt3a.” (page 13, lines 277-282).

      We rephrased the definitive "acts via" statement throughout the text and in the heading of this section. Now we use more balanced phrases. The heading now reads: “Lmx1a regulates expression of Wnt3a to promote DG development.” (Page 11, line 241), while in the Discussion we state that Lmx1a regulates Wnt signaling to promote hippocampal development (page 21, lines 467-468).

      (3) In the discussion section, the authors should include that trans-hilar and supragranular scaffold is disrupted in Lrp6 and Lef1 single as well as double mutants, which indicates Wnt signaling has a role to play in the morphogenesis of this scaffold. In this context, the author may discuss how Lmx1a could regulate this process via modulating Wnt signaling.

      Now in the Discussion we state: “It has also been previously shown that single and double mutants for Lrp6 and Lef1 genes, which encode components of the Wnt signaling transduction pathway, exhibit disrupted transhilar and supragranular scaffolds (Zhou et al., 2004; Li and Pleasure, 2005), indicating that Wnt signaling has a role in the development of the hippocampal glial scaffold” (Page 20, lines 445-449). Then, we conclude “Our gene expression studies and phenotypic analysis of Lmx1a-/- mutant and Lmx1a+/-;Wnt3a+/- double heterozygous mice identified Lmx1a as a novel regulator of proliferation of DG progenitors, hippocampal glial scaffold formation and electrophysiological properties (input resistance) of DG neurons, which likely, at least partially, promotes hippocampal development by modulating Wnt signaling, particularly expression of its secreted ligand Wnt3a.” (Page 20, lines 449-454).

      (4) Reduction in Tbr2 levels (Fig4B): E13.5, not all Tbr2+ cells in the hem show a visible decrease in Tbr2 levels. The CR cells in the marginal zone show faint Tbr2. It would be useful if the staining intensity within the hem was quantified by dividing the section into three bins along the radial axis: Ventricular Zone, "Intermediate" zone, and Marginal zone to get a sense of the intensity profile. Co-labeling with p73 would identify CR cells and distinguish them from hem progenitors.

      We co-labeled wt cortical hem with Tbr2 and p73 immunohistochemistry and found that virtually all Tbr2+ cells in the marginal layer (where CR cells accumulate before initiating their tangential migration toward the hippocampal fissure) are p73-positive, while most Tbr2+ cells in the ventricular and intermediate bins are p73-negative (presumably not fully differentiated progenitors) (Figure 4 – figure supplement 2). These data provide further rationale for quantifying Tbr2+ progenitors separately in three different bins, as recommended by the Reviewer, which we now report in Figure 4B, C. This analysis revealed that loss of Lmx1a reduces Tbr2 expression across the three bins in the CH, but most significantly (p<0.001) in the Marginal zone.

      These data are now described in the Results section, page 14, lines 308-317.

      (5) Are the total number of Prox1+ cells at E14.5 similar between control and Lmx1a-/- ? Might the decrease in Prox1+ cells in the DG of P21 Lmx1a-/- animals occur due to granule cell death or because fewer cells were specified due to lower Wnts from the compromised Lmx1a-/- hem? The authors should examine cell death, labeling with CC3 and Prox1 together to test the cell death angle and discuss if the specification angle applies.

      Our new cell counts revealed a reduced number of Prox1+ cells in the DNe of e14.5 Lmx1a-/- mutants (Fig. 1K-M). We also show that proliferation in e14.5 DNe is reduced in Lmx1a mutants (Fig. 1N-Q), which is expected to contribute to the reduced number of Prox1 cells. Since proliferation is diminished in Lmx1a mutants, it is very hard to definitively demonstrate whether (in addition to proliferation) a reduced specification of DG progenitors contributes to the lower number of Prox1+ cells found in the DNe (and later in DG) of Lmx1a mutant mice. However, since Wnt3a is known to both induce DG progenitors and promote their proliferation, it is likely that a reduced specification also contributes to the reduced number of Prox1 cells in Lmx1a -/- mutants. Now we discuss this possibility in the Discussion by stating: “Wnt3a, which is downregulated in the Lmx1a-/- CH, is known to promote not only proliferation but also the specification of DG progenitors (Lee et al., 2000; Mangale et al., 2008; Subramanian and Tole, 2009b). Thus, although not directly tested in the current study, it is likely that the reduced number of Prox1+ DG progenitors in Lmx1a-/- embryos results not only from their reduced proliferation but also because of their decreased specification.” (page 22, lines 497-501).

      To study whether increased apoptosis contributes to the reduced number of Lmx1a-/- DG cells, we performed a very detailed analysis of apoptosis with an activated Caspase 3 immunohistochemistry at multiple stages (at e14.5 in the DNe, before DG cells exit the DNe; at e16 and e18.5 in the hippocampal primordium, and at e18.5, P3 and P21 in the DG (when the DG is formed), using Prox1/activated Caspase 3 co-immunostaining). No difference in apoptosis was found at any stage between wt and Lmx1a-/- embryos, indicating that misregulated apoptosis is not a major contributor to the DG phenotype of Lmx1a-/- mutants (Fig. 1R-T; Fig. 1- figure supplement 3).

      (6) In figure 6, the authors show that Lmx1a OE is sufficient to induce hem-like features, and identify p73+ cells (CR cell lineage). Is the choroid lineage not induced or was it not examined? A line to this effect would be useful. Also, the validation that it is indeed ectopic hem could be stronger with a few additional markers, since this is a striking finding.

      In the original paper, induction of the choroid plexus lineage was not investigated. Now we add two additional markers: Ccdc3 (a marker of CH) and Ttr (a marker of choroid plexus). Lmx1a in utero electroporation into medial telencephalic neuroepithelium induced ectopic expression of Ccdc3 (Fig. 6 – figure supplement 1A-D’) but did not induce expression of Ttr (Fig. 6 – figure supplement 1E-F’), strengthening the conclusion that Lmx1a specifically induces CH features in the medial telencephalon. These data are now described in the Results section, page 17, lines 372-373, 377-379, and 387-389.

      Reviewer #2 (Public Review):

      The cortical hem is one of the main signaling centers in the vertebrate forebrain, regulating neurogenesis of the medial pallium and the generation of Cajal-Retzius neurons. The authors examine how this signaling center is formed and functions. Previously, transcription factors playing instructive roles in the development of the cortical hem have been identified, but a master regulator had not been found so far. The authors build on their previous work studying the transcription factor Lmx1a which is one of the earliest and most specific cortical hem markers.

      By combining loss- and gain-of-function studies, RNA sequencing, histology, and analysis of downstream factors, the authors rigorously show that Lmx1a is required for the expression of signaling molecules in the hem, the proliferation and functionality of dentate gyrus neurons, and the cell cycle exit and differentiation (and also migration) of Cajal-Retzius cells, and that it does so by activating different downstream regulators.

      They use gold-standard experiments in the field such as BrdU-Ki67 cell-cycle exit measurements, RNA sequencing, and patch clamping, combined with state-of-the-art techniques such as RNAscope and laser capture microdissection. These convincingly show that Lmx1a regulates the proliferation of dentate gyrus progenitor cells and that its loss leads to a malformation of the transhilar scaffold.

      We appreciate the positive comments and insightful suggestions of Reviewer 2. Please see our response to specific comments below (see our marked up copy of the paper, submitted as related manuscript file). New text in the revised paper is blue. Please note that since we reformatted the paper (re-submitted figures separately rather than embedded them into the text), line numbers changed relative to the original submission.

      The authors also claim a migration deficit for dentate gyrus progenitors, but they do not consider apoptosis or show direct evidence for migration abnormalities.

      Now we provide additional in vivo data to support migration abnormalities from the DNe (Fig. 1 – supplement 2) and modified the Discussion related to migratory defects from the DNe as recommended by the Editors. Also, by performing a very detailed analysis of apoptosis, we provide strong evidence that apoptosis is not altered in Lmx1a-/- mutants at multiple stages (Fig. 1 – supplement 3). These results are described in detail below, in our response to the first specific comment of Reviewer 2.

      In the hem, the authors report normal proliferation and apoptosis in the Lmx1a mutants, but aberrant cell-cycle-exit, from which the authors conclude a problem in differentiation. However, this could be a cell cycle progression problem too (stuck in a certain cell cycle phase?), as the RNAseq data suggest. The authors should acknowledge this possibility.

      The possibility of a cell cycle progression problem in Lmx1a -/- CH is now acknowledged in the Discussion. Specifically, we state: “Finally, in Lmx1a mutants, we linked a decreased number of CR cells with a reduced exit of CH progenitors from the cell cycle. However, our data do not exclude a possibility that loss of Lmx1a also causes a cell cycle progression defect (resulting in CH progenitors being delayed in a certain phase of the cell cycle). This hypothesis remains to be tested.” (page 22, lines 501-505).

      The RNAseq dataset provides candidate downstream regulators of the observed phenotypes and the authors test the functionality of Wnt3a, Tbr2, and Cdkn1a, showing they are involved in distinct processes.

      Strikingly, Wnt3a is not significantly downregulated in the RNAseq data in the Lmx1a mutant, but quantification of in situ hybridization signal (which is less robust) did reveal a significant difference. Is this a splice variant issue? A timing issue or specificity of the RNAscope probe? The authors should look into this more carefully.

      Our Wnt3a RNAscope in situ hybridization recapitulates the known Wnt3a expression pattern (specific expression in the CH), indicating that this probe is specific. A splice variant issue is also unlikely because, according to the Genome Browser and the NCBI Gene Bank, only one Wnt3a splice variant exists in the mouse. It could be a timing issue (e13.5 for RNAseq versus e14 for RNAscope analysis). Please note, however, that in our RNAseq experiment, the FDR for Wnt3a downregulation was 0.13, which is close to significance.
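
      To illustrate what an FDR value such as 0.13 represents, the following is a minimal sketch of the Benjamini-Hochberg adjustment, which is one common way to obtain FDR values from raw p-values. It is an assumption that a procedure of this kind was used in the RNAseq analysis; the p-values below are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDR), in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five genes (illustrative only).
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
fdr = benjamini_hochberg(raw)
for p, q in zip(raw, fdr):
    print(f"p = {p:.3f}  FDR = {q:.4f}")
```

      A gene can have a small raw p-value and still miss a conventional FDR cutoff once the number of tested genes is taken into account, which is why an independent assay is a reasonable follow-up.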

      To further address the downregulation of Wnt3a expression in Lmx1a-/- CH, we performed additional experiments using a complementary technical approach. We isolated the CH from e14 wt and Lmx1a-/- mutants by laser capture microdissection (LCM) and analyzed Wnt3a expression by qRT-PCR with already published/validated primers for Wnt3a (Watanabe et al., 2016, Biol Open 5, 1834-1843). We focused on e14 because it is closer to e14.5 when we observed a reduced proliferation in the DNe in Lmx1a-/- embryos. Our new LCM/qRT-PCR analysis confirmed Wnt3a downregulation (Fig. 3D, E) that we initially observed in our in situ hybridization experiments (Fig. 3A-C), increasing our confidence that Lmx1a regulates Wnt3a expression in the CH.

      To study the role of Cdkn1a, the authors performed rescue experiments using in utero electroporation, which is a standard in the field. However, they argued before that "CR cell migration and DG morphogenesis are complex processes that require precise expression levels of key genes" when studying downstream factors Wnt3a and Tbr2. Why is this no longer an issue studying Cdkn1a?

      This is because, in Cdkn1a rescue experiments, we test a much simpler (binary) output: whether electroporated (GFP+) cells are Ki67 positive (cycling progenitors) or Ki67 negative (exited the cell cycle). In contrast, Wnt3a or Tbr2-related experiments require the evaluation of either DG formation (the number of Prox1+ cells in the DG) or the location of CR cells in the HF, both of which are very complex outputs. (DG formation relies on the correct proliferation, glial scaffold formation, migration and differentiation events, while CR location involves long-range migration.) Both DG morphogenesis and CR migration are highly sensitive to the expression level of their essential developmental genes (Zhou et al., 2004; Arredondo et al., 2020; Gil et al., 2014; Ha et al., 2020; Hevner, 2016 in the paper reference list). As in utero electroporation does not easily allow precise control of gene expression level, such an approach would likely produce higher levels of Wnt3a and Tbr2 in at least some cells of Lmx1a-/- embryos relative to endogenous levels of Wnt3a/Tbr2 in wild type mice. Higher than physiological levels of expression of these proteins may cause additional abnormalities, complicating the interpretation of results of Wnt3a and Tbr2 electroporation experiments aimed to rescue Lmx1a-/- hippocampal phenotypes.

      As mentioned above, because in the case of Cdkn1a, we test a much simpler output (the presence or absence of Ki67 expression), we do not expect Cdkn1a overexpression to complicate the interpretation of the results: some electroporated Lmx1a-/- cells could exit the cell cycle “too fast”, but it still does not complicate the interpretation of the Ki67 expression readout.

      We provide additional explanations for the Cdkn1a rescue experiment in the paper. We state: “To study whether decreased Cdkn1a expression mediates a reduced cell cycle exit of CH progenitors in Lmx1a-/- embryos (Fig. 2A-C), we used immunohistochemistry with antibodies specific for Ki67, which labels cycling progenitors. As the presence/absence of Ki67 expression is a simpler output than complex DG morphogenesis and long-range migration of CR cells, we performed Cdkn1a overexpression rescue studies using in utero electroporation of the CH at e11.” (Pages 15-16, lines 344-347).

      To study cell-cycle exit in this model, the authors quantified GFP and Ki67. Since electroporation not only targets the progenitor cells (see e.g. Govindan et al. 2018, Nature protocols), the authors should confirm these results with a BrdU/Ki67 quantification as in previous experiments, or confirm electroporation only targeted progenitor cells in their model.

      We have now experimentally demonstrated that electroporation targets progenitor cells in our model. Thus, we confirmed that our approach is appropriate for the analysis of progenitor differentiation in the CH.

      Specifically, we in utero electroporated a GFP expressing plasmid into the CH of e11 embryos and imaged the GFP signal 15 hrs later (to identify electroporated cells) together with Ki67 immunolabeling (to identify progenitors). We reasoned that 15 hrs would be sufficient to produce GFP protein from the plasmid but also short enough to avoid differentiation of progenitors that received the plasmid. We found that in both wt and Lmx1a-/- embryos, almost all GFP+ cells in the CH were Ki67+ (i.e., progenitors). There was no difference between wt and Lmx1a-/- embryos at this early time point (Fig 5 – supplement 1). (GFP+/Ki67- cells were extremely rare in both genotypes. These cells may be either differentiated cells that took the plasmid during electroporation or electroporated progenitors that exited the cell cycle during the 15-hr interval after electroporation.)

      In the Results section, we now state: “The ventricular layer of the CH that borders the lateral ventricles consists of progenitor cells, so it is expected that plasmids injected into the lateral ventricles and electroporated into the CH will target such progenitors. However, since electroporation can also target differentiated cells (Govindan et al. 2018), we first injected a GFP-encoding plasmid into the lateral ventricles, electroporated it in utero into the CH of e11 embryos and analyzed GFP+ cells after a short (15 hrs) time period. This analysis revealed that virtually all (~95%) GFP+ cells were Ki67+ (progenitors) in both wild type and Lmx1a-/- embryos (Fig. 5 – figure supplement 1), confirming that this system is appropriate to target progenitors.” (Page 16, lines 348-355).
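
      The binary Ki67 readout described above reduces to a simple proportion per genotype. The sketch below is a hypothetical illustration of that calculation; the function name and the cell counts are assumptions chosen to give a value near the ~95% reported, and they are not data from the study.

```python
def fraction_cycling(gfp_ki67_pos, gfp_ki67_neg):
    """Fraction of electroporated (GFP+) cells that are Ki67+ (cycling)."""
    total = gfp_ki67_pos + gfp_ki67_neg
    if total == 0:
        raise ValueError("no GFP+ cells were counted")
    return gfp_ki67_pos / total


# Hypothetical counts: (GFP+/Ki67+ cells, GFP+/Ki67- cells) per genotype.
counts = {"wt": (190, 10), "Lmx1a-/-": (95, 5)}
fractions = {genotype: fraction_cycling(*c) for genotype, c in counts.items()}

for genotype, frac in fractions.items():
    # The remainder (1 - frac) is the fraction that has exited the cell cycle.
    print(f"{genotype}: {frac:.0%} Ki67+, {1 - frac:.0%} Ki67-")
```

      Because each cell is scored as either Ki67+ or Ki67-, the readout does not depend on the exact expression level reached in any individual electroporated cell.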

      Lastly, the authors ectopically expressed Lmx1a and convincingly show its ability to generate a hem-like structure. Could the authors elaborate on the necessity for a medial signature? Can the hem be ectopically induced in the lateral pallium?

      To address this question, we electroporated Lmx1a into the lateral cortex and found that laterally, it could not induce a major cortical hem marker Wnt3a (Fig. 6 – supplement 2). Thus, a medial identity is required for Lmx1a to induce the cortical hem, the finding which is now presented in the Results section (page 17, lines 388-389).

      Also, in the Discussion, we elaborate on the necessity for a medial signature: “Interestingly, while Lmx1a induced CH features in the medial telencephalon, Lmx1a overexpression in the lateral cortex failed to induce ectopic expression of Wnt3a, indicating that medially expressed competence factors (permissive genes) are needed to maintain the CH-inducing activity of Lmx1a. Such factors are likely to include Gli3 and Dmrt3/4/5, loss of which compromises the development of the endogenous CH (Grove et al., 1998; Kikkawa and Osumi, 2021; Quinn et al., 2009; Subramanian et al., 2009a; Subramanian and Tole, 2009b)” (page 19, lines 424-430).

    1. Author Response

      eLife assessment

      This important study deepens our understanding of macrophage phenotypes in pathological contexts and identifies a new macrophage state associated with tissue fibrosis, as well as putative drivers of this cellular state. The authors provide convincing evidence and performed a well-thought-out and thoroughly described computational analysis of single-cell RNA-sequencing data. This work will be of broad interest to the fields of tissue inflammation, fibrosis, macrophage biology, and immunology.

      We thank the eLife reviewing editors as well as the two Reviewers for their supportive, constructive and insightful assessment of the manuscript. We apologize for the time it has taken us to submit the revisions. The main reason for this delay was the integration of newly published scRNA-seq datasets that were relevant for gaining further power and reproducibility for our analyses, especially for refining the transcriptomics resolution of SPP1+MAM- and SPP1+MAM+ cells and their respective correlation with ageing. Specifically, we have added new datasets from NASH [1] and endometrium [2] patients so that each human tissue comprises scRNA-seq data derived from at least 2 independent studies (revised Table 1). Crucially, as the human lung cell atlas was published recently (after receipt of our decision letter) [3], we investigated in greater detail (with increased N numbers and co-variates) the association of SPP1+ macrophages and homeostatic ones with lung ageing.

      This new undertaking was not directly requested by the reviewers/editors, but was instead suggested in informal feedback received after posting our manuscript on the bioRxiv repository. Importantly, these revisions, together with the corrections requested by the two reviewers, made the conclusions of the manuscript stronger (and more robust as we increased the number of samples) by refining (i) the regulons that associate with SPP1+MAM+ differentiation and (ii) subset-specific association with human and mice lung ageing, a finding that suggests the MAM polarization state is acquired when there is prominent tissue fibrosis. Lung aging is significantly associated with the SPP1+MAM- state, which represents the inflammatory/secretory phenotype that has yet to be polarized to the fibrotic one seen in the disease state.

      Reviewer #1 (Public Review):

      Huang, Kevin Y. et al. perform a meta-analysis of single-cell RNA-seq (scRNA-seq) data derived from 11 studies and across six tissues (liver, lung, heart, skin, kidney, endometrium) to address a focused hypothesis: pro-fibrotic SPP1+ macrophages that have been found in liver and lung tissue of idiopathic pulmonary fibrosis patients exist in other human tissues which can result in broader fibrotic disease states. The authors use existing, state-of-the-art single-cell analysis tools to perform the meta-analysis. They convincingly show that the SPP1+ macrophage population can be identified in lung, liver, heart, skin, uterus (endometrium), and kidney clusters derived from each tissue's scRNA-seq data. They further identify three subpopulations of the SPP1+ macrophages: matrisome-associated macrophages (MAMs) defined as SPP1+MAM+ and two others enriched for inflammatory and ribosomal processes which they group together and define as SPP1+MAM-. Pathway analysis of genes upregulated in SPP1+MAM+ vs SPP1+MAM- cells yields significant enrichment of extracellular matrix remodeling and metabolism-related pathways and genes. This allows them to arrive at SPP1+MAM+ and SPP1+MAM- gene expression signature scores to further highlight the upregulation of these pathways in SPP1+MAM+ macrophages and their role in fibrosis. They explicitly show enrichment for SPP1+MAM+ macrophages in disease compared to healthy control subjects in a variety of tissues and their associated fibrosis-related diseases. Cell differentiation trajectory analysis identified 2 main trajectories: both starting from FCN1+ infiltrating monocytes/macrophages with one moving toward a homeostatic state and another toward SPP1+MAM+. They verified this using an alternative trajectory analysis approach. Importantly, for all tissues and fibrotic diseases, they found SPP1+MAM+ were at the end of the trajectory preceded by the SPP1+MAM- state, suggesting SPP1+MAM+ represents a common polarization state of SPP1+ macrophages. They develop a probability-based score that estimates the propensity of SPP1+MAM- macrophages to differentiate into SPP1+MAM+ and show that this was significantly higher in fibrotic disease subjects compared to healthy controls. They go on to identify the transcription factor networks (regulons) associated with SPP1+MAM+ differentiation and activation. They find a number of enriched regulons/transcription factors and through a linear-modeling trajectory analysis highlight the regulons that are associated specifically with the SPP1+MAM- to SPP1+MAM+ transition. In this way, they prioritize the NFATC1 and HIVEP3 regulons as driving the differentiation of SPP1+MAM- macrophages toward the SPP1+MAM+ polarization state. Finally, given that age is a risk factor for fibrotic disease, they assessed the association of SPP1+MAM+ and SPP1+MAM- gene signatures in healthy control old and young human subjects as well as old and young mice and found SPP1+MAM+ was either exclusively (human) or more significantly (mice) elevated in old versus young compared to SPP1+MAM-.

      The strengths of this paper are that the authors gathered a number of relevant single-cell RNA-seq data sets from fibrosis-focused studies to address a highly focused hypothesis (stated above). They gained the power to detect the population of SPP1+MAM+ cells by integrating these datasets. The analysis is carried out well using existing state-of-the-art tools. With whatever metric or single cell analysis-based discovery they make about the SPP1+MAM+ subpopulations (e.g., gene signatures, endpoint of trajectory analysis, associated regulons, etc), they compare the relevant scoring metrics in fibrosis and control subjects at every stage of the meta-analysis and find that SPP1+MAM+ is consistently higher across tissues and fibrosis-related diseases.

      There are only minor weaknesses in this paper. One is that some of the most highly significant or simply significant results are not shown in main figures but are summarized in supplementary tables (e.g., MYC TARGETS V1 would have appeared as the most significant, highest enriched, and among the largest in terms of set size). Another is analysis criteria that may not yield the most biologically relevant or impactful conclusion (e.g., while the regulon THRA does not display a shift in slopes it shows the strongest, progressive increase going toward the SPP1+MAM+ state).

      We thank the Reviewer for his very accurate summary of our findings. We agree with the Reviewer regarding all points and provide the answers to the suggested minor points as per below.

      Reviewer #2 (Public Review):

      In the past few years, single-cell transcriptomics analysis has uncovered cellular states associated with disease in experimental models and humans, revealing previously unrecognized disease-associated macrophage states. In particular, a macrophage state characterized by high expression of SPP1 (encoding osteopontin), and by a specific gene expression signature including the expression of TREM2, has been observed in various pathologies and given various names depending on the context e.g. TREM2hi macrophages, lipid-associated macrophages (LAM), disease-associated microglia (DAM), Scar-associated macrophages (SAM), etc... However, a focused investigation and comparison of SPP1+ macrophages across disease contexts were lacking. Here, the authors aimed to systematically analyze SPP1+ macrophages in the context of tissue fibrosis, and integrated single-cell RNA-seq data of >200,000 human macrophages in 6 organs in health and tissue fibrosis.

      Beyond confirming the presence of SPP1+ macrophages with a conserved gene expression module (TREM2, CD9, GPNMB, etc...) across tissues and their association with fibrosis, the authors identified a previously unknown cell subset within SPP1+ macrophages, that was enriched for the expression of genes involved in remodeling of the extracellular matrix, which they termed SPP1+ matrisome-associated macrophages (SPP1+MAM+). The authors further used computational tools to compare these SPP1+MAM+ macrophages to previously described SPP1+ macrophage states (LAM, DAM, SAM), investigate the differentiation and activation trajectory of SPP1+MAM+ macrophages, and identify potential transcriptional regulators involved in their differentiation. Finally, the authors show that SPP1+MAM+ macrophages are associated with ageing in both humans and mice.

      Overall, the conclusions of the authors are well supported by the data. The authors made excellent use of available computational tools, and the figures are clear and informative. The methods are well-described and appropriately used. In particular, the authors made a nice effort in explaining and justifying some key decisions in their scRNA-seq data analysis workflow, including a data-driven approach to decisions in the clustering analysis.

      The authors' findings are of broad interest to the fields of tissue inflammation, fibrosis, macrophage biology, and immunology, and their report constitutes a valuable resource, and a basis for further investigations of macrophage differentiation mechanisms in tissue fibrosis, and how macrophages could be targeted to alleviate pathological tissue fibrosis.

      We thank the reviewer for finding our work valuable and for carefully assessing the manuscript. We agree with the Reviewer regarding all points.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Salloum and colleagues examines the role of statin-mediated regulation of mitochondrial cholesterol as a determinant of epigenetic programming via JMJD3 in macrophages.

      Key strengths of the work include:

      1) Mechanistic analysis of how statin treatments can remodel the mitochondrial membrane content via cholesterol depletion which in turn affects JMJD3 levels is a novel concept.

      2) Use of RNA-seq and ATAC-seq data provides an avenue for unbiased analysis of the statin effects.

      3) Use of methyl-cyclodextrin (MCD) alongside statins increases the robustness of the findings and the use of NFKB inhibitors suggests a mechanistic role for NFKB.

      The conclusions are only partially supported by the presented data:

      1) There is a lack of any in vivo studies that are required to demonstrate that the concentrations of statins used to induce epigenetic programming of macrophages are physiologically relevant. There have been numerous studies that have examined the anti-inflammatory effects of statins but there is significant debate and controversy regarding the in vivo relevance. Much of the in vivo effects of statins are achieved via changes in systemic cholesterol levels but the direct effects on macrophages are not clear.

      More discussion on this issue has been added (P9, lines 9-33).

      2) "Statins" is used globally and it is unclear which statins were used, which doses of statins, and the treatment durations.

      Names of the statins have been added for the individual experiments in the figure legends.

      3) The RNA-seq, ATAC-seq, and selected H3K27 ChIP only show a snapshot of the results without leveraging the power of unbiased analysis. Such an unbiased analysis could show whether the examined genes are indeed the most relevant targets of statins.

      (a). Data are now analyzed with unsupervised GSEA, i.e. on all differentially expressed genes, both up and down, to identify the most significantly altered pathways. TNFα signalling via NF-κB came out on top (Fig. 1 A), similar to our conclusion from previous analyses.

      4) CCCP depletion can have broad toxic effects and it is difficult to interpret specific roles of ATP synthase from potentially toxic mitochondrial uncoupling.

      CCCP within the dosages used in this study has no detectable toxicity. An MTT test was performed and added (Supplementary Fig. 5).

      Reviewer #3 (Public Review):

      The manuscript by Salloum et al., titled "Statin-mediated reduction in mitochondrial cholesterol primes an anti-inflammatory response in macrophages by upregulating JMJD3" reports an extensive characterization of the mechanisms underlying the anti-inflammatory role of statins using different in vitro studies. Based on these approaches, the authors observed that cholesterol reduction in response to statin treatment alters mitochondrial function and they identify JMJD3 as a potential critical driver of macrophage anti-inflammatory phenotype. Overall, the study is interesting and provides new findings that could shed light on the molecular effects of statins in these cells, but a number of issues remain confusing, and the experimental design is, on some occasions, not rigorous enough to support the drawn conclusions.

      Major issues:

      1) Focus on JMJD3 is justified by the authors as it was among the 40 genes commonly up-regulated in macrophages exposed to statin or methyl-β-cyclodextrin (MCD) by RNA-Seq analysis. However, this analysis has not been presented in the manuscript and it is unclear what genes (apart from JMJD3) might play an important role in the response of these cells. A detailed characterization of both up- and down-regulated genes in these experimental conditions and a better justification for JMJD3 are required to fully support further analysis.

      a. RNA-seq data from statin- and MCD-treated macrophages were re-analyzed by unsupervised Gene Set Enrichment Analysis (GSEA) (Fig. 1 A & B), which includes all differentially expressed genes, up and down, by cholesterol reduction. The conclusion is identical to the previous analysis, i.e. NF-kB is the top pathway activated by cholesterol reduction. The analysis from the previous version, which used a different program, has now been moved to Supplementary Fig. 1.

      b. ATAC-seq data was similarly re-analyzed with GSEA (Fig. 6 A). Again, NF-kB is the top pathway activated by cholesterol reduction (Fig. 6 A, b). Examples of the lineups between ATAC-Seq peaks and RNA-seq peaks have been added (Fig. 6 B).

      c. RNA-seq data from LPS-stimulated macrophages with or without statins is also re-analyzed. Gene Ontology (GO) analysis of genes showing decreased expression upon statin treatment revealed that statins primarily suppress inflammatory processes (Fig. 7 A, b), while genes involved in cellular homeostatic functions were upregulated (Fig. 7 A, c).
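
      For readers unfamiliar with GSEA, the core of the method is a running-sum enrichment score computed over a ranked gene list. The sketch below illustrates that idea only; the gene names, scores and gene sets are hypothetical, the weighting exponent `p` is an assumed parameter, and the permutation testing and normalization performed by real GSEA software are omitted.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Running-sum enrichment score in the style of GSEA.

    ranked_genes: gene names ordered from most up- to most down-regulated.
    scores: ranking metric for each gene, in the same order.
    gene_set: set of gene names whose enrichment is tested.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_weight = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if hit_weight == 0 or n_miss == 0:
        raise ValueError("need both hits (with non-zero scores) and misses")
    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        # Step up at genes in the set (weighted by score), down otherwise.
        running += abs(s) ** p / hit_weight if h else -1.0 / n_miss
        # Keep the running-sum value with the largest deviation from zero.
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list (e.g. by log fold change) and gene sets.
genes = ["Nfkbia", "Tnf", "Actb", "Gapdh", "Il10", "Arg1"]
metric = [3.0, 2.0, 0.5, -0.2, -1.0, -2.5]
es_top = enrichment_score(genes, metric, {"Nfkbia", "Tnf"})
es_bottom = enrichment_score(genes, metric, {"Il10", "Arg1"})
print(es_top, es_bottom)
```

      A positive score indicates that the gene set is concentrated among up-regulated genes and a negative score indicates concentration among down-regulated genes, which is how a single analysis can consider changes in both directions.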

      2) In the same line, Figures 6A and B fail to fully describe the changes found by ATAC-seq and RNA-seq. A more comprehensive analysis of these three datasets (together with previous RNA-seq studies) would help to obtain a better understanding of overlapping dysregulated genes (not only those found up-regulated) and what other epigenetic modifying factors might be involved.

      See response to reviewer #1, 3. Also response to reviewer #2, 3.

      3) In Figure 6C and Supplementary Figure 7, it would be noteworthy to also measure the gene expression of the Kdm6a/UTX homolog Kdm6c/UTY, as it has been shown to lack H3K27me3 demethylase activity due to mutations in the catalytic site of the Jumonji-C domain.

      Kdm6c/UTY in human is a male specific histone demethylase (PMID: 24798337). As statins are not known for sex-biases, this demethylase is not likely to play a role here.

      4) The use of rather unspecific treatments such as MG-132 (proteasome inhibitor) and GSKj4 (inhibitor of both JMJD3 and UTX) may distort the results observed and might elude their correct interpretation. To avoid this limitation, additional silencing and/or overexpression experiments are currently needed.

      Jmjd3 knockdown experiments have been added to complement the glutamine-free and GSKj4 experiments (Fig. 8, C).

      5) Figure 3 and Supplementary Figure 3 seem to be duplicated, please correct them. Moreover, for a better representation of these data, please include representative Seahorse profile figures of each experimental condition in these figures.

      Sorry for the error. It is corrected (Fig. 3, BMDMs).

      6) As stated by the authors, macrophage phenotype is much more complex than M1/M2 polarization. In this view, assessing a very limited set of genes (i.e, Il-1, IL-10, TNF, IL-6, IL-12, Arg1, Ym1, Mrc1) appears to be inappropriate. A meaningful number of markers must be added.

      Yes, this is complex, and it would be good if we could assess more genes for this purpose. M1/M2 polarization is relatively poorly defined in terms of the genes expressed. We used a list of genes that are most commonly tested in the literature. For example, Nat Immunol. 2017 Sep;18(9):985-994.

      7) For accurate quantification of H3K27me3 global levels, please add immunoblotting against histone H3 in Supplementary Figure 1. Will look for it. H3 and H3K27me3 could not be probed on the same blots. It would involve stripping, which we do not trust.

      Avoiding stripping was the exact reason we didn’t use H3 as a loading control. Comparison between separate blots could be another source of error. In addition, we would like to control for the effective cholesterol reduction in these cells by p-Creb. Whole cell lysates were used for western blotting, with actin as a control for cell numbers.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Drs. Miura, Mori, and colleagues, first present lineage tracing data using PDGFRa-CreERT2 and Foxa2-Cre drivers to show that PDGFRa+ cells, when lineage-labeled early in development go on to form the lung mesenchyme (but little to none of the epithelium), whereas FOXA2 expressing cells go on to contribute to both the lung epithelium and lung mesenchyme. However, it is already well known that FOXA2 is expressed in the mesendoderm around the time of gastrulation, and that this population generates both endoderm and mesodermal derivatives. As a result, it is not surprising that lineage labeling this population would contribute to both the lung epithelium and lung mesenchyme. The authors use the term bona fide lung (BFL) generative lineage. However, since the mesendoderm contributes to both the endoderm and mesoderm, but is by no means specific to the lung, and as shown in this paper (Figure 2G) the FOXA2 population only generates 30-40% of the mesenchyme, the term BFL is both confusing and misleading.

      We deleted the BFL concept and the sentences from the entire manuscript.

      In the second portion of the manuscript, the authors conditionally delete Fgfr2 using a Foxa2-Cre driver. Although loss of Fgf10 or Fgfr2 is known to result in lung agenesis, deletion of Fgfr2 within the FOXA2+ expressing cells is novel. However, since FOXA2 is broadly expressed within the nascent lung epithelium and Fgfr2 is known to be expressed within the lung epithelium, it isn't entirely clear how much information this adds beyond what is already known from other Fgfr2 knockout studies. Perhaps the most interesting aspect of the reported phenotype is that the other organs (e.g. intestine) in these knockout animals appear to be relatively spared. This should be better characterized by the authors, as currently only a few H&E images are shown.

      As the reviewer described, Foxa2 is broadly expressed in the epithelium of several organs. We analyzed the other organs of Foxa2Cre/+; Fgfr2cnull mice, as shown in the new Figure 4 - figure supplements 1C and 2A and outlined in the manuscript, lines 267-275. We found that the intestine and other major organs were tdTomato-labelled but intact. Significantly, we discovered a thymus agenesis phenotype in Foxa2Cre/+; Fgfr2cnull mice, consistent with the requirement of Fgfr2 for thymus development (Dooley et al., 2007).

      The authors then used conditional blastocyst complementation with nGFP+iPSCs from wild-type mice to rescue the phenotype of the Fgfr2 conditional knockout mice, showing that an embryonic lung is formed. However, blastocyst complementation has previously been performed with other knockout mouse models with severe lung hypoplasia/aplasia, including Dr. Mori's previous Nature Medicine paper. Although most of the previous mouse models target the endoderm/early epithelial cells (e.g. conditional deletion of Ctnnb1, Fgfr2, or global knockout of Nkx2.1; see Li E, et al. Dev Dyn 2021 Jul;250(7):1001-1020; Wen B, Am J Resp Crit Care Med. 2020; in addition to Mori M, Nature Medicine, 2019), Kitahara A, et al (Cell Rep. May 12 2020;31(6):107626) previously reported blastocyst complementation in an Fgf10 null mouse model, so it is not clear what the current study adds to this existing body of literature. The lungs of the mice undergoing blastocyst complementation are also incompletely characterized in the current version of this study. For example, it is unclear how functional these lungs are and whether they are capable of gas exchange after birth.

      Our new Foxa2-lineage-based CBC model mice showed novel evidence of the co-generation of lung and thymus. We also added evidence that those rescued mice of the Foxa2-lineage-based CBC model survived until adulthood with normal lung function. These new findings were included in Figure 5, and described in the manuscript, lines 318-344.

      Reviewer #2 (Public Review):

      For most organs, including the lung, produced by blastocyst complementation, certain cells including the blood vessels are still derived from host tissues, making them unfit for transplantation. To address this issue, Miura et al. explored the origin and the program of whole lung epithelium and mesenchyme, and identified the crucial Foxa2 lineage for lung organogenesis by using lineage tracing mice and human iPSC derived lung differentiation. They found that Foxa2 lineage cells contribute to both lung epithelium and mesenchyme formation, which suggests that targeting Foxa2 lineage cells could create an empty developmental niche for blastocyst complementation in mice. They further deplete the Fgfr2 gene in Foxa2 lineage cells to induce the lung agenesis phenotype in mice, and donor mouse iPSCs injected into Fgfr2 mutant blastocysts occupied the empty niche and formed the missing lung.

      Strengths:

      To fill our knowledge gap regarding the origin of all lung cell types, especially pulmonary mesenchyme and endothelium, the authors investigated the lineage hierarchy of specified lung precursors in gastrulating mesendoderm. Using mouse lineage tracing and human iPSC derived lung differentiation, they clarified the mesendoderm gene expression pattern and progression, and compared the contributions of Pdgfra and Foxa2 lineage cells during lung development. They further demonstrate that the defective Foxa2 lineage is critically important for efficient lung complementation, which provides insight for next generation lung transplant therapies.

      Weakness:

      1) Several lineage tracing experiments lack rigorous quantification; the authors use "partially labels" or "labels a part of" in the text to describe their findings and conclusions, which makes the evidence less solid.

      As described above, we quantified the lineage tracing mice and added results in new Figures 1C and 1G.

      We quantified the lineage-tracing results by morphometric analyses described in Figures 1C and 1F. We provided the quantification of Foxa2 lineage tracing studies in early embryogenesis and removed the unqualified results from Figure 1, and the manuscript was corrected in lines 136-144 and 155-161.

      Regarding Figure 1C, we tried to obtain more embryos for these analyses using PdgfraCreERT2; Rosa tdTomato/+ mice. However, we often encountered embryo miscarriage due to the effect of tamoxifen (Tm), even with titration of tamoxifen or co-injection of progesterone (Nikita et al., 2019). After more than twenty experimental trials of Tm injection, we finally obtained a total of four embryos, three at E12.5 and one at E14.5. Those results were added in the new Figures 1A and B. These data are outlined in the manuscript, lines 134-141.

      2) The ideal lung for transplant should be functional for gas exchange. The lung complementation was only analyzed at E17.5 and E14.5; these two stages are too early to determine the function of the lungs generated via CBC.

      We showed additional evidence of the rescued mice in adulthood. We confirmed that Foxa2Cre; Fgfr2cnull injected with donor PSCs survived until adulthood, and there are no differences in the respiratory function compared to Foxa2Cre; Fgfr2hetero injected with donor PSCs. We added this result in new Figure 5 and described it in the manuscript lines 318-344.

      3) Immune cells make up a large proportion of the lung and are critical for lung transplant, yet the chimerism analysis of immune cells is missing in this study.

      We analyzed the chimerism of hematopoietic cells in the E17.5 experiment, but there were no differences among all chimeric mice (see Table 1 and Figure 4 - figure supplement 3D). We thought this was because hematopoietic cells originate in the liver and yolk sac (Yokomizo et al., 2022), which are off-target for our CBC model. However, we found that the thymus was also complemented in this model, as we described above. Since the thymus is a specialized primary lymphoid organ responsible for the education of T cells and essential for their maturation, this complementation may help future successful transplantation by avoiding post-transplantation graft versus host disease (GvHD). These data and discussion were added in Figure 4 - figure supplement 3D and Table 1, and the manuscript lines 293-295 and 417-427.

    1. Author Response

      Reviewer #2 (Public Review):

      The work reports a minor modification in the protocol for PrP formation in vitro. Using this, the authors evaluate the role of Syntaxin 6 in modulating prion formation in vitro and the toxicity of the amyloid fibrils in cell culture models. The authors show that while prions/amyloids formed by PrP are non-toxic, mixed aggregates formed by Stx6/PrP are toxic; they claim that this is due to the toxic aggregation intermediates that accumulate more in the presence of Stx6. However, the basis of enhanced toxicity of Stx6/PrP mixed aggregates is not clear and doesn't seem to be physiologically relevant; there is no evidence that Stx6 and PrP form mixed aggregates in vivo. Which is the toxic component of the Stx6/PrP co-aggregate? Is it the Stx6 component or the PrP component? Additionally, the authors do not have a mechanistic explanation for the effect of Stx6 on PrP prion formation.

      We thank the reviewer for his assessment and we agree that more in vivo data was needed to support the physiological relevance of the effect of syntaxin-6 on PrP. We now provide two new key experiments demonstrating interaction of STX6 with PrP in a cell model of prion disease and testing the effect of Stx6 knockout on the replication of infectious RML prions in PMCA assays (Figures 4, 4S1, 4S2). Please refer to our response to reviewer 1, point 1 for more details. We respectfully disagree that the native aggregation assay represents a minor modification of PrP fibril formation protocols. While this statement may be true in the narrow technical sense, it is striking that in more than 25 years of prion research, no aggregation assay under near-native assay conditions had been developed. The conditions of previous assays, which relied on thermal or chemical denaturation to facilitate PrP misfolding, were inherently incapable of assessing the effect of protein modifiers of PrP fibril formation. Therefore, the NAA opens a wide field of new experiments to mechanistically probe modulators of PrP aggregation and toxicity under physiologically relevant conditions. The protein syntaxin-6 proves a test case for this new capability.

      The reviewer may have misunderstood the mechanistic hypothesis for neurotoxicity that is supported by our data. We are not claiming that the co-aggregates between PrP and syntaxin-6 are toxic. As our data demonstrate, aggregation endpoints in the presence of STX6 have little neurotoxicity, as do fibrillar aggregation endpoints without the presence of STX6 (Figures 5 and 5S1). Rather, based on the well-established oligomer toxicity hypothesis, we are concluding that STX6, by delaying or preventing formation of mature amyloid fibrils, caused toxic aggregation intermediates to persist. Our new data from secondary seeding assays (Figure 5S2) demonstrate that at the aggregation time points when the maximum amounts of neurotoxic species are present (20 h), no seeding competent fibrils have yet been formed. The presence of STX6 prolongs this period and therefore increases toxicity (Figures 5 and 5S1). These data directly support the established theories for the basis of amyloid toxicity and, additionally, caution that an intervention to delay amyloid formation can have deleterious effects on toxicity. We have now made this point more clearly in our discussion. Of course, we, like many other protein misfolding laboratories in the world, are also working hard on isolating and characterizing the toxic species in prion and other protein misfolding diseases, which, as the reviewer suggests, will be a very important milestone in understanding these diseases.

      Reviewer #3 (Public Review):

      The autocatalytic replication mechanism of misfolded Prion-like proteins (PrP) into amyloid aggregates is associated with a plethora of deleterious neurodegenerative diseases. Despite the huge amount of research, the underlying molecular events of self-replication and the identity of the toxic species are not fully understood. Many recent studies have indicated that non-fibrillar oligomeric intermediates could be more neurotoxic than the Prion fibrils. Various cellular factors, like the participation of other proteins and chaperone activity, also play an important role in PrP misfolding, aggregation, and neurotoxicity. The present work focuses on understanding the PrP aggregation mechanism with the identification of the associated toxic species and cellular factors. One of the significant strengths of the work is performing the aggregation assay in near-native conditions. In contrast, most in vitro studies use harsh conditions (such as high temperature, denaturant, detergent, low pH, etc.) to promote protein aggregation. The authors successfully observed the well-known seeding property of the PrP in this aggregation assay that bypasses the primary nucleation during aggregation. Moreover, the authors have shown that syntaxin 6 (Stx6), a known risk factor in prion-mediated Creutzfeldt-Jakob disease, delays fibril formation and prolongs the persistence of toxic intermediates, thus exhibiting anti-chaperone activity. This study will contribute to understanding the molecular mechanism of PrP aggregation and neurotoxicity. However, further studies are required to precisely identify and characterize the toxic intermediate in the near future.

      We thank the reviewer for his thoughtful and accurate summary. We fully agree that the nature of the toxic species in protein misfolding diseases is a key challenge of the field and we hope that our study contributes to solving this puzzle.

    1. Author Response:

      We would like to express our gratitude to the reviewers for the time and effort dedicated to evaluating our manuscript. We are committed to addressing each of the comments and recommendations they have presented.

      It appears that a majority of the feedback emphasizes the need for clarity and expanded explanations. We acknowledge these points and are confident that offering a clearer exposition and delving into further details will notably enhance the manuscript. In our initial draft, our intention was to ensure accessibility to non-mathematical readers by minimizing technical jargon. However, the feedback underscores the importance of certain details, particularly for those well-versed in ODE modelling, and the need to provide complete information.

      While we find the reviewers' feedback invaluable, it is worth noting that none of the critiques suggest a fundamental change in our presented analyses. Below, we offer brief responses to the primary critiques mentioned in the public review:

      1) The first notable comment pertains to the selection criteria for parameter and initial condition values. This critique is indeed valid. In brief, parameter values were chosen from a range of 10^-5 to 10^4, representing rates from 10 femtomolar/minute to 10 micromolar/minute, spanning a biologically plausible spectrum. It is conceivable that values outside this range exist but are exceedingly rare. Similarly, initial conditions were chosen within the range 10^0 to 10^4, typically represented in nM.
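
      As a rough illustration of the ranges quoted above, the following Python sketch draws one model instance. It is not the authors' code: the response does not state the sampling distribution, so log-uniform sampling is an assumption, as are the function names and the interpretation of rate parameters in nM/minute. The count of 94 parameters comes from the second response point; the number of species is a placeholder.

```python
import math
import random

# Ranges quoted in the response; the nM/min unit for rates is an assumption.
PARAM_RANGE = (1e-5, 1e4)
INIT_RANGE = (1e0, 1e4)   # initial conditions, nM


def sample_log_uniform(low, high, rng):
    """Draw one value uniformly in log10 space between low and high (assumed scheme)."""
    exponent = rng.uniform(math.log10(low), math.log10(high))
    return 10.0 ** exponent


def sample_model_instance(n_params, n_species, seed=None):
    """Return (params, inits) for one hypothetical model instance."""
    rng = random.Random(seed)
    params = [sample_log_uniform(*PARAM_RANGE, rng) for _ in range(n_params)]
    inits = [sample_log_uniform(*INIT_RANGE, rng) for _ in range(n_species)]
    return params, inits


def nm_to_molar(value_nm):
    """Convert a concentration (or per-minute rate) from nM to M."""
    return value_nm * 1e-9


if __name__ == "__main__":
    # 94 free parameters as stated in the response; 10 species is a placeholder.
    params, inits = sample_model_instance(n_params=94, n_species=10, seed=0)
    print(min(params), max(params))
    print(nm_to_molar(PARAM_RANGE[0]), nm_to_molar(PARAM_RANGE[1]))
```

      Under the nM assumption, the lower bound 10^-5 nM converts to 10^-14 M (10 femtomolar) and the upper bound 10^4 nM to 10^-5 M (10 micromolar), which matches the units given in the response.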

      2) The second comment highlights the challenges in systematically determining a full spectrum of parameter sets with 94 free parameters. In our observations, as we expanded the number of model instances, the distribution of protein dynamics exhibited minimal variation. A doubling of model instances from 100,000 to 200,000 led to less than a 1% error change. This error was calculated based on the differences across every protein species and dynamic category. These findings suggest that examining more than 100,000 model instances neither shifts the dynamic distributions significantly nor unveils new resistance mechanisms. We are committed to presenting these analyses more comprehensively in the revised manuscript.
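
      One way such a convergence check could be set up is sketched below in Python. The response does not define the error metric, so the choice here (the largest absolute change in the fraction of instances assigned to each dynamic category, taken per protein species) is an assumption, and the function names and category labels are hypothetical.

```python
from collections import Counter


def category_fractions(labels):
    """Map each (species, category) pair to its fraction among that species' labels.

    `labels` is a list of (species, category) tuples, one per species per
    simulated model instance.
    """
    totals = Counter(species for species, _ in labels)
    counts = Counter(labels)
    return {key: n / totals[key[0]] for key, n in counts.items()}


def max_fraction_change(labels_small, labels_large):
    """Largest absolute difference in category fractions between two ensembles."""
    f_small = category_fractions(labels_small)
    f_large = category_fractions(labels_large)
    keys = set(f_small) | set(f_large)
    return max(abs(f_small.get(k, 0.0) - f_large.get(k, 0.0)) for k in keys)


if __name__ == "__main__":
    # Toy ensembles with made-up labels: 100 instances, then 200 instances.
    small = [("A", "increasing")] * 50 + [("A", "decreasing")] * 50
    large = [("A", "increasing")] * 102 + [("A", "decreasing")] * 98
    print(max_fraction_change(small, large))
```

      With a metric of this kind, a value below 0.01 after doubling the ensemble would correspond to the "less than 1%" change described above, although the authors' actual calculation may differ.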

      3) The query about the appropriateness of filtering our models based on computational feasibility is pertinent. Our contention is that this filter does not exclude a significant number of model instances. Furthermore, stiff ODEs generally arise from extremely high reaction rates, which are exceedingly rare in a biological context. Thus, their exclusion only filters out exceedingly rare biological contexts, and only a small proportion of the time.

      4 & 5) Clarifications sought about the simulations will be addressed. Though we feel the details were implicitly incorporated, we will make them explicit in the subsequent version.

      6) The final major comment underscores the qualitative nature of our validation, with which we agree. Currently, we are exploring experimental techniques or datasets for a more robust validation. In our next revision, we will ensure a more in-depth discussion of the validation in the manuscript's discussion section.

      Once again, thank you for your valuable feedback. We look forward to submitting a revised version that addresses all concerns and enhances the manuscript's overall quality.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their thoughtful and positive evaluation of our work. Below, we have addressed all of the essential revisions and provide point-by-point responses to all of the reviewer comments. Additionally, we include with this resubmission quantification of microneme localization, determined by expansion microscopy, which further validates the central role of HOOK in microneme trafficking.

      Suggested revisions:

      1. Please confirm the interaction between CDPK1 and ROM4 by reciprocal IP.

      Prompted by the reviewers' suggestions, we examined more closely the pulldowns of WT and myristoylation-deficient CDPK1 (cMut). ROM4 had been identified as differentially enriched in the cMut pulldown; however, upon closer examination we realized that the abundance of ROM4 is actually even greater in the untagged control and therefore likely a variable contaminant in the pulldowns. We have re-analyzed the results of those pulldowns to focus on proteins significantly enriched in association with either WT or cMut CDPK1, relative to untagged controls. Among this set of 16 enriched proteins, only three proteins appeared differentially enriched between WT and cMut. None of the proteins associated with CDPK1 inform pathways related to parasite motility and were therefore not pursued further in this study.

      2. Please compare the expression of the tagged and complemented (cWT and cMut) CDPK1 with the endogenous expression of the non-tagged and non-complemented gene.

      We compared expression levels of CDPK1 using immunoblot with an anti-CDPK1 antibody comparing TIR1, CDPK1-AID, cWT and cMut parasites, which we have included in panel G of Figure 2–figure supplement 1. Endogenous AID tagging of CDPK1 resulted in a decrease in the abundance of CDPK1. cWT and cMut complementation result in similar expression levels to the AID-tagged iKD CDPK1, albeit the cMut complement has marginally higher expression. Since CDPK1 is essential for the lytic cycle, insufficient levels of the cWT expression would have displayed defects in our plaque assays. We have updated our results to reflect this new data:

      “Additionally, we compared endogenous CDPK1 expression to mAID-tagged, cWT, and cMut strain (Figure 2–figure supplement 1). Introduction of a mAID tag to CDPK1 led to a reduction in CDPK1 levels, but these levels were equivalent to complementation products in cWT and cMut parasites.”

      3. Please attempt to confirm that aerolysin treatment does not impact myristoylation-dependent subcellular partitioning of CDPK1.

      The kinase activity in aerolysin-treated parasites was unaffected by the 1B7 inhibitory nanobody, demonstrating that parasites remain impermeable to proteins as small as 15 kDa. Furthermore, we localized CDPK1 by immunofluorescence in aerolysin-treated parasites and found that its localization is indistinguishable from that of vehicle-treated parasites, suggesting that overall CDPK1 localization is unaffected by aerolysin treatment. We include these data in panel B in Figure 3–figure supplement 1. Nevertheless, in the manuscript we discuss the limitations of the thiophosphorylation experiment:

      “While our approach largely maintains kinases in their subcellular context, aerolysin treatment disrupts native ion concentrations and detaches the plasma membrane from the inner membrane complex (IMC) (Wichroski et al., 2002).”

      Because of these limitations we rely on the overlap of CDPK1-dependent targets between our thiophosphorylation and time course experiments.

      4. Please confirm the interaction of TGGT1_306920 and TGGT1_316650 with the HOOK and FTS proteins.

      In response to this suggestion, we tagged the C termini of TGGT1_306920 and TGGT1_316650 with 3xHA epitopes. Although immunoprecipitation of TGGT1_316650 was unsuccessful, immunoprecipitation of TGGT1_306920 identified HOOK and FTS as significantly enriched proteins. We include this new data in panel C of Figure 5 and have updated our results:

      “To further confirm the interaction, we fused a 3xHA tag to the C terminus of TGGT1_306920, performed IP-MS and compared protein enrichment to the HOOK-3xHA IP (Figure 5C). HOOK, FTS, and TGGT1_306920 were significantly enriched across both IP-MS experiments, whereas TGGT1_316650 is only significantly enriched in HOOK and FTS pulldowns. This suggests the presence of multiple HOOK complexes composed of the core HOOK and FTS proteins that bind with either TGGT1_316650 or TGGT1_306920.”

      While further interactions with other members of the complex still need to be validated, it is not the standard of the field to validate every member of a protein complex by reciprocal IP. Our HOOK and FTS IP-MS results each identified HOOK, FTS, TGGT1_306920, and TGGT1_316650, and our TGGT1_306920 IP-MS identified all members except TGGT1_316650. These interaction partners were found significantly enriched compared to parental controls, which makes the observation of the complex robust.

      Reviewer #1 (Recommendations For The Authors):

      I have only a few minor comments:

      1. In the supplemental data section I would include a document of code ( R script) used for the analysis. If this is too cumbersome then I would instead suggest that like done with proteomic data, the code should be deposited in a database that provides a DOI for access, instead of only being provided on request. This can be done by use of an electronic laboratory notebook or via Github.com or a similar service.

      Zip files containing R code and CSVs have been included for the sub-minute resolution phosphoproteomics (Supplementary File 11) and thiophosphorylation (Supplementary File 12).

      2. It would be useful to expand the discussion of the other 2 proteins identified in the HOOK complex TGGT1_316650 and 306920. Do these have homologs to proteins in other organisms? Based on HOOK in other eukaryotes can you provide a model of the 4 proteins in the complex that you identified? Was any work done on 316650 and 306920 with regards to genetic KO or auxin regulation to see if they also provided a similar phenotype to what was described with HOOK and FTS?

      We have included the following information in our discussion:

      “It also remains unknown how the HOOK complex binds to micronemes. In H. sapiens and D. melanogaster, RAB5 on vesicles interacts with FHIP in the HOOK complex (Bielska et al., 2014; Gillingham et al., 2014; Guo et al., 2016; Xu et al., 2008; Yao et al., 2014). We speculate that TGGT1_306920 may serve the role of FHIP within the HOOK complex as it is fitness conferring whereas TGGT1_316650 appears dispensable, but the complex's binding partner on micronemes remains unknown. RAB5A and RAB5C have been implicated in the biogenesis of micronemes, but their roles during exocytosis have not been explored (Kremer et al., 2013). Understanding how micronemes are recognized may elucidate how cargo specificity is achieved and regulated.”

      TGGT1_306920 is conserved amongst coccidians and shares similar localization to HOOK and FTS. TGGT1_316650 is conserved amongst apicomplexans and more broadly in subsets of other eukaryotic phyla. Given our IP-MS data, HOOK and FTS form a core complex that is either bound to TGGT1_316650 or TGGT1_306920. Given that TGGT1_306920 appears to be important for parasite fitness, based on genome-wide screening data (Sidik, Huet, et al. 2016), we speculate this could function to mediate the linkage to microneme organelles. At this time, we have no additional data to present on 316650 and 306920. Additional biochemical studies will be needed to characterize the stoichiometry of complexes and their function; however, we propose that HOOK and FTS are interacting as previously described in opisthokonts (Bielska et al., 2014, Guo et al., 2016 and Zhang et al., 2014). 

      3. The myristoylation data section ended with "additional studies will be required to understand how myristoylation influences CDPK1 activity". What studies are required to further this understanding? I assume these studies are difficult and that is why they were not part of this outstanding paper.

      The effect of myristoylation is modest during acute phenotypes like egress (see Figure 2H). Moreover, there were no significant differences between cWT and cMut that could explain the impact of CDPK1 on microneme secretion, which was the purpose of this study. Further studies would require a phosphoproteomic workup of the cWT and cMut, which is beyond the scope of the present study.

      4. In the key resource table, in the first column reagent type I suggest you indicate this as T. gondii RH strain to make it clear the background strain (I know it is encoded in additional information but the first column should also be clear).

      We have updated the key resources table to indicate the T. gondii strains used are of RH background.

      Reviewer #2 (Recommendations For The Authors):

      I have a few minor comments that could be addressed by modification of the current version of the manuscript.

      Line 290, where authors classify proteins phosphorylated in CDPK1 dependent manner into five groups, it would be helpful to list at least class 1 (five proteins) and class 2 (four proteins) in the text of the results section. Further since in the same paragraph, the authors are also describing figure 3G, it would be helpful if the groups are identified with roman numerals or as class A, B, C, D, and E. Currently, in fig 3G, the three columns (CDPK1 dependent, CDPK1 independent and fitness scores) are also identified as 1, 2 and 3 and these nomenclatures could be confused with the five different classes of putative substrates.

      We thank the reviewer for their helpful suggestion. We have renamed the classes of CDPK1 targets using roman numerals I, II, III, IV, and V. We have also listed out the proteins in Class I and Class II in the results section as follows:

      “Class I contains five proteins for which the same phosphorylated site was identified in both the time course and thiophosphorylation experiments and include: TGGT1_227610, TGGT1_221470, TGGT1_235160, TGGT1_273560 (KinesinB), and TGGT1_310060. Class II contains four proteins for which phosphorylated sites identified across both approaches were within 50 amino acid residues of one another and include: TGGT1_289100 (MIC18), TGGT1_309190 (AIP), TGGT1_254870, and TGGT1_259630.”

      Line 398, the expansions of the abbreviations FTS and FHIP should be included.

      We have included the expansions of the abbreviations for FTS and FHIP:

      “In D. melanogaster and mammals, HOOK proteins have been shown to form dimers and bind Fused Toes (FTS) and FTS and HOOK-interacting protein (FHIP) via a C-terminal region that interacts with vesicular cargo (Christensen et al., 2021; Krämer and Phistry, 1996; Lee et al., 2018; Xu et al., 2008).”

      The HOOK protein shows CDPK1-dependent phosphorylation at multiple sites S167, S177, and S189-191. In the discussion section, it would be helpful if the authors can speculate about the importance of these phosphorylated residues on the functioning of HOOK.

      Prior to engaging parasite motility, micronemes are positioned at the apical third of the parasite, but after an increase in intracellular Ca2+, micronemes rapidly traffic to the apical tip of the parasite. Our results indicate that both CDPK1 kinase activity and HOOK are required for microneme trafficking. Given the association of micronemes with tubulin-based structures such as the cortical microtubules and conoid, activation of trafficking along such structures must be rapid, on the time scale of seconds. Cell-free reconstitution assays generated from opisthokonts indicate that activating adaptors like HOOK are necessary to activate processive dynein trafficking along microtubules in addition to conferring cargo selectivity. In intracellular non-motile parasites, HOOK is expressed and localized to the apical end and cytosol prior to the activation of rapid microneme trafficking, consistent with regulation of HOOK activity. We have included reference to this type of regulation and our expectation that CDPK1 activates the HOOK complex as part of the Discussion:

      “Phosphorylation has been reported to regulate the function of activating adaptors. In HeLa cells, phosphorylation of BICD2 facilitates recruitment of dynein and dynactin (Gallisà-Suñé et al. 2023). Analogously, phosphorylation of JIP1 mediates the switch between kinesin and dynein motility of autophagosomes in murine neurons (Fu et al. 2014). We therefore speculate that phosphorylation of HOOK by CDPK1 may activate the adaptor by promoting its interaction with dynein and dynactin to initiate trafficking of micronemes.”

      Reviewer #3 (Recommendations For The Authors):

      1. CDPK1 myristoylation. The loss of myristoylation of CDPK1 appears to increase its interaction with ROM4 which also becomes cytosolic instead of localizing to the plasma membrane. As ROM4 is necessary for microneme discharge after proteolysis it would be interesting to investigate the specific relation between CDPK1 and ROM4 and to confirm the interaction by reciprocal IP.

      Please see our response to Suggested Revision #1.

      2. CDPK1 myristoylation, Figure 2D. It would be useful to compare the expression of the tagged and complemented (cWT and cMut) CDPK1 with the endogenous expression of the non-tagged and non-complemented gene.

      Please see our response to Suggested Revision #2.

      3. Thiophosphorylation. The authors used the bacterial toxin aerolysin to semi-permeabilize parasite membranes by forming 3-nm pores. Aerolysin affects the membrane integrity; however, the authors demonstrated that CDPK1 is possibly associated with membrane structures (Figure 2E/F). Could it be possible to transiently destabilize the membrane before treating with KTPγS or ATP? If not, it would be necessary to confirm that aerolysin treatment does not impact myristoylation-dependent subcellular partitioning of CDPK1 before identifying proteins specifically labelled by CDPK1G and not by CDPK1M (Figure 3C).

      Please see our response to Suggested Revision #3.

      4. IP-MS on HOOK-3xHA parasites. The authors' results suggest that HOOK and FTS form a functional complex implicated in microneme exocytosis. It would be interesting to know if HOOK knockdown can have an effect on FTS expression or localization and reciprocally.

      While we agree with the reviewer that this is an interesting question, we focused this paper on the discovery of the complex in relation to CDPK1. Understanding the regulation and interaction of the complex components is the focus of ongoing work and will require generation of new strains and additional mass spectrometry. For those reasons we find these experiments fall beyond the scope of the present study.

      5. FTS-Turbo-ID. (Line 443-444) Authors should confirm the interaction of TGGT1_306920 and TGGT1_316650 with the HOOK and FTS proteins, it will give strength to their conclusion. In fact, without confirmation, everything is based on suggestions that were also formulated but not confirmed in humans. The physical existence of this putative complex should be demonstrated by co-IP experiments. In addition, the missing player is a dynein candidate itself, which leaves the model vulnerable. Short of pursuing this experimentally, it should at least be commented on in the Discussion.

      Please see our response to Suggested Revision #4. Our IP-MS experiments of HOOK-3xHA and FTS-3xHA indicate interactions with HOOK, FTS, TGGT1_316650, and TGGT1_306920. Our FTS-TurboID experiments also suggest an interaction between FTS, HOOK, TGGT1_316650 and TGGT1_306920. Furthermore, our TGGT1_306920 IP-MS data identifies HOOK and FTS, but not TGGT1_316650, suggesting distinct complexes with HOOK and FTS as core components.

      6. MIC2 secretion (Fig 5J). The rep represented by the grey dot with a white outline seems like an outlier result compared to the other 2 reps. Basically, without this rep there at least is a strong trend that there is a difference in secretion without EtOH stimulation. That is what actually would be expected, for constitutive secretion! Please carefully reconsider these data - e.g. check for outlier statistics and/or add reps.

      We present three independent biological replicates, showing a significant difference in microneme secretion following depletion of CDPK1, HOOK, or FTS. It is expected, based on our prior experience, that microneme secretion will vary day to day. However, the expected trend can be observed in all replicates. We are unclear what the reviewer means by constitutive secretion since some low-level of calcium-dependent microneme discharge is expected even in the absence of stimulation, barring BAPTA-AM treatment. Even in the absence of EtOH stimulation (left graph in Fig. 5J), the trend of diminished basal MIC2 release holds when CDPK1, HOOK, or FTS is knocked down.

      7. Apical accumulation of micronemes. A similar observation was made upon manipulation of Ferlin1, reported in a manuscript on bioRxiv. Since other bioRxiv manuscripts are cited in the presented work, this is an omission.

      We apologize for this omission and have updated the manuscript accordingly:

      “It therefore appears that the initial round of microneme discharge during egress depends on CDPK1, and only subsequent rounds require the HOOK complex. Indeed, a fraction of micronemes are already found docked at the apical complex prior to the transition from the replicative to the motile stages, and may constitute the first round of microneme exocytosis (Mageswaran et al., 2021; Sun et al., 2022). Ferlin 1 (FER1) was recently shown to be involved in microneme positioning and overexpression of FER1 was sufficient to initiate an initial round of microneme exocytosis and induce egress (Tagoe et al. 2020).”

      Minor comments:

      1. Concerning the expression of the HOOK protein in Figures 4B, and C, could the author indicate why they performed the IFA after 24h of auxin treatment and the WB after 40h of treatment?

      The difference in timing was for technical reasons. Our immunoblots and additional assays such as microneme secretion require more parasites, such that we harvest at the end of the lytic cycle to increase yields. For the IFAs, we perform these at 24 hrs, which allows for depletion and replication, but captures parasites in small vacuoles that show clear localization patterns. Furthermore, our microneme relocalization studies in Figure 6 were also performed after 24 hrs of auxin treatment, yet exhibit a trafficking defect following 24 hr HOOK depletion.

      2. Fig 4H. The color of CDPK1-AID on the left and the HA on the top (HOOK) do correspond but indicate different proteins. Please label HOOK text in teal, not CDPK1.

      We have changed the text color of the strain names on 4H to black to avoid confusion with the IFA channel labels.

      3. I would like to suggest adding the "Key resources tables" in the supplementary data because it makes the materials & methods harder to read.

      The key resources table was included at the beginning of the Materials and Methods section as indicated in eLife’s instructions to the authors.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This study presents a useful inventory of the joint effects of genetic and environmental factors on psychotic-like experiences, and identifies cognitive ability as a potential underlying mediating pathway. The data were analyzed using solid and validated methodology based on a large, multi-center dataset. However, the claims that these findings are of relevance to psychosis risk and have implications for policy changes are only partially supported by the results.

      We appreciate the feedback and insightful suggestions from the editor and reviewers, which helped us improve the manuscript. We believe the concerns initially raised were mostly due to areas that needed further clarification, which we have now clarified in this revised version. Our primary contribution lies in our meticulous analytical approach aimed at minimizing confounding effects and providing more precise estimates of the genetic and environmental impact on children's cognition and psychology. This method differs from the widely used general linear modeling in the field, which, in our opinion, may not be the optimal strategy for large-scale data analysis. Our comprehensive, tutorial-style description of the methods might serve as a valuable resource for the community.

      Regarding the critique that our findings 'partially support the relevance to psychosis risk,' we have updated our manuscript to more accurately reflect this feedback. We have altered the narrative to indicate that psychotic-like experiences (PLE) are associated with the risk for psychosis, a connection substantiated by prior studies cited in our manuscript.

      Similarly, in response to the comment that our findings 'partially support implications for policy changes,' we have nuanced our conclusion. However, we would like to emphasize our discovery that a negative genetic predisposition impacting cognitive development (i.e., low polygenic scores for cognitive phenotypes) can be counteracted by a positive school and familial environment. We believe that this finding could have meaningful implications for policy making and is robustly supported by our analyses.

      We hope this revised manuscript more accurately reflects our research findings and their significance. Lastly, we would like to express our gratitude for your fair and detailed review process. Our experience working with eLife has been incredibly rewarding, and we commend your dedication to an encouraging and progressive publishing culture.

      Public Reviews:

      Reviewer #1

      This study by Park et al. describes an interesting approach to disentangle gene-environment pathways to cognitive development and psychotic-like experiences in children. They have used data from the ABCD study and have included PGS of EA and cognition, environmental exposure data, cognitive performance data and self-reported PLEs. Although the study has several strengths, including its large sample size, interesting approach and comprehensive statistical model, I have several concerns:

      • The authors have included follow-up data from the ABCD Study. However, it is not very clear from the beginning that longitudinal paths are being explored. It would be very helpful if the authors would make their (analysis) approach clearer from the introduction. Now, they describe many different things, which makes the paper more difficult to read. It would be of great help to see the proposed path model in a Figure and refer to that in the Method.

      We clarified the longitudinal paths tested in this study in Intro [line 149~159]. We also added a figure of the proposed path model (Figure 1) [Methods: line 231~238].

      • There is quite a lot of causal language in the paper, particularly in the Discussion. My advice would be to tone this down.

      We adjusted and moderated the use of causal language throughout the manuscript.

      • I feel that the limitation section is a bit brief, and can be developed further.

      We clearly specified the limitations of our study. These included concerns about the representativeness of the ABCD samples, the limited scope of longitudinal data, and the use of non-randomized, observational data [line 524~544].

      • I like that the assessment of CP and self-reported PEs is of good quality. However, I was wondering which 4 items from the parent-reported CBCL were used and how did they correlate with the child-reported PEs? And how was distress taken into account in the child self-reported PEs measurement? Which PEs measures were used?

      Thank you for the clarifying question. We report the Pearson’s correlation coefficients between the PLEs [line 198~200]. (Reviewer #1 may have referred to the prior version of our manuscript submitted elsewhere, as this point had already been addressed in our initial submission to eLife.)

      • What was the correlation between CP and EA PGSs?

      The Pearson’s correlation between CP and EA PGS was 0.4331 (p<0.0001). We added the statistics to the manuscript. [line 214]

      • Regarding the PGS: why focus on cognitive performance and EA? It should be made clearer from the introduction that EA is not only measuring cognitive ability, but is also a (genetic) marker of social factors/inequalities. I'm guessing this is one of the reasons why the EA PGS was so much more strongly correlated with PEs than the CP PGS. See the work by Abdellaoui and the work by Nivard.

      We appreciate the reviewer’s insightful feedback. Acknowledging the role of both CP and EA PGSs in our study, we agree with the observation that EA PGS goes beyond gauging cognitive aptitude—it also serves as an indicator of societal influences and inequalities. The multifaceted nature of EA PGS could be the reason underlying the stronger correlation with PLEs compared to CP PGS. In response to this feedback, we revised our introduction to articulate the multifaceted role of EA PGS in more precise terms. For supporting our assertions, we have included references to prior studies (Abdellaoui et al., 2022) [line 131~142].

      Abdellaoui, A., Dolan, C. V., Verweij, K. J. H., & Nivard, M. G. (2022). Gene–environment correlations across geographic regions affect genome-wide association studies. Nature Genetics. doi:10.1038/s41588-022-01158-0

      • Considering previous work on this topic, including analyses in the ABCD Study, I'm not surprised that the correlation was not very high. Therefore, I don't think it makes a whole lot of sense to adjust for the schizophrenia PGS in the sensitivity analyses, in other words, it's not really 'a more direct genetic predictor of PLEs'.

      We thank the reviewer for the thoughtful comments. We acknowledge that the correlation between schizophrenia PGS and PLE may not be exceedingly high, as evidenced by previous work, including analyses from the ABCD study. However, we would like to emphasize our rationale for adjusting for schizophrenia PGS in the sensitivity analyses. Our study design stemmed from the established associations between PLEs and increased risk for schizophrenia. Existing studies have reported significant associations between schizophrenia PGS and cognitive deficits in both psychosis patients (Shafee et al., 2018) and people at risk for psychosis (He et al., 2021). Notably, the PGS for schizophrenia has shown significant associations with PLEs, arguably more so than PGS for PLEs itself (Karcher et al., 2021). Our updated manuscript has incorporated these references to improve clarity [line 307~309]. By adding this layer of adjustment, we believe that our mixed linear model more precisely examines the relationship between the cognitive phenotype PGS and PLEs, in terms of both sensitivity and specificity.

      He, Q., Jantac Mam-Lam-Fook, C., Chaignaud, J., Danset-Alexandre, C., Iftimovici, A., Gradels Hauguel, J., . . . Chaumette, B. (2021). Influence of polygenic risk scores for schizophrenia and resilience on the cognition of individuals at-risk for psychosis. Translational Psychiatry, 11(1). doi:10.1038/s41398-021-01624-z

      Karcher, N. R., Paul, S. E., Johnson, E. C., Hatoum, A. S., Baranger, D. A. A., Agrawal, A., . . . Bogdan, R. (2021). Psychotic-like Experiences and Polygenic Liability in the Adolescent Brain Cognitive Development Study. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging. doi:10.1016/j.bpsc.2021.06.012

      Shafee, R., Nanda, P., Padmanabhan, J. L., Tandon, N., Alliey-Rodriguez, N., Kalapurakkel, S., . . . Robinson, E. B. (2018). Polygenic risk for schizophrenia and measured domains of cognition in individuals with psychosis and controls. Translational Psychiatry, 8(1). doi:10.1038/s41398-018-0124-8

      • How did the FDR correction for multiple testing affect the results?

      Please note that we have clarified our FDR correction in the Methods.

      As detailed in the method section [line 254~255], we applied False Discovery Rate (FDR) correction for multiple testing across nine key variables in the study: PGS (CP or EA), family income, parental education, family’s financial adversity, Area Deprivation Index, years of residence, proportion of population below -125% of the poverty line, positive parenting behavior, and positive school environment. An exception was made in our additional sensitivity analysis, where we included schizophrenia PGS in the linear mixed model for adjustment, thus the FDR correction was applied across ten key variables instead. Overall, the application of FDR correction had minimal impact on our findings. Most associations between the key variables and the outcomes that were originally marked as highly significant sustained their significance after the FDR correction.
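      To make the correction step concrete, the following is a minimal sketch of a false discovery rate adjustment across nine tests. It assumes the Benjamini-Hochberg procedure, which the response does not name explicitly, and the p-values are invented for illustration only; they are not results from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for the nine key variables (illustrative only).
raw_p = {
    "PGS": 0.0004,
    "family income": 0.0010,
    "parental education": 0.0030,
    "financial adversity": 0.0200,
    "Area Deprivation Index": 0.0150,
    "years of residence": 0.3000,
    "poverty proportion": 0.0450,
    "positive parenting behavior": 0.0080,
    "positive school environment": 0.0020,
}

q_values = benjamini_hochberg(list(raw_p.values()))
for name, q in zip(raw_p, q_values):
    print(f"{name}: q = {q:.4f}, significant at 0.05: {q < 0.05}")
```

      In the sensitivity analysis described above, the same function would simply receive ten p-values instead of nine, because the schizophrenia PGS is added to the set of key variables.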

      Overall, I feel that this paper has the potential to present some very interesting findings. However, at the moment the paper misses direction and a clear focus. It would be a great improvement if the readers would be guided through the steps and approach, as I think the authors have undertaken important work and conducted relevant analyses.

      We express our appreciation to the reviewer for the positive feedback and constructive suggestions, which only serve to improve and strengthen our manuscript. We have incorporated the suggested corrections and clarifications in response to the reviewer's suggestions. We believe that these changes will not only enhance the overall readability but also more effectively emphasize the significance and implication of our work.

      Reviewer #2 (Public Review):

      This paper tried to assess the link between genetic and environmental factors on psychotic-like experiences, and the potential mediation through cognitive ability. This study was based on data from the ABCD cohort, including 6,602 children aged 9-10y. The authors report a mediating effect, suggesting that cognitive ability is a key mediating pathway in the link between several genetic and environmental (risk and protective) factors on psychotic-like experiences.

      While these findings could be potentially significant, a range of methodological unclarities and ambiguities make it difficult to assess the strength of evidence provided.

      Strengths of the methods:

      The authors use a wide range of validated (genetic, self- and parent-reported, as well as cognitive) measures in a large dataset with a 2-year follow-up period. The statistical methods have the potential to address key limitations of previous research.

      Weaknesses of the methods:

      The rationale for the study is not completely clear. Cognitive ability is probably a more likely mediator of traits related to negative symptoms in schizophrenia, rather than positive symptoms (e.g., psychosis, psychotic-like symptom). The suggestion that cognitive ability might lead to psychotic-like symptoms in the general population needs further justification.

      We appreciate the reviewer’s concern regarding the role of cognitive ability in relation to schizophrenia symptoms. We are aware that cognitive ability is often regarded as a mediator of negative symptoms. However, to the best of our knowledge, a growing body of research has proposed that cognitive ability can also mediate positive symptoms in schizophrenia, including psychotic-like experiences. The studies by Howes & Murray (2014) and Garety et al. (2001) suggested that deficits in cognitive ability can potentially contribute to the manifestation of positive symptoms such as psychotic-like experiences. We have elaborated on this aspect in the Introduction section [line 104-115].

      Howes, O. D., & Murray, R. M. (2014). Schizophrenia: an integrated sociodevelopmental-cognitive model. The Lancet, 383(9929), 1677-1687. doi:10.1016/S0140-6736(13)62036-X

      Garety, P. A., Kuipers, E., Fowler, D., Freeman, D., & Bebbington, P. E. (2001). A cognitive model of the positive symptoms of psychosis. Psychological Medicine, 31(2), 189-195. doi:10.1017/S0033291701003312

      Terms are used inconsistently throughout (e.g., cognitive development, cognitive capacity, cognitive intelligence, intelligence, educational attainment...). It is overall not clear what construct exactly the authors investigated.

      We thank the reviewer for the feedback regarding the consistency of terminology in our manuscript. Per the suggestion, we standardized the use of ‘cognitive capacity’ and now consistently refer to it as ‘cognitive phenotypes’ throughout our manuscript. Furthermore, we explicitly stated in the Introduction section that our two PGSs of focus will be termed ‘cognitive phenotypes PGSs’, aligning with terminology used in prior studies (Joo et al., 2022; Okbay et al., 2022; Selzam et al., 2019) [line 140~142].

      Joo, Y. Y., Cha, J., Freese, J., & Hayes, M. G. (2022). Cognitive Capacity Genome-Wide Polygenic Scores Identify Individuals with Slower Cognitive Decline in Aging. Genes, 13(8), 1320. doi:10.3390/genes13081320

      Okbay, A., Wu, Y., Wang, N., Jayashankar, H., Bennett, M., Nehzati, S. M., . . . Young, A. I. (2022). Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals. Nature Genetics, 54(4), 437-449. doi:10.1038/s41588-022-01016-z

      Selzam, S., Ritchie, S. J., Pingault, J.-B., Reynolds, C. A., O’Reilly, P. F., & Plomin, R. (2019). Comparing Within- and Between-Family Polygenic Score Prediction. The American Journal of Human Genetics, 105(2), 351-363. doi:10.1016/j.ajhg.2019.06.006

      Not the largest or most recent GWASes were used to generate PGSes.

      We appreciate the reviewer’s observation. Indeed, we were unable to utilize the most recent or the largest GWAS for cognitive performance, educational attainment, and schizophrenia due to the timeline of our study. Regrettably, the commencement of our study preceded the publication of what are currently the largest and most recent GWAS, by Okbay et al. (2022) and Trubetskoy et al. (2022). Our research was conducted with the best available data at that time, which was the GWAS of European-descent individuals for educational attainment and cognitive performance (Lee et al., 2018). To eliminate any potential confusion, we adjusted the text to specify that our study used 'a GWAS of European-descent individuals for educational attainment and cognitive performance' rather than the largest GWAS [line 206~208].

      It is not fully clear how neighbourhood SES was coded (higher or lower values = risk?). The rationale, strengths, and assumptions of the applied methods are not fully clear. It is also not clear how/if variables were combined into latent factors or summed (weighted by what). It is not always clear when genetic and when self-reported ethnicity was used. Some statements might be overly optimistic (e.g., providing unbiased estimates, free even of unmeasured confounding; use of representative data).

      Thank you for pointing this out. Consistent with the illustration of neighborhood SES in the Methods, higher values of neighborhood SES indicate risk [line 217~228]. In the original Figure 2, a higher value of neighborhood SES is linked to lower intelligence (direct effects: β=-0.1121) and higher PLEs (indirect effects: β=-0.0126~ -0.0162). We think such confusion might have been caused by the difference between family SES (higher values = lower risk) and neighborhood SES (higher values = higher risk). Thus, we changed the terms to ‘High Family SES’ and ‘Low Neighborhood SES’ in the corrected figure (Figure 3) for clarification.

      Considering that shorter duration of residence may be associated with instability of residency, it may indicate neighborhood adversity (i.e., higher risk). This definition of the ‘years of residence’ variable is in line with the previous study by Karcher et al. (2021).

      During estimation, the IGSCA determines weights of each observed variable in such a way as to maximize the variances of all endogenous indicators and components. We added this explanation in the description about the IGSCA method [line 266~268].
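      As a rough illustration of what a weighted composite indicator is, the sketch below forms a neighborhood SES component as a weighted sum of standardized observed variables. It does not perform IGSCA estimation: the weights are hypothetical placeholders that IGSCA would estimate from the data, and the indicator values are invented.

```python
import statistics


def standardize(values):
    """Convert raw values to z-scores (mean 0, population SD 1)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def composite_scores(indicators, weights):
    """Weighted sum of standardized indicators, one score per participant.

    indicators: dict mapping indicator name -> list of observed values
    weights:    dict mapping indicator name -> weight (hypothetical here)
    """
    z = {name: standardize(vals) for name, vals in indicators.items()}
    n = len(next(iter(z.values())))
    return [sum(weights[name] * z[name][i] for name in z) for i in range(n)]


# Invented values for four participants; higher composite = higher risk.
neighborhood = {
    "area_deprivation_index": [85.0, 100.0, 110.0, 95.0],
    "poverty_proportion": [5.0, 12.0, 30.0, 9.0],
    "years_of_residence": [10.0, 4.0, 1.0, 7.0],
}
# Hypothetical weights; shorter residence is treated as higher risk,
# so its weight is negative.
weights = {
    "area_deprivation_index": 0.5,
    "poverty_proportion": 0.4,
    "years_of_residence": -0.3,
}

for i, score in enumerate(composite_scores(neighborhood, weights)):
    print(f"participant {i}: neighborhood SES component = {score:.3f}")
```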

      We deleted overly optimistic statements like ‘unbiased estimates’ and used expressions such as ‘adjustment for observed/unobserved confounding’ instead, throughout our manuscript.

      Karcher, N. R., Schiffman, J., & Barch, D. M. (2021). Environmental Risk Factors and Psychotic-like Experiences in Children Aged 9–10. Journal of the American Academy of Child & Adolescent Psychiatry, 60(4), 490-500. doi:10.1016/j.jaac.2020.07.003

      It appears that citations and references are not always used correctly.

      We thoroughly checked all citations and specified the references for each statement: we deleted Plomin & von Stumm (2018) and Harden & Koellinger (2020) and cited relevant primary studies (e.g., Lee et al., 2018; Okbay et al., 2022; Abdellaoui et al., 2022) instead. We also specified the references supporting the statement that educational attainment PGS links to brain morphometry (Judd et al., 2020; Karcher et al., 2021). As Okbay et al. (2022) analyze a PGS of cognitive intelligence (with the results reported in their supplementary materials) as well as educational attainment, we decided to continue citing this reference [line 131~141].

      Strengths of the results:

      The authors included a comprehensive array of analyses.

      We thank the reviewer for the positive comment.

      Weaknesses of the results:

      Many results, which are presented in the supplemental materials, are not referenced in the main text and are so comprehensive that it can be difficult to match tables to results. Some of the methodological questions make it challenging to assess the strength of the evidence provided in the results.

      As you rightly identified, we inadvertently failed to reference Table S2 in the main text. We have since corrected this omission in the Results section for the IGSCA (SEM) analysis [line 376]. The remainder of the supplementary tables (Table S1, S3~S7) have been appropriately cited in the main manuscript. We recognize that the quantity of tables provided in the supplementary materials is substantial. However, given the comprehensiveness and complexity of our analyses, which encompass a wide array of study variables, these tables offer intricate results from each analysis. We deem these results, which include valuable findings from sensitivity analyses and confound testing, too significant to exclude from the supplementary materials. That said, we are open to, and would greatly welcome, any further suggestions on how to present our supplementary results in a more clear and digestible format. Your guidance in this matter is highly valued.

      Appraisal:

      The authors suggest that their findings provide evidence for policy reforms (e.g., targeting residential environment, family SES, parenting, and schooling). While this is probably correct, a range of methodological unclarities and ambiguities make it difficult to assess whether the current study provides evidence for that claim.

      We believe that with the improvements we made in this revised manuscript, this concern may have been successfully mitigated.

      Impact:

      The immediate impact is limited given the short follow-up period (2y), possible concerns for selection bias and attrition in the data, and some methodological concerns.

      We appreciate the feedback provided in the reviewer's impact statement. We added as study limitations [line 524~544] that the impact of our findings may be limited due to the relatively short follow-up period, the possibility of sample selection bias, and the problems of interpreting results from an observational study as causal (despite the novel causal inference methods, designed for non-randomized, observational data, that we used).

      As noted in our responses above (and in more detail in Reviewer #2’s Recommendations For The Authors section below), we made the necessary corrections and clarifications for the points suggested by the reviewer. As we are willing to make additional revisions, please feel free to give comments if you feel that our corrections are insufficient or inappropriate.

      Nevertheless, we would like to discuss some points. We sincerely hope this following response does not come across as argumentative to the reviewer and the editor. We fully understand the reviewer's perspective on this matter, and we agree that the issues raised about the ABCD study are absolutely valid. However, when evaluating the overall impact of a study, other factors, such as how the field has been assessing the impact of similar studies, should also be considered.

      Firstly, the potential selection bias and attrition in the ABCD data may not necessarily limit the conclusions of this study. While recognizing the potential issues with the ABCD data is important, we feel that judging the impact of our findings as "limited" based on these issues may not be entirely fair. This is because no study, particularly those of a nationwide scale such as the UK Biobank, IMAGEN, HEAL, HBCD, etc., is completely free of limitations. Typically, the potential limitations of the data don't undermine the impact of individual studies' findings. Numerous studies using ABCD data have been published in top-tier journals—despite the limitations of the ABCD study—underscoring the scientific merit of the findings. For example, the study by Tomasi, D., & Volkow, N. D. (2021), entitled "Associations of family income with cognition and brain structure in USA children: prevention implications," published in Molecular Psychiatry, might be highly relevant to the limitations of the ABCD study raised by the reviewer. The scientific community, including editors, reviewers, and readers, may have appreciated the impact of this study despite the acknowledged limitations of the ABCD data.

      Secondly, the two-year time window of our longitudinal analysis might not impact the aim of this study—an iterative assessment of the associations between genetic and environmental variables with cognitive intelligence and mental health, with a focus on PLE, in preadolescents. Had we aimed to test the developmental trajectory from childhood to adolescence, perhaps a longer timeframe would have made more sense. So, we do not agree with the reviewer’s assessment that the short time window limits the impact of our study.

      Suggested revisions based on the combined reviewer feedback:

      1) The terminology used should be carefully reviewed and revised

      • Please use the correct terminology for the key concepts assessed in this study. For example, authors sometimes conflate PLEs and psychosis, two related but separate constructs. Furthermore, the terms 'good parenting' and 'good schooling' are vague and subjective.

      • The authors use multiple terms to refer to cognitive ability (cognitive capacity, intelligence, cognitive intelligence, etc). The term 'cognitive development' in the title and manuscript does not seem to be justified given the focus on different measures of cognitive ability at a single time point (i.e. baseline).

      • Please avoid causal language and using statements that cannot be entirely substantiated (e.g. unbiased estimates, free from unmeasured confounding)

      Thank you for suggesting this point. We revised all key terminology used throughout our manuscript.

      Per your suggestion, we specified that PLEs indicate the risk of psychosis and often precede schizophrenia. We checked all misused cases of the term ‘psychosis’ and corrected them as ‘PLEs’. We also changed the terms 'good parenting' and 'good schooling' to ‘positive parenting behavior’ and ‘positive school environment’.

      We changed the term ‘cognitive development’ to ‘cognitive ability’ throughout our manuscript. We also changed the title to ‘Gene-Environment Pathways to Cognitive Intelligence and Psychotic-Like Experiences in Children’ because we used ‘cognitive intelligence’ for NIH toolbox variable in the text.

      We corrected and toned down all causal language used in our manuscript. As mentioned by the reviewers, we deleted statements like ‘unbiased estimates’ and used expressions such as ‘adjustment for observed/unobserved confounding’ instead.

      2) A stronger rationale for the focus on PLEs, and the potential mediating role of cognitive ability in genetic and environmental effects on PLES, should be provided

      We appreciate the concern raised about whether cognitive ability may serve as a mediator of psychotic-like experiences. To the best of our knowledge, it has been proposed that cognitive ability can be a mediator of positive symptoms in schizophrenia (including psychotic-like experiences), as well as negative symptoms. This mediating role of cognitive ability was proposed in several prior studies on cognitive models of schizophrenia/psychosis. Per your suggestion, we included an additional justification in Intro [line 104~115] where we highlighted that cognitive ability has been proposed as a potential mediator of genetic and environmental influence on positive symptoms of schizophrenia such as psychotic-like experiences. We refer to studies conducted by Howes & Murray (2014) and Garety et al. (2001).

      Howes, O. D., & Murray, R. M. (2014). Schizophrenia: an integrated sociodevelopmental-cognitive model. The Lancet, 383(9929), 1677-1687. doi:10.1016/S0140-6736(13)62036-X

      Garety, P. A., Kuipers, E., Fowler, D., Freeman, D., & Bebbington, P. E. (2001). A cognitive model of the positive symptoms of psychosis. Psychological Medicine, 31(2), 189-195. doi:10.1017/S0033291701003312

      3) As described in more detail by the reviewers, more information should be provided about the measures used in the study and how they relate to one another (e.g. correlations between PQ-BC and CBCL; PGS-CA and PGS-EA).

      Thank you for your suggestion. Although this information was already provided in our initial submission, it appears that Reviewer #1 might have referred to the prior version of our manuscript submitted elsewhere before eLife.

      To clarify, our findings reveal significant Pearson’s correlation coefficients between PLEs across all time-points (baseline year: r=0.095~0.0989, p<0.0001; 1-year follow-up: r=0.1322~0.1327, p<0.0001; 2-year follow-up: r=0.1569~0.1632, p<0.0001), and we added this information in the Method section [line 198~200]. We also added the Pearson’s correlation between the two PGSs (r=0.4331, p<0.0001) in the Methods for PGS [line 214].
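      For readers unfamiliar with the statistic, the following minimal sketch computes a Pearson correlation coefficient from two paired score vectors. The function name and the toy scores are hypothetical and are not ABCD data; the p-values reported above would require an additional significance test that is not shown here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy paired scores, e.g. two polygenic scores for the same participants.
cp_pgs = [0.12, -0.40, 0.85, 0.05, -0.93, 0.51]
ea_pgs = [0.30, -0.10, 0.42, -0.35, -0.60, 0.77]
print(f"r = {pearson_r(cp_pgs, ea_pgs):.4f}")
```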

      4) More details are needed regarding the analytical strategies used (e.g. how imputation was performed, why PGS were not based on the largest and most recent GWASes, whether latent or observed variables were examined, what exactly the supplementary materials show and how they relate to information provided in the main text).

      We appreciate your feedback. We acknowledge the concerns about the GWAS sources utilized for the study. Unfortunately, our study commenced prior to the publication of what are currently the most recent and largest GWAS, by Okbay et al. (2022) and Trubetskoy et al. (2022). Our research was conducted with the best available data at that time, which was the largest GWAS of European-descent individuals for educational attainment and cognitive performance (Lee et al., 2018). We have now clarified this point in the manuscript [line 206~208].

      Also, we specified the use of composite indicators for the PGS, family SES, neighborhood SES, positive family and school environment, and PLEs, while latent factors were used for cognitive intelligence [line 269~285].

      We highly appreciate the reviewer’s comments regarding the supplementary materials. We regret overlooking the citation of Table S2 in the main manuscript, and this has now been rectified in the Results section for the IGSCA (SEM) analysis [line 376]. The remaining supplementary tables (Table S1, S3~S7) have been correctly referenced within the manuscript. We acknowledge that the supplementary materials are extensive: because our analyses encompass a wide array of study variables, these tables report intricate results from each analysis. We deem these results, which include valuable findings from sensitivity analyses and confound testing, too crucial to exclude from the supplementary materials. That said, we are fully committed to improving the accessibility and clarity of our presentation and welcome any further recommendations for making the supplementary results more digestible.

      5) The limitation section should be expanded and statements regarding the implications of the study findings should be qualified accordingly (e.g. short follow-up period, potential for attrition and selection bias, reverse causation, etc)

      We specified additional potential constraints of our study, including limited representativeness, limited periods of follow-up data (baseline year, 1-year, and 2-year follow-up), possible sample selection bias, and the use of non-randomized, observational data [line 524~544].

      6) Please ensure that the references provided support the statements in the text to which they are linked to.

      Thank you for pointing this out. We thoroughly went over all citations and corrected the inaccurately or vaguely cited references for each statement.

      Reviewer #2 (Recommendations For The Authors):

      1) Please use terms consistently and correctly. E.g., 'cognitive capacity' is not the same as 'educational attainment'.

      We thank the reviewer for the feedback regarding the consistency of terminology in our manuscript. Per the suggestion, we replaced ‘cognitive capacity’ with ‘cognitive phenotypes’, which we now use consistently throughout our manuscript. Furthermore, we explicitly stated in the Introduction section that our two PGSs of focus will be termed ‘cognitive phenotypes PGSs’, aligning with terminology used in prior studies (Joo et al., 2022; Okbay et al., 2022; Selzam et al., 2019) [line 140~142].

      Joo, Y. Y., Cha, J., Freese, J., & Hayes, M. G. (2022). Cognitive Capacity Genome-Wide Polygenic Scores Identify Individuals with Slower Cognitive Decline in Aging. Genes, 13(8), 1320. doi:10.3390/genes13081320

      Okbay, A., Wu, Y., Wang, N., Jayashankar, H., Bennett, M., Nehzati, S. M., . . . Young, A. I. (2022). Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals. Nature Genetics, 54(4), 437-449. doi:10.1038/s41588-022-01016-z

      Selzam, S., Ritchie, S. J., Pingault, J.-B., Reynolds, C. A., O’Reilly, P. F., & Plomin, R. (2019). Comparing Within- and Between-Family Polygenic Score Prediction. The American Journal of Human Genetics, 105(2), 351-363. doi:https://doi.org/10.1016/j.ajhg.2019.06.006

      2) The authors study 'cognitive performance using seven instruments', but it is not clear how fluid and crystalline intelligence was defined/operationalized.

      Thank you for pointing this out. We specified the NIH Toolbox tests used for composite scores of fluid and crystallized intelligence, respectively. “We utilized baseline observations of uncorrected composite scores of fluid intelligence (Dimensional Change Card Sort Task, Flanker Test, Picture Sequence Memory Test, List Sorting Working Memory Test), crystallized intelligence (Picture Vocabulary Task and Oral Reading Recognition Test), and total intelligence (all seven instruments) provided in the ABCD Study dataset” [line 180~187].

      3) I don't think Lee 2018 is the largest GWAS for educational attainment. That would be Okbay 2022. It needs to be described how cognitive performance was defined in Lee 2018. Why did the authors not use the Trubetskoy 2022 schizophrenia GWAS?

      Thank you for mentioning this point. We were not able to use the largest GWAS for CP, EA, and schizophrenia because our study (unfortunately) started before the GWAS by Okbay et al. (2022) and Trubetskoy et al. (2022) were published. We corrected the text to state that our study used ‘a GWAS of European-descent individuals for educational attainment and cognitive performance’ instead of the largest GWAS [line 206~208].

      4) It is unclear how neighbourhood SES was coded. The authors seem to suggest that higher values indicate risk, but Figure 2 suggests that higher values links to higher intelligence and lower PLE.

      Thank you very much for pointing this out. Consistent with the description of neighborhood SES in the Methods section, higher values of neighborhood SES indicate risk. In the original Figure 2, higher values of neighborhood SES link to lower intelligence (direct effects: β=-0.1121) and higher PLEs (indirect effects: β=-0.0126~-0.0162). We think the confusion might have been caused by the difference in coding between family SES (higher values = lower risk) and neighborhood SES (higher values = higher risk). Thus, we changed the terms to ‘High Family SES’ and ‘Low Neighborhood SES’ in the corrected figure (Figure 3) for clarification.

      5) Also, the 'year of residence' variable is unclearly defined. Does this mean that a shorter duration of residency (even in a good neighbourhood) indicate risk?

      Thank you for mentioning this point. Considering that shorter duration of residence may be associated with instability of residency, it may indicate neighborhood adversity (i.e., higher risk). This definition of the ‘years of residence’ variable is in line with the previous study by Karcher et al. (2021).

      Karcher, N. R., Schiffman, J., & Barch, D. M. (2021). Environmental Risk Factors and Psychotic-like Experiences in Children Aged 9–10. Journal of the American Academy of Child & Adolescent Psychiatry, 60(4), 490-500. doi:10.1016/j.jaac.2020.07.003

      6) Please provide information on how correlated the two PGSes were.

      Thank you for your suggestion. We added the Pearson’s correlation between the two PGSs (r=0.4331, p<0.0001) in the Methods section for PGS [line 214].

      7) Information on the outcome variable in the 'linear mixed models' section is missing. I assumed it was PLE.

      Thank you for notifying us of this point. We added the information on the outcome variables in the section for linear mixed models [line 242~244].

      8) In the 'Path Modeling' section, please explain what 'factors and components' concretely refer to. How is this different from a standard SEM with latent factors?

      Thank you for your comment on the need to elaborate on the IGSCA method. We added that, unlike standard SEM methods, which use only latent factors, the IGSCA method can use components as well as latent factors as constructs in model estimation. This allows the IGSCA method to control bias in estimation more effectively than standard SEM [line 261~268].

      9) The sentence starting line 229 is unclear. Does this mean variables were not used to generate latent factors. And if not, what weights were used to create a 'weighted sum'?

      Thank you for mentioning this point. The sentence means that we treated PGSs, family SES, neighborhood SES, positive family and school environment, and PLEs as composite indicators (derived from a weighted sum of relevant observed variables), while general intelligence was represented as a latent factor.

      Prior studies suggest that these variables (PGSs, family SES, neighborhood SES, positive family and school environment, and PLEs) are less likely to share a common factor, and they have typically been assessed as composite indices. For instance, Judd et al. (2020) and Martin et al. (2015) analyze the genetic influence on educational attainment and ADHD using composite indicators. Also, as mentioned in Judd et al. (2020), socioenvironmental influences are often analyzed as composite indicators. Studies on the psychosis continuum (e.g., van Os et al., 2009) suggest that psychotic disorders are likely to have multiple background factors rather than a single common factor, and note that numerous prior studies use composite indices to measure psychotic symptoms. Based on this literature, we used components for these variables.

      The IGSCA determines weights of each observed variable to maximize the variances of the endogenous indicators and components [added in line 265~268].

      On the other hand, we treated general intelligence as a latent factor/variable underlying fluid and crystallized intelligence. This is based on the extensive literature of classical g theory of intelligence [added in line 269~284].

      Judd, N., Sauce, B., Wiedenhoeft, J., Tromp, J., Chaarani, B., Schliep, A., ... & Klingberg, T. (2020). Cognitive and brain development is independently influenced by socioeconomic status and polygenic scores for educational attainment. Proceedings of the National Academy of Sciences, 117(22), 12411-12418.

      Martin, J., Hamshere, M. L., Stergiakouli, E., O'Donovan, M. C., & Thapar, A. (2015). Neurocognitive abilities in the general population and composite genetic risk scores for attention‐deficit hyperactivity disorder. Journal of Child Psychology and Psychiatry, 56(6), 648-656.

      van Os, J., Linscott, R., Myin-Germeys, I., Delespaul, P., & Krabbendam, L. (2009). A systematic review and meta-analysis of the psychosis continuum: Evidence for a psychosis proneness–persistence–impairment model of psychotic disorder. Psychological Medicine, 39(2), 179-195. doi:10.1017/S0033291708003814
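      The notion of a composite indicator as a weighted sum of observed variables, described in the response to point 9, can be illustrated with a minimal sketch. It is not the IGSCA estimator: IGSCA estimates the weights from the data, whereas here the weights, the variable names, and the example values are all hypothetical and fixed by hand purely for illustration.

```python
import statistics

def standardize(values):
    """Return z-scores (mean 0, population SD 1) for a list of numbers."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def composite_scores(indicators, weights):
    """Weighted sum of standardized observed variables, one score per person.

    indicators: dict mapping variable name -> list of observed values
    weights:    dict mapping variable name -> weight (hypothetical here;
                a method such as IGSCA would estimate these from the data)
    """
    z = {name: standardize(vals) for name, vals in indicators.items()}
    n = len(next(iter(z.values())))
    return [sum(weights[name] * z[name][i] for name in z) for i in range(n)]

# Hypothetical observed variables for five children (illustrative values only).
family_ses_indicators = {
    "family_income": [30.0, 55.0, 80.0, 45.0, 120.0],
    "parental_education_years": [12.0, 14.0, 16.0, 12.0, 18.0],
}
hypothetical_weights = {"family_income": 0.6, "parental_education_years": 0.5}

family_ses = composite_scores(family_ses_indicators, hypothetical_weights)
```

      A latent factor, by contrast, is not computed directly from the observed variables in this way; it is modeled as an unobserved common cause of its indicators, which is how general intelligence is treated in the response above.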

      10) It is overall not clear when genetically and when self-reported information of ethnicity was used. This needs to be clearer throughout.

      Thank you for mentioning this point. We used only genetically defined ethnicity; the manuscript does not state that self-reported ethnicity was used. Per your suggestion, we clarified that we used ‘genetic ethnicity’ throughout the paper.

      11) The sentence starting line 253 is also unclear. How is schizophrenia PGS a 'more direct genetic predictor of PLE' and compared to what other measure?

      Thank you for pointing this out. Please note that our adjustment (or sensitivity analyses) was based on the reported associations between PLEs and the risk for schizophrenia: schizophrenia PGS is associated with a cognitive deficit in psychosis patients (Shafee et al., 2018) and individuals at-risk of psychosis (He et al., 2021), and psychotic-like experiences (more so than PGS for psychotic-like experiences) (Karcher et al., 2018). We added these references for clarification [line 307~309]. We believe that because of the adjustment our results from the mixed linear model show the sensitivity and specificity of the association between cognitive phenotype PGS and PLEs.

      He, Q., Jantac Mam-Lam-Fook, C., Chaignaud, J., Danset-Alexandre, C., Iftimovici, A., Gradels Hauguel, J., . . . Chaumette, B. (2021). Influence of polygenic risk scores for schizophrenia and resilience on the cognition of individuals at-risk for psychosis. Translational Psychiatry, 11(1). doi:10.1038/s41398-021-01624-z

      Karcher, N. R., Paul, S. E., Johnson, E. C., Hatoum, A. S., Baranger, D. A. A., Agrawal, A., . . . Bogdan, R. (2021). Psychotic-like Experiences and Polygenic Liability in the Adolescent Brain Cognitive Development Study. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging. doi:https://doi.org/10.1016/j.bpsc.2021.06.012

      Shafee, R., Nanda, P., Padmanabhan, J. L., Tandon, N., Alliey-Rodriguez, N., Kalapurakkel, S., . . . Robinson, E. B. (2018). Polygenic risk for schizophrenia and measured domains of cognition in individuals with psychosis and controls. Translational Psychiatry, 8(1). doi:10.1038/s41398-018-0124-8

      12) Please include a statement on the assumptions made when using the method used in this study and developed by Miao 2022, explain what evidence you have to support these assumptions and how this method, which I believe was developed for RCTs, can be applied to observational data.

      We specified the assumptions for the causal inference method proposed by Miao et al. (2022) and why it is applicable to our study. Also, we noted that this novel method was developed to identify the causal effects of multiple treatment variables within non-randomized, observational data [line 309~319].

      13) Some of the statements are potentially misleading. E.g., I would be very cautious to claim that the methods applied allowed the authors to estimate 'unbiased associations against potential (even unobserved) confounding variables'. There are many concerns such as selection bias, attrition, reverse causation, genetic confounding, etc. that cannot be addressed satisfactorily using these data and methods.

      Thank you for pointing this out. We deleted statements like ‘unbiased estimates’ and used expressions such as ‘adjustment for observed/unobserved confounding’ instead.

      Nevertheless, please note that due to some limitations in the data (e.g., confounders), an analytic approach should be robust enough to handle potential violations of assumptions. This was the point we wanted to emphasize: in contrast to the majority of studies using the ABCD Study data, which employ simplistic GLM or conventional SEM with only latent variable modeling, our study provides less biased, and thus more accurate, estimates through the use of sophisticated modeling for confounding effects (instead of simplistic GLM) and IGSCA (instead of conventional simplistic SEM). We hope our study may help improve analytical approaches in this field.

      14) I would be equally cautious to claim that the ABCD study is representative. Please add information on the whole ABCD cohort to Table 1 and describe any relevance with respect to attrition effects or representativeness.

      Thank you for highlighting this issue. We previously characterized the ABCD Study as representative of the US population, given its aim to ensure representativeness by recruiting from a broad range of school systems located near each of its 21 research sites, chosen for their geographic, demographic, and socioeconomic diversity. Using epidemiological strategies, a stratified probability sample of schools was selected for each site. This procedure took into account sex, race/ethnicity, socioeconomic status, and urbanicity to reduce potential sampling biases at the school level. Based on these strategies, previous research (e.g., Thompson et al., 2019; Zucker et al., 2018) has referred to the ABCD Study as ‘representative.’ However, we overlooked the fact that “not all 9-year-old and 10-year-old children in the United States had an equal chance of being invited to participate in the study,” and therefore, it should not be deemed fully representative of the US population (Compton et al., 2019). Heeding your suggestion, we have removed all descriptions of the ABCD Study being representative.

      Compton, W. M., Dowling, G. J., & Garavan, H. (2019). Ensuring the Best Use of Data: The Adolescent Brain Cognitive Development Study. JAMA Pediatrics, 173(9), 809-810. doi:10.1001/jamapediatrics.2019.2081

      Thompson, W. K., Barch, D. M., Bjork, J. M., Gonzalez, R., Nagel, B. J., Nixon, S. J., & Luciana, M. (2019). The structure of cognition in 9 and 10 year-old children and associations with problem behaviors: Findings from the ABCD study’s baseline neurocognitive battery. Developmental Cognitive Neuroscience, 36, 100606. doi:10.1016/j.dcn.2018.12.004

      Zucker, R. A., Gonzalez, R., Feldstein Ewing, S. W., Paulus, M. P., Arroyo, J., Fuligni, A., . . . Wills, T. (2018). Assessment of culture and environment in the Adolescent Brain and Cognitive Development Study: Rationale, description of measures, and early data. Developmental Cognitive Neuroscience, 32, 107-120. doi:https://doi.org/10.1016/j.dcn.2018.03.004

      15) The imputation methods need to be explained in more detail / more clearly. What concrete variables were included? Why was 50% of the sample excluded despite imputation? How similar is the study sample to the overall ABCD cohort - and to the US population in general (i.e., is this a representative dataset)?

      Thank you for mentioning this point. We clarified the method and detailed processes of the imputation (e.g., R package VIM, number of missing observations for each study variables such as genotypes, follow-up observations, and positive environment) [Methods; line 167~176].

      The final samples had significantly higher cognitive intelligence, parental education, family income, and family history of psychiatric disorders, lower Area Deprivation Index, percentage of individuals below -125% of the poverty level, and family’s financial adversity (p<0.05). As you have noted above, these results also show the limited representativeness of the data used in our study. We fully acknowledge that our study sample, as well as the overall ABCD cohort, is not representative of the US population in general.

      16) There are a range of unclear statements (e.g., 'Supportive parenting and a positive school environment had the largest total impact on PLEs than genetic or environmental factors' - isn't parenting an environmental factor?).

      Thank you for mentioning this point. We clarified seemingly vague expressions and unclear statements. We corrected the sentence you noted as ‘Supportive parenting and a positive school environment had the largest total impact on PLEs than any other genetic or environmental factors’ [line 57~58].

      17) The authors' conclusion (that these findings have policy implications for improving school and family environmental) are not fully supported by the evidence. E.g., genetic effects were equally large.

      Thank you for pointing this out. Our description should have been clearer. Our models consistently show that the combined environmental effects of positive family/school environment and family/neighborhood SES exceed the genetic effects. We suggest that these findings may have policy implications for “improving the school and family environment and promoting local economic development” [line 62~64].

      To clarify, we newly added “Despite the undeniable genetic influence on PLEs, when we combine the total effect sizes of neighborhood and family SES, as well as positive school environment and parenting behavior (∑|β|=0.2718~0.3242), they considerably surpass the total effect sizes of cognitive phenotypes PGSs (|β|=0.0359~0.0502)” [line 510~513]. Based on these results, we suggest that our findings hold potential policy implications for “preventative strategies that target residential environment, family SES, parenting, and schooling—a comprehensive approach that considers the entire ecosystem of children's lives—to enhance children's cognitive ability and mental health” in the Discussion [line 507~510].

      Admittedly, our results do not directly demonstrate a causal effect wherein an intervention in the school or family environmental variables would necessarily lead to a significantly meaningful positive impact on a child's cognitive intelligence and mental health. We do not make such a claim in this paper. However, we anticipate that further integrative analyses akin to ours might help identify potential causal or prescriptive effects. We hope this perspective will be recognized as one of the contributions of our study. We leave the final decision to the discerning judgment of the editors and reviewers.

      18) Many citations do not support the statements made and are sometimes used rather vaguely. For example, I believe Judd 2020 and Okbay 2022 did not use a PGS of cognitive capacity, but of educational attainment. Plomin 2018 and Harden 2020 are reviews, but the primary studies should be cited instead. Which reference exactly is supporting the statement that cognitive capacity PGS links to brain morphometry?

      Thank you very much for your precise observations. We thoroughly checked all citations and updated the references for each statement.

      We deleted Plomin & von Stumm (2018) and Harden & Koellinger (2020) and cited relevant original research articles (e.g., Lee et al., 2018; Okbay et al., 2022; Abdellaoui et al., 2022) instead. We also specified the references supporting the statement that educational attainment PGS links to brain morphometry (Judd et al., 2020; Karcher et al., 2021). As Okbay et al. (2022) used the PGS of cognitive intelligence (with the results of those analyses presented in their supplementary materials) as well as that of educational attainment, we decided to continue citing this reference [line 131~141].

      19) Citations are formatted inconsistently.

      We apologize for the inconsistency of the citation formatting. We formatted all citations in APA 7th style, using EndNote v20. We checked that all citations maintain consistency according to the reference style.

      20) Re line 281, I believe effect sizes are 'up to twice as large', but not consistently twice as large as suggested in the text.

      Thank you for mentioning this point. We corrected the sentence as ‘The effect sizes of EA PGS on children's PLEs were larger than those of CP PGS’ [line 342~343].

      21) Please add to the results a short statement on what covariates these analyses were controlled for.

      Thank you for giving us this comment. We added that we used sex, age, marital status, BMI, family history of psychiatric disorders, and ABCD research sites as covariates in the Results section [line 329~331].

      22) Cho 2020 does not provide recommendations on FIT values (line 315). Please provide another reference and explain how these FIT values should be interpreted.

      Thank you for mentioning this point. We added the correct reference for FIT values (Hwang, Cho, & Choo, 2021). We also added that the FIT values range from 0 to 1, and a larger FIT value indicates more variance of all variables is explained by the specified model (e.g., FIT=0.50 denotes that the model explains 50% of the total variance of all variables) [line 291~293].

      23) Regarding Figure 2, please add factor loadings to this figure and explain what the difference between the hexagon and circular shapes are. Please also add the autocorrelations between the 3 PLE measures. I assume these were also modelled statistically, given the strong correlations between time points?

      Figure 2B needs reworking.

      It is unclear what the x-axis of Figure 2C represents. Proportion of R2 or effect size? SM table 2 provides key information, which should be added to Figure 2.

      Thank you for pointing this out. We added factor loadings to the corrected figure (Figure 3A and 3B). We also added that the X-axis of Figure 3C represents standardized effect sizes.

      24) I suggest adding units directly to Table 1, not in the legend. Was genetic or self-reported ethnicity used in this table? List age in years, not months?

      Thank you for your suggestion. We added the units of age and family history of psychiatric disorders directly inside Table 1. We used genetic ethnicity in Table 1, as we only used genetic ethnicity (but not self-reported ethnicity) throughout our study. This is noted on the last row of Table 1. We listed age in chronological months, which is how each child’s age at each point of data collection is coded in the ABCD Study.

      25) Please include exact p-values in Table 2.

      Thank you for your suggestion. We highly appreciate the reviewer’s comment on the importance of showing exact p-values in the analysis results. Unfortunately, we cannot estimate the standard errors based on normal-theory approximations to obtain the exact p-values of our IGSCA model results. This is described in detail in the original paper of the IGSCA method (Hwang et al., 2021): “Like GSCA and GSCAM, IGSCA is also a nonparametric or distribution-free approach in the sense that it estimates parameters without recourse to distributional assumptions such as multivariate normality of indicators. As a trade-off of no reliance on distributional assumptions, it cannot estimate the standard errors of parameter estimates based on asymptotic (normal-theory) approximations. Instead, it utilizes the bootstrap method (Efron, 1979, 1982) to obtain the standard errors or confidence intervals of parameter estimates nonparametrically.”

      Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Annals of Statistics, 7, 1–26. http://dx.doi.org/10.1214/aos/1176344552

      Efron, B. (1982). The jackknife, the bootstrap and other resampling plans. Philadelphia, PA: SIAM. http://dx.doi.org/10.1137/1.9781611970319

      Hwang, H., Cho, G., Jung, K., Falk, C. F., Flake, J. K., Jin, M. J., & Lee, S. H. (2021). An approach to structural equation modeling with both factors and components: Integrated generalized structured component analysis. Psychological Methods, 26(3), 273-294. doi:10.1037/met0000336
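      The nonparametric bootstrap mentioned in the quoted passage can be sketched generically as follows. This is a minimal illustration of the resampling idea for a simple statistic (the sample mean), not the IGSCA implementation; the function name, the number of resamples, the percentile-interval choice, and the example values are assumptions made for this sketch.

```python
import random
import statistics

def bootstrap_se_ci(data, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Nonparametric bootstrap standard error and percentile interval.

    Resamples the observations with replacement, recomputes the statistic
    on each resample, and summarizes the resulting empirical distribution.
    No normality assumption is involved.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    estimates.sort()
    se = statistics.stdev(estimates)           # SD of bootstrap estimates
    lo_idx = int((alpha / 2) * n_boot)          # lower percentile position
    hi_idx = int((1 - alpha / 2) * n_boot) - 1  # upper percentile position
    return se, (estimates[lo_idx], estimates[hi_idx])

# Hypothetical observations (illustrative values only).
values = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7, 3.9, 3.1]
se, (ci_low, ci_high) = bootstrap_se_ci(values, statistics.mean)
print(f"bootstrap SE = {se:.3f}, 95% percentile CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

      In this kind of approach, an estimate is typically judged by whether its bootstrap confidence interval excludes zero rather than by an exact p-value from a normal-theory test.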

      26) There are way too many indigestible tables presented in the supplementary materials, which are also not referenced in the main manuscript.

      We appreciate your insightful observation. As you rightly identified, we inadvertently failed to reference Table S2 in the main text. We have since corrected this omission in the Results section for the IGSCA (SEM) analysis [line 376]. The remainder of the supplementary tables (Table S1, S3~S7) have been appropriately cited in the main manuscript. We recognize that the quantity of tables provided in the supplementary materials is substantial. However, given the comprehensiveness and complexity of our analyses, these tables offer intricate results from each analysis. We deem these results, which include valuable findings from sensitivity analyses and confound testing, too significant to exclude from the supplementary materials. That said, we are open to, and would greatly welcome, any further suggestions to ensure clarity and ease of comprehension. Your guidance in this matter is highly valued.

      27) Figure S1 is unclear, possibly due to the journal formatting. Is this one figure presented on two pages? Clarify which PGS is listed in Figure S1 and in any case, please add both PGSs.

      Thank you for mentioning this point. Figure S1 presents two correlation matrices: the first is the correlation matrix of the component / factor variables in the IGSCA model, and the second is that of the observed variables used to construct the relevant component / factor variables in the IGSCA model. We labeled these matrices Figure S1-A and Figure S1-B. We also corrected the figure legend as “A. Correlation between all component / factor variables of the IGSCA model. B. Correlation between all observed variables used to construct the relevant component / factor variables in the IGSCA model.” Since Figure S1-A presents correlations between the components and latent factors, it lists a single PGS variable constructed from the CP PGS and EA PGS. On the other hand, Figure S1-B presents correlations between the observed variables. Thus, both CP PGS and EA PGS are listed in this correlation matrix.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the three reviewers for their positive comments and helpful suggestions. We have addressed the issues raised which have helped to improve the manuscript. Below, we address the specific points with detailed responses.

      Reviewer #1 (Recommendations For The Authors):

      Minor comments

      1) Figure 2 - figure supplement 1. The figure states minimal medium while the legend states rich medium.

      We have corrected the legend as the experiment was done in minimal medium.

      2) Figure 3B - the statements in the text do not seem to match what is in the figure. "Cluster 1 (293 genes, 12 priority unstudied) is enriched for genes showing high expression variability across different conditions (71) and for genes induced during meiotic differentiation (72) and in response to TORC1 inhibitors (29). Cluster 2 (570 genes, 20 priority unstudied) is enriched for phenotypes related to cell mating and sporulation, e.g. 'incomplete cell-wall disassembly at cell fusion site' or 'abnormal shmoo morphology'". These terms (high expression variability, meiotic differentiation, TORC1 inhibitors, cell mating and sporulation/abnormal shmoo morphology" are not seen in the figure.

      As stated in the Results, we have carried out analyses with both Metascape and AnGeLi for functional enrichments in different GO and KEGG pathway terms (Figure 3B; Metascape) and/or among genes from published expression or phenotyping studies (AnGeLi). The enrichments for expression variability, meiotic differentiation, TORC1 inhibitors, and cell mating/sporulation/abnormal shmoo morphology are not based on GO terms but on lists from published expression and phenotyping experiments. We have slightly edited the sentence in the Results to make this clearer.

      3) The authors could consider citing a systematic screen for sporulation in the introduction (PMID: 292590

      We have cited 17 papers for growth screens under different conditions using similar approaches as used by us. Given that we already cite 100 papers, we did not choose to cite numerous other papers reporting screens for more complex phenotypes (cell morphology, mating, meiosis, recombination, etc), which are not directly relevant to our study here.

      Reference PMID: 292590 refers to a 1979 paper in the German Dentist Journal.

      Reviewer #2 (Recommendations For The Authors):

      General comments

      1) The authors use their NET-FF approach to predict GO Biological Process and Molecular Function terms (Figure 4). Why was the Cellular Component ontology not included? In general, gene and protein functional characterization is best described by the Biological Process and Cellular Component ontologies, whereas Molecular Function describes the biochemical activity of a protein. In other words, proteins which share Biological Process and/or Cellular Component annotations often function in the same module, which may not be the case for shared Molecular Function annotations.

      We did not include Cellular Component because in previous benchmarking of our method using CAFA datasets our approach did not perform well at predicting Cellular Component. This aspect is harder to pick up from homology data and protein network data and is generally the toughest challenge in CAFA. In contrast, our predictions of Biological Process and Molecular Function are competitive with other methods. We have now made the reason for omitting Cellular Component clearer in the Methods.

      2) The authors use protein embeddings produced by integrating 6 STRING networks using the deepNF method. One of these networks is the "database" network. According to STRING (https://academic.oup.com/nar/article/47/D1/D607/5198476): "The database channel is based on manually curated interaction records assembled by expert curators, at KEGG, Reactome, BioCyc and Gene Ontology, as well as legacy datasets from PID and BioCarta". If one of the input networks contains information from GO, and then embeddings containing this information are used to predict GO annotations, are the authors not then leaking annotations which could improve downstream GO annotation predictions? It would be valuable to demonstrate to what extent the "database" network is contributing by repeating the GO prediction analyses with this network removed.

      We agree and also pointed out this circularity in the manuscript. We used an independent dataset – phenotype data – to benchmark our method, which showed good performance. Note that this study did not aim to develop a completely new method or improve on deepNF and CATH-FunFams but to integrate and exploit their combined power. For that reason, we wanted to keep as many high-quality curated edges in the STRING network as possible. Combining these independent methods brings synergies from their complementary approaches to facilitate interpretation of gene function.

      Minor comments

      1) Ternary encoding was used as a preprocessing step on the phenotype data before clustering was performed. An explanation of why this encoding was necessary (as opposed to a normalization/standardization approach) would be helpful.

      Ternary encoding was not strictly necessary but provided more nuanced and coherent clusters. Some conditions and mutants were associated with much larger phenotypic responses, which disproportionately influenced the clustering. After trying different approaches, we followed the recommendations from the R package microbialPhenotypes (https://github.com/peterwu19881230/microbialPhenotypes), which is now specified in the legend of Fig. 3A. Discretizing the data also helped to compare phenotypes across different types of mutants, and we have applied this approach previously in our phenomics study of non-coding RNA mutants (Rodriguez-Lopez et al. eLife 2022). Moreover, this approach allowed us to generate vectors of phenotypes for calculating phenotypic distances between mutants (including Hamming distance or Pearson correlations), which supported the subsequent cluster analysis using Cytoscape.
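      The discretization and distance calculation described in this reply can be sketched as follows. This is a minimal illustration only: the cutoff of 0.05, the assumption that scores are signed effect sizes relative to wild type, and the example values are hypothetical and are not taken from the study or from the microbialPhenotypes package.

```python
def ternary_encode(scores, threshold):
    """Map each quantitative score to -1 (reduced), 0 (no call) or 1 (increased).

    `threshold` is a hypothetical symmetric cutoff; the actual study may
    instead use significance calls or condition-specific cutoffs.
    """
    encoded = []
    for s in scores:
        if s <= -threshold:
            encoded.append(-1)
        elif s >= threshold:
            encoded.append(1)
        else:
            encoded.append(0)
    return encoded

def hamming_distance(a, b):
    """Number of conditions in which two encoded phenotype vectors differ."""
    if len(a) != len(b):
        raise ValueError("phenotype vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))

# Hypothetical effect sizes for two mutants across four conditions.
mutant_a = [0.30, -0.02, -0.45, 0.01]
mutant_b = [0.25, -0.20, 0.03, 0.00]

enc_a = ternary_encode(mutant_a, threshold=0.05)  # [1, 0, -1, 0]
enc_b = ternary_encode(mutant_b, threshold=0.05)  # [1, -1, 0, 0]
distance = hamming_distance(enc_a, enc_b)         # 2
```

      Because every condition contributes at most one unit to the distance, a few conditions with very large responses cannot dominate the comparison, which is consistent with the motivation given in the reply.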

      2) The authors use a validation set to perform early-stopping on the deepNF model. However, it appears that the validation set proteins are then used in downstream analyses anyway: "After training, weights from the epoch with the lowest validation loss were used to generate embeddings for all proteins" (my emphasis). In the case where the model was being used to generalize to new proteins (such as classification), this analysis would not be a valid way to perform hyperparameter tuning (e.g. early-stopping) since the validation set is then used in downstream analyses. However, deepNF is performing an unsupervised, multi-network encoding on all the available datapoints (proteins). In the case where only deepNF loss is being used to tune the hyperparameters, it's not necessary to use a held-out validation set - it is appropriate to use the full set of proteins to do this.

      Our Random Forest consisted of 500 trees with default values for the number of sub-features as √n and partial sampling of 0.7. GO terms were predicted using 5-fold cross-validation. Changing parameters showed that our model was robust to the values of the hyperparameters, so we settled on our initial model.

      3) The NET-FF hyperparameter tuning results should be made available in the supplement.

      We do not think this would be useful for the reason described in the reply above.

      Reviewer #3 (Recommendations For The Authors):

      Major points

      1) Why were the quantitative colony size data converted to -1, 0, and 1?

      It is unclear to me why the authors decided to convert the colony size data to ternary encoding of -1, 0, and 1. The original colony size data seem to be of fairly high precision so that the authors can detect a 5% difference from the wild type. I guess the authors must have tried using the quantitative colony size data for clustering analysis and found the results unsatisfactory. If that is the case, can the authors provide some possible explanations?

      A similar query has been raised by Reviewer 2. Ternary encoding provided more nuanced and coherent clusters. Some conditions and mutants were associated with much larger phenotypic responses, which disproportionately influenced the clustering. After trying different approaches, we followed the recommendations from the R package microbialPhenotypes, as now specified in the legend of Fig. 3A. Discretizing the data also helped to compare phenotypes across different types of mutants, and we have applied this approach previously in our phenomics study of non-coding RNA mutants (Rodriguez-Lopez et al. eLife 2022). Moreover, this approach allowed us to generate vectors of phenotypes for calculating phenotypic distances between mutants (including Hamming distance or Pearson correlations), which supported the subsequent cluster analysis using Cytoscape.

      2) What do 5% difference and 10% difference look like?

      The authors used 5% difference and 10% difference as cutoffs. I am curious whether a 5% difference in colony size is obvious to human eyes. Can the authors show some plate images and label colonies that differ from the wild type by about 5% and 10%? It will help readers understand the thresholds used for determining whether a mutant has a phenotype.

      Showing the original ‘raw’ colonies would not be meaningful because all colony sizes have been grid-corrected as described (Kamrad et al. eLife 2020). The grid correction takes care of three issues: (1) it converts colony size into an easily interpretable value by reporting a ratio relative to wild type; (2) it makes results comparable across different plates/batches; and (3) it corrects for within-plate positional effects which become apparent due to the same wild-type grid strain showing different fitness in different plate positions. But in principle, detecting a 5% difference in colony size by eye would be hard, and multiple measurements are required (>10 repeats) to obtain statistically reliable results. Author response image 1 shows the grid colonies in red frames and numbers at bottom right of colonies indicate the corrected effect sizes. Colony 17-8 (top right) is an example of a colony differing by 5% compared to neighbouring colonies 16-8 and 17-9.

      Author response image 1.
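      To make the idea of grid correction concrete, the sketch below divides each colony size by the mean size of nearby wild-type grid colonies, so that the result is a ratio relative to wild type that also accounts for local position on the plate. This is a deliberately simplified stand-in and not the published procedure of Kamrad et al.; the function name, the neighbourhood radius and the example plate are all assumptions.

```python
def grid_correct(sizes, grid_positions, radius=1):
    """Express each colony size as a ratio to nearby wild-type grid colonies.

    sizes: dict mapping (row, col) to a raw colony size.
    grid_positions: set of (row, col) positions holding the wild-type grid strain.
    radius: hypothetical neighbourhood size (Chebyshev distance in plate positions).
    Colonies with no grid colony in range get None.
    """
    corrected = {}
    for (row, col), size in sizes.items():
        reference = [
            sizes[(r, c)]
            for (r, c) in grid_positions
            if max(abs(r - row), abs(c - col)) <= radius
        ]
        if reference:
            corrected[(row, col)] = size / (sum(reference) / len(reference))
        else:
            corrected[(row, col)] = None
    return corrected


# Made-up plate row: wild-type grid colonies flank one mutant colony.
plate = {(0, 0): 100.0, (0, 1): 104.5, (0, 2): 120.0}
grid = {(0, 0), (0, 2)}
print(grid_correct(plate, grid))
```

      In this toy example the mutant colony is larger than one of its wild-type neighbours in raw terms, yet its corrected value is about 0.95, which shows why raw plate images are hard to read by eye once positional effects are present.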

      3) How were the phenotyping conditions chosen?

      I am sure that the authors have put a lot of thought into designing the 131 phenotyping conditions. It will benefit the readers if the authors can explain how these conditions were chosen. For example, what literature precedents were considered and which conditions have never been examined before in S. pombe research? For drug treatment conditions, were pilot tests done to choose drug doses based on the growth inhibition effects on the wild type?

      We have used a wide range of different types of conditions that affect diverse processes (see colour legend on top of Fig. 3A). This was based on our previous experience and selection of conditions in large-scale phenotyping of wild strains (Jeffares et al. Nature Genetics 2015) and non-coding RNA mutants (Rodriguez-Lopez et al. eLife 2022). For previously applied conditions (e.g. oxidants), we used literature precedents for the doses, while for other conditions, we used trial and error to adjust the dose such that wild-type cell growth is barely inhibited. For some drugs and stresses, we assayed both low and high doses, in which wild-type cell growth is normal or inhibited, respectively, to uncover both sensitive and resistant mutants.

      Minor points

      1) One of the growth conditions is "YES_ethanol_1percent_no_glucose". I am curious how this is possible, as S. pombe cannot use ethanol as a carbon source.

      We assume that the cells contain sufficient internal glucose to fuel growth and division for a few cycles before running out of glucose. Thus, cells showed some residual growth on this medium, but growth is indeed very limited. Nevertheless, we could identify both sensitive and resistant mutants in this condition.

      2) Abstract "over 900 new proteins affected the resistance to oxidative stress". This sentence should be rephrased. Perhaps it is better to say "over 900 proteins were newly implicated in the resistance to oxidative stress".

      Yes, we have edited the sentence as suggested.

      3) Page 4 "S. pombe encodes 641 'unknown' genes (PomBase, status March 2023). " "Among these 643 unknown proteins, many are apparently found only in the fission yeast clade, but 380 are more widely conserved. " Which number is correct, 641 or 643?

      These numbers keep changing slightly. We now consistently use 641, the number from March 2023.

      4) Page 4 "These priority unstudied proteins have not been directly studied in any organism but can be assumed to have pertinent biological roles conserved over 500 million years of evolution. " According to http://timetree.org/, S. pombe and H. sapiens diverged about 1275 million years ago.

      We have now changed ‘over 500 million’ to ‘over 1000 million’, although there are of course different estimates for these times.

      5) "Using these potent wet and dry methods, we obtained 103,520 quantitative phenotype datapoints for 3,492 non-essential genes across 131 diverse conditions."

      I think "quantitative phenotype datapoints" are generated using wet methods, not dry methods.

      Yes, we have now deleted ‘Using these potent wet and dry methods,’ and start the sentence with ‘We obtained…’

      6) Abstract "We assayed colony-growth phenotypes to measure the fitness of deletion mutants for all 3509 non-essential genes"

      Page 6 "We performed colony-based phenotyping of the deletion mutants for all non-essential S. pombe genes"

      It is not clear to me how the authors can claim that the 3509 non-essential genes correspond to "all non-essential S. pombe genes". The authors should explain how they classify S. pombe genes into essential genes and non-essential genes. The deletion project papers (Kim et al. 2010 and Hayles et al. 2013) provided binary classification for most but not all genes, as there are genes whose deletion mutants were not generated by the deletion project. PomBase does not use a binary classification and there are a number of genes deemed "Gene Deletion Viability: Depends on conditions" by PomBase.

      We used the latest deletion library (Bioneer Version 5) as well as additional deletion mutants published by Kathy Gould and colleagues, which together should capture all non-essential genes. But we agree that non-essentiality is not that clear-cut and is context-dependent. So we have deleted ‘all’ in the two sentences highlighted above.

      7) Page 20 "Other clusters contained mostly genes involved in vacuolar/endosomal transport and peroxisome function, along with poorly characterized genes (Figure 6B)."

      This sentence needs rephrasing. Perhaps it is better to say "Cluster 31 and cluster 22 contained respectively mostly genes involved in vacuolar/endosomal transport and peroxisome function, along with poorly characterized genes (Figure 6B)."

      We have edited this sentence to ‘Cluster 31 and Cluster 22 contained mostly genes involved in vacuolar/endosomal transport and peroxisome function, respectively, along with poorly characterized genes (Figure 6B).’

      8) Legend of Figure 2-figure supplement 1A

      "Left: Volcano plot of mutant colony sizes for priority unstudied genes (green) and all other genes (grey) growing in rich medium. " I think "rich medium" should be "minimal medium".

      Yes, we have now corrected this.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study expands on current knowledge of allosteric diversity in the human kinome by C-terminal splicing variants using as a paradigm DCLK1. The authors provide solid evolutionary and some mechanistic evidence how C-terminal isoform specific variants generated by alternative splicing can regulate catalytic activity by means of coupling specific phosphorylation sites to dynamical and conformational changes controlling active site and substrate pocket occupancy, as well as protein-protein interactions. The data will be of interest to researchers in the kinase and signal transduction field.

      We thank the editor for coordinating the review of our manuscript and the reviewers for their valuable feedback. We have significantly revised the manuscript in response to the reviewers’ comments. Our point-by-point response to each comment is presented below. We have uploaded both a clean draft of our revised manuscript and a version with the revisions highlighted in yellow. We hope the revised manuscript is now acceptable for publication in eLife. We have additionally updated the preprint on bioRxiv and have included the link here: https://www.biorxiv.org/content/10.1101/2023.03.29.534689v2.

      Reviewer #1

      Summary

      In the study by Venkat et al. the authors expand the current knowledge of allosteric diversity in the human kinome by c-terminal splicing variants using as a paradigm DCLK1. In this work, the authors provide evolutionary and some mechanistic evidence about how c-terminal isoform specific variants generated by alternative splicing can regulate catalytic activity by means of coupling specific phosphorylation sites to dynamical and conformational changes controlling active site and substrate pocket occupancy, as well as interfering with protein-protein interacting interfaces that altogether provides evidence of c-terminal isoform specific regulation of the catalytic activity in protein kinases.

      The paper is overall well written, the rationale and the fundamental questions are clear and well explained, and the evolutionary and MD analyses are very detailed and well explained. The methodology applied in terms of the biochemical and biophysical tools falls a bit short in some places and some comments and suggestions are given in this respect. It would be very useful if the authors could monitor protein auto-phosphorylation as a functional readout, for example by using phospho-specific antibodies to monitor activity. Overall I think this is a study that brings some new aspects and concepts that are important for the protein kinase field, in particular the allosteric regulation of the catalytic core by c-terminal segments, and how evolutionary cues generate more sophisticated mechanisms of allosteric control in protein kinases. However, a revision would be recommended.

      Major Comments

      The authors explain in the introduction the role of the T688 autophosphorylation site in the function of DCLK1.2. This site, when phosphorylated, has a detrimental impact on catalytic activity and inhibits phosphorylation of the DCX domain, allowing the interaction with microtubules. In the paper they show how this site is generated by alternative splicing and intron skipping in DCLK1.2. However, there is no further functional evidence for this among the functional experiments presented in this study.

      1) What is the effect of a non-phosphorylable T688 mutant in terms of stability and enzymatic activity? What would be the impact of this mutant in the overall auto-phosphorylation reaction?

      The role of T688 phosphorylation on DCLK1 functions has been explored in previous studies (Agulto et al, 2020: PMID: 34310279), although only relevant to DCLK1.2 splice variants, since this site is lacking in DCLK1.1. These studies showed that mutation of T688 to an alanine increases total kinase autophosphorylation (i.e. autoactivity) and the subsequent phosphorylation of DCX domains, which in turn decreases microtubule binding. Given this information, our goal was to use an evolutionary perspective to investigate this, alongside less well characterized aspects of DCLK autoregulation, including co-conserved residues in the catalytic domain and C-terminal tail. However, to address the reviewer’s question about a non-phosphorylatable T688 mutant, we performed MD simulations of the T688A and T688E (a phosphomimic) mutants and include a new supplementary figure (Figure 5-supplement 3), which shows that the two mutants slightly destabilize the C-tail relative to WT (1 and 2 angstrom increases in RMSF for T688E and T688A, respectively), but by themselves cannot dislodge the C-tail from the ATP binding pocket. Thus, other co-conserved interactions, as revealed by our analysis, are likely to contribute to the autoregulation of the kinase domain by the C-terminal tail. We have incorporated these observations into the revised results section.

      Furthermore, to address the reviewer’s question in terms of site-specific autophosphorylation as a marker of DCLK1.2 activity, we have now performed a much more detailed phosphoproteomic analysis of a panel of purified DCLK1.2 proteins after purification from E. coli (Figure 8-figure supplement 2). This showed that we are only able to detect phosphorylated Thr 688 in our ‘activated’ DCLK1.2 mutants, and not in the autoinhibited WT DCLK1.2 version of the protein. This apparent contradiction does not necessarily discount Thr 688 as an important regulatory hotspot, but, together with the MD simulations, may imply a smaller contribution of pThr 688 in facilitating/maintaining DCLK1.2 auto-inhibition than previously anticipated, especially in the context of the numerous other stabilizing amino acid contacts that we describe between the C-tail and the ATP-binding pocket. We do, however, propose a mechanism for pThr688 as a potential ATP mimic based on MD analysis. However, we only found MS-based evidence for phosphorylation at this (and other sites in the same peptide) in highly active DCLK1.2 mutants, in which the C-tail remains uncoupled from the ATP-binding site, even in the presence of this regulatory PTM. We acknowledge that better understanding of DCLK biology will require a detailed appraisal of how the DCLK auto-inhibited states are subsequently physiologically regulated (PTMs, protein-protein interaction etc.), but this is beyond the scope of our current evolutionary investigation, and the absence of phosphospecific antibodies makes this challenging currently. We intend to expand upon our current work by assessing the relative contribution of multiple DCLK phosphorylation sites (including, but not limited to, Thr 688) with regard to cellular DCLK auto-regulation in future studies, in part by generating such site-specific phospho-antibodies.

      2) Have the authors made an equivalent T687/688 tandem in DCLK1.1 instead of the two prolines?

      This is a good point. We have not considered introducing a T687/688 tandem mutation into DCLK1.1 (at the equivalent position to that of DCLK1.2), primarily because the amino acid compositions of their respective C-tail domains are so highly divergent across the tail (due to alternative splicing, as discussed in our paper). As discussed in our present study, there are numerous contacts made between specific amino acids in the regulatory C-tail and the kinase domain of DCLK1.2, which functionally occlude ATP binding, and thus change catalytic output. It is these contacts, which are determined by the specific amino acid sequence identity, and not the extended length of the DCLK1.2 C-tail per se, that drive autoinhibition. The alternate amino acid sequence identity of the C-tail of DCLK1.1 does not enable such contacts to form, which we believe explains the different activities of the two isoforms.

      Furthermore, our mutational analysis reveals clearly that Thr688 and several other sites are more highly autophosphorylated in the artificially activated DCLK1.2 constructs than WT DCLK1.2, and as such it remains our hypothesis that introduction of the tandem phosphorylation sites into DCLK1.1 is unlikely to be sufficient to impose an auto-inhibitory conformation of the enzyme.

      3) Could T688 autophosphorylation be used as a functional readout to evaluate DCLK1.2 activity?

      We agree with the reviewer’s suggestion about using autophosphorylation (including potentially Thr688 for DCLK1.2) as a functional read out for DCLK1 activity. In our present study, we identify phosphorylated peptides containing pThr688 only in the mutationally activated DCLK1.2 variants. We have now taken this analytical approach further and performed a detailed comparative phosphoproteomic characterisation of all of our DCLK1 constructs, where we observe marked differences in the overall phosphorylation profiles of the mutant DCLK1.2 (and DCLK1.1) proteins relative to the less phosphorylated WT DCLK1.2 kinase. This manifests as a depletion in the total number of confidently assigned phosphorylation sites within the kinase domain and C-tail of WT DCLK1.2, and also as a depletion in the abundance of phosphorylated peptides for a given site. To help visualise this, individual phosphorylation sites have been schematically mapped onto DCLK1, which has been included as a new extended supplementary figure (Figure 8-figure supplement 2). For comparative analysis of phosphosite abundance, we could only select peptides that could be directly compared between all mutants (identical amino acid sequences) and those found to be phosphorylated in all proteins (these are Ser660 and Thr438); these are now shown in figure supplement 2 as a table. These site occupancies follow what we see with respect to the increased catalytic activity between DCLK1.1 and DCLK1.2 mutants versus DCLK1.2. We also detect increased phosphorylation of DCLK1.1 and activated DCLK1.2 mutants in comparison to (autoinhibited) DCLK1.2, supporting the hypothesis that these mutants are relieving the autoinhibited conformation.

      4) What is the evidence that the c-terminal specific interactions described here are intra-molecular rather than inter-molecular? Have the authors looked at the monodispersion and molecular mass in solution of the different proteins evaluated in this study? Basically, are the proteins in solution monomers or dimers/oligomers?

      Analysis of symmetry mates in the crystal structure of DCLK1.2 (PDB ID: 6KYQ) provides no evidence for inter-molecular interactions. Furthermore, to evaluate oligomerization status in solution, we conducted analytical size exclusion chromatography (SEC), and our analysis reveals that both DCLK1.1 and DCLK1.2 predominantly exist as monomers in solution (Figure 3-Supplements 1-3). These results suggest that the C-terminal tail interactions are primarily intra-molecular.

      5) (Figure 3) Did the authors look at the mono-dispersion of the protein preparation? Did the SEC profile result in a single peak or multiple peaks? Could the authors show the chromatogram? How many species do you have in solution? Was the tag removed from the recombinant proteins or not?

      Yes, as mentioned above, the SEC profile resulted in a single peak for both DCLK1.1 and DCLK1.2, which was confirmed as DCLK1 by subsequent SDS-PAGE. We have included the chromatogram and gels in supporting data (Figure 3-supplements 1-3) in the revised manuscript and updated the Methods section. ‘The short N-terminal 6-His affinity tag present on all other DCLK1 proteins described in this paper was left in situ on recombinant proteins, since it does not appear to interfere with DSF, biochemical interactions or catalysis.’

      6) The authors should do Michaelis-Menten saturation kinetics as shown in Figure 3C with the WT when comparing all the functional variants analysed in the study, so that we can compare the catalytic rates and enzymatic constants (also depicted in a table): kcat, Km and catalytic efficiency constants (kcat/Km).

      Thank you for your suggestion. We have performed the requested comparative kinetics analyses for selected functional DCLK1 variants at the same concentration as suggested, using our real-time assay to determine Vmax for peptide phosphorylation as a function of ATP, but at a fixed substrate concentration (we are unable to assess Vmax above 5 µM peptide for technical reasons). The results of these analyses have been included in the revised version of Figure 8-Supplement 1, where they support differences in both Vmax and Km[ATP]; the ratio of these values very clearly points to differences in activities falling into ‘low’ or ‘high’. This kinetic analysis fully supports our initial activity assays, where mutations predicted to uncouple the auto-inhibitory C-tail rescue DCLK1.2 activity to levels similar to DCLK1.1 towards a common substrate.
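      For readers less familiar with the quantities compared here, the sketch below shows the Michaelis-Menten rate law, v = Vmax[S] / (Km + [S]), and one simple way to estimate Vmax and Km from rates measured at several ATP concentrations, using the Hanes-Woolf linearisation [S]/v = [S]/Vmax + Km/Vmax. The data are synthetic and noise-free, the function names are hypothetical, and this is not the fitting procedure used for Figure 8-Supplement 1; nonlinear regression is generally preferred for real, noisy measurements.

```python
def michaelis_menten(s, vmax, km):
    """Initial rate v = Vmax * [S] / (Km + [S])."""
    return vmax * s / (km + s)


def fit_hanes_woolf(concs, rates):
    """Estimate (Vmax, Km) by least squares on [S]/v versus [S].

    The fitted line has slope 1/Vmax and intercept Km/Vmax.
    """
    xs = list(concs)
    ys = [s / v for s, v in zip(concs, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept / slope
    return vmax, km


# Synthetic example: ATP concentrations (arbitrary units) and rates
# generated from assumed parameters Vmax = 2.0 and Km = 20.0.
atp = [5.0, 10.0, 25.0, 50.0, 100.0, 250.0]
rates = [michaelis_menten(s, vmax=2.0, km=20.0) for s in atp]
vmax_est, km_est = fit_hanes_woolf(atp, rates)
print(vmax_est, km_est, vmax_est / km_est)
```

      The final ratio Vmax/Km corresponds to the single summary value that the response above uses to sort variants into ‘low’ and ‘high’ activity groups.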

      Minor Comments

      It is very interesting how the IBS together with the pT688 mimics ATP in the case of DCLK1.2 to reach full occupancy of the active site. On Figure 8 you evaluate residues of the GRL and IBS interface to probe such interactions.

      1) Did the authors look at the T688 non-phosphorylable mutant?

      See our response to Major Comment 1 above. In addition, due to the absence of T688 in DCLK1.1, we did not look at the T688A mutant of DCLK1.2 biochemically, partly because it has been characterized in previous studies, and partly because this site is preceded by another Thr residue. The lack of a selective antibody towards this site makes it difficult to evaluate the role of T688 phosphorylation specifically with respect to DCLK cellular functions and interactions. Therefore, we focused our in vitro efforts on understanding how mutations in the IBS impact the catalytic activity of DCLK1.2 by comparing different variants to DCLK1.1.

      2) Classification of DCLK C-terminal regulatory elements.

      It would be useful to connect the different regulatory elements described in this study to a specific functional and biological setting where these different switches play a role e.g. microtubule interactions and dynamics, cell cycle, cancer, etc..

      While the primary focus of our paper is on the mechanism of allosteric regulation of DCLK1, we have indeed touched upon the potential implications of the various regulatory elements of the tail on functions such as microtubule binding and phenotypic effects like cancer progression. However, we acknowledge that a comprehensive understanding of these effects would necessitate a more detailed investigation. This could potentially involve the integration of RNA-seq data with extensive cell assays to evaluate phenotypic effects. We believe that such a future study would be a valuable extension of our current work and could provide further insights into the functional roles of DCLK1.

      3) (Figure 3) Could the authors explain the differences in yield between the WT and the D531A mutant. Apparently, it [the yield] does not appear to be caused by a lower stability as indicated by the Tm. Could the authors comment on this? It is important to compare different samples in parallel, in the same experiment and side by side. This applies to the thermal shift data comparing WT and a D531A mutant on panel D and also on panel C a comparison between WT and D531A as negative control should be shown.

      WT and D533A (kinase-dead) were indeed analysed in parallel, but have been split into two panels to make the data easier to interpret. The modest differences in yield are likely explained by experimental prep-to-prep variations. Our experience shows that many protein kinase yields vary between kinase and kinase-dead variants, likely due to bacterial toxicity related to enzyme activity. With regard to thermal stability, we would like to emphasize that Differential Scanning Fluorimetry (DSF) is to our mind a more informative and quantitative measure of protein stability than yield from bacteria, because it assesses purified proteins at the same concentration. We believe that the DSF data provide a more accurate representation of the real stability differences between the WT and D533A mutant.

  3. Aug 2023
    1. Author Response

      The following is the authors’ response to the original reviews.

      We gratefully thank the editors and all reviewers for the time spent making their constructive remarks and useful suggestions, which have significantly raised the quality of the manuscript and enabled us to improve it. Each comment brought forward by the reviewers was carefully considered. The manuscript has been revised in consideration of all suggestions.

      Reviewer #1 (Public Review):

      Wang et al. present an interesting body of work focused on the effects of high altitude and hypoxia on erythropoiesis, resulting in erythrocytosis. This work is specifically focused on the spleen, identifying splenic macrophages as central cells in this effect. This is logical since these cells are involved in erythrophagocytosis and iron recycling. The results suggest that hypoxia induces splenomegaly with a decreased number of splenic macrophages. There is also evidence that ferroptosis is induced in these macrophages, leading to cell destruction. Finally, the data suggest that ferroptosis in splenic red pulp macrophages causes the decrease in RBC clearance, resulting in erythrocytosis, i.e. lengthening the RBC lifespan. However, there are many issues with the presented results, with somewhat superficial data, meaning the conclusions are overstated and there is decreased confidence that the hypotheses and observed results are directly causally related to hypoxia.

      Major points:

      1) The spleen is a relatively poorly understood organ but what is known about its role in erythropoiesis, especially in mice, is that it functions both to clear and to generate RBCs. The latter process is termed extramedullary hematopoiesis and can occur in other bones beyond the pelvis, liver, and spleen. In mice, the spleen is the main organ of extramedullary erythropoiesis. The finding of transiently decreased spleen size prior to splenomegaly under hypoxic conditions is interesting but not well developed in the manuscript. This is a shortcoming as this is an opportunity to evaluate the immediate effect of hypoxia separately from its more chronic effect. Based just on spleen size, no conclusions can be drawn about what happens in the spleen in response to hypoxia.

      Thank you for your insightful comments and questions. The spleen is instrumental in both immune response and the clearance of erythrocytes, as well as serving as a significant reservoir of blood in the body. This organ, characterized by its high perfusion rate and pliability, constricts under conditions of intense stress, such as during peak physical exertion, the diving reflex, or protracted periods of apnea. This contraction can trigger an immediate release of red blood cells (RBCs) into the bloodstream in instances of substantial blood loss or significant reduction of RBCs. Moreover, elevated oxygen consumption rates in certain animal species can be partially attributed to splenic contractions, which augment hematocrit levels and the overall volume of circulating blood, thereby enhancing venous return and oxygen delivery (Dane et al. J Appl Physiol, 2006, 101:289-97; Longhurst et al. Am J Physiol, 1986, 251: H502-9). In our investigation, we noted a significant contraction of the spleen following exposure to hypoxia for a period of one day. We hypothesized that the body, under such conditions, is incapable of generating sufficient RBCs promptly enough to facilitate enhanced oxygen delivery. Consequently, the spleen reacts by releasing its stored RBCs through splenic constriction, leading to a measurable reduction in spleen size.

      However, we agree with you that further investigation is required to fully understand the implications of these changes. Considering the comments, we extended our research by incorporating more detailed examinations of spleen morphology and function during hypoxia, including the potential impact on extramedullary hematopoiesis. We anticipate that such an expanded analysis would not only help elucidate the initial response to hypoxia but also provide insights into the more chronic effects of this condition on spleen function and erythropoiesis.

      2) Monocyte repopulation of tissue resident macrophages is a minor component of the process being described and it is surprising that monocytes in the bone marrow and spleen are also decreased. Can the authors conjecture why this is happening? Typically, the expectation would be that a decrease in tissue resident macrophages would be accompanied by an increase in monocyte migration into the organ in a compensatory manner.

      We appreciate your insightful query regarding the observed decrease in monocytes in the bone marrow and spleen, particularly considering the typical compensatory increase in monocyte migration into organs following a decrease in tissue resident macrophages.

      The observed decrease in monocytes within the bone marrow is likely attributable to the fact that monocytes and precursor cells for red blood cells (RBCs) both originate from the same hematopoietic stem cells within the bone marrow. It is well established that exposure to hypobaric hypoxia (HH) induces erythroid differentiation specifically within the bone marrow, originating from these hematopoietic stem cells (Exp Hematol, 2021 May;97:32-46). As such, differentiation into monocytes is reduced under hypoxic conditions, which may subsequently cause a decrease in migration to the spleen.

      Furthermore, we hypothesize that an increased migration of monocytes to other tissues under HH exposure may also contribute to the decreased migration to the spleen. The liver, which partially contributes to the clearance of RBCs, may play a role in this process. Our investigations to date have indeed identified an increased monocyte migration to the liver. We were pleased to discover an elevation in CSF1 expression in the liver following HH exposure for both 7 and 14 days. This finding was corroborated through flow cytometry, which confirmed an increase in monocyte migration to the liver.

      Consequently, we propose that under HH conditions, the liver requires an increased influx of monocytes, which in turn leads to a decrease in monocyte migration to the spleen. However, it is important to note that these findings will be discussed more comprehensively in our forthcoming publication, and as such, the data pertaining to these results have not been included in the current manuscript.

      Author response image 1.

      3) Figure 3 does not definitively provide evidence that cell death is specifically occurring in splenic macrophages and the fraction of Cd11b+ cells is not changed in NN vs HH. Furthermore, the IHC of F4/80 in Fig 3U is not definitive as cells can express F4/80 more or less brightly and no negative/positive controls are shown for this panel.

      We appreciate your insightful comments and critiques regarding Figure 3. We acknowledge that the figure, as presented, does not definitively demonstrate that cell death is specifically occurring in splenic macrophages. While it is challenging to definitively determine the occurrence of cell death in macrophages based solely on Figure 3D-F, our single-cell analysis provides strong evidence that such an event occurs. We initially observed cell death within the spleen under hypobaric hypoxia (HH) conditions, and to discern the precise cell type involved, we conducted single-cell analyses. Regrettably, we did not articulate this clearly in our preliminary manuscript.

      In the revised version, we have modified the sequence of Figure 3A-C and Figure 3D-F for better clarity. In addition, we observed a significant decrease in the fraction of F4/80hiCD11bhi macrophages under HH conditions compared to NN. To make the changes in CD86 and CD206 more evident, we have converted these scatter plots into histograms in our revised manuscript.

      Author response image 2.

      Considering the limitations of F4/80 as a conclusive macrophage identifier, we have concurrently presented the immunohistochemical (IHC) analyses of heme oxygenase-1 (HO-1). Functioning as a macrophage marker, particularly in cells involved in iron metabolism, HO-1 offers additional diagnostic accuracy. Observations from both F4/80 and HO-1 staining suggested a primary localization of positively stained cells within the splenic red pulp. Following exposure to hypobaric hypoxia (HH) conditions, a decrease was noted in the expression of both F4/80 and HO-1. This decrease implies that HH conditions contribute to a reduction in macrophage population and impede the iron metabolism process. In the revised version of our manuscript, we have enhanced the clarity of Figure 3U to illustrate the presence of positive staining, with an emphasis on HO-1 staining, which is predominantly observed in the red pulp.

      Author response image 3.

      4) The phagocytic function of splenic red pulp macrophages relative to infection cannot be used directly to understand erythrophagocytosis. The standard approach is to use opsonized RBCs in vitro. Furthermore, RBC survival is a standard method to assess erythrophagocytosis function. In this method, biotin is injected via tail vein directly and small blood samples are collected to measure the clearance of biotinylation by flow; kits are available to accomplish this. Because the method is standard, Fig 4D is not necessary and Fig 4E needs to be performed only in blood by sampling mice repeatedly and comparing the rate of biotin decline in HH with NN (not comparing 7 d with 14 d).

      We appreciate your insightful comments and suggestions. We concur that the phagocytic function of splenic red pulp macrophages in the context of infection may not be directly translatable to understanding erythrophagocytosis. Because Cy5.5-labeled E. coli alone may not be sufficient to accurately evaluate the phagocytic function of macrophages, we extended our study to include NHS-biotin-labeled RBCs to assess phagocytic capabilities. While the presence of biotin-labeled RBCs in the blood could provide an indication of RBC clearance, this measure does not exclusively reflect the spleen's role in the process, as it fails to account for the clearance activities of other organs.

      Consequently, we propose that the remaining biotin-labeled RBCs in the spleen may provide a more direct representation of the organ's function in RBC clearance and sequestration. Our observations of diminished erythrophagocytosis at both 7- and 14-days following exposure to HH guided our subsequent efforts to quantify biotin-labeled RBCs in both the circulatory system and spleen. These measurements were conducted during the 7 to 14-day span following the confirmation of impaired erythrophagocytosis. Comparative evaluation of RBC clearance rates under NN and HH conditions provided further evidence supporting our preliminary observations, with the data revealing a decrease in the RBC clearance rate in the context of HH conditions. In response to feedback from other reviewers, we have elected to exclude the phagocytic results and the diagram of the erythrocyte labeling assay. These amendments will be incorporated into the revised manuscript. The reviewers' constructive feedback has played a crucial role in refining the methodological precision and coherence of our investigation.

      5) It is unclear whether Tuftsin has a specific effect on phagocytosis of RBCs without other potential confounding effects. Furthermore, quantifying iron in red pulp splenic macrophages requires alternative readily available more quantitative methods (e.g. sorted red pulp macrophages non-heme iron concentration).

      We appreciate your comments and questions regarding the potential effect of Tuftsin on the phagocytosis of RBCs and the quantification of iron in red pulp splenic macrophages. Regarding the role of Tuftsin, we concur that the literature directly associating Tuftsin with erythrophagocytosis is scant. The work of Gino Roberto Corazza et al. does suggest a link between Tuftsin and general phagocytic capacity, but it does not specifically address erythrophagocytosis (Am J Gastroenterol, 1999;94:391-397). We agree that further investigations are required to elucidate the potential confounding effects and to ascertain whether Tuftsin has a specific impact on the phagocytosis of RBCs. Concerning the quantification of iron in red pulp splenic macrophages, we acknowledge your suggestion to employ readily available and more quantitative methods. We have incorporated additional Fe2+ staining in the spleen at two time points: 7 and 14 days subsequent to HH exposure (refer to the following Figure). The resultant data reveal an escalated deposition of Fe2+ within the red pulp, as evidenced in Figures 5 (panels L and M) and Figure S1 (panels L and M).

      Author response image 4.

      6) In Fig 5, PBMCs are not thought to represent splenic macrophages and although of some interest, does not contribute significantly to the conclusions regarding splenic macrophages at the heart of the current work. The data is also in the wrong direction, namely providing evidence that PBMCs are relatively iron poor which is not consistent with ferroptosis which would increase cellular iron.

      We appreciate your insightful critique regarding Figure 5 and the interpretation of our data on peripheral blood mononuclear cells (PBMCs) in relation to splenic macrophages. We understand that PBMCs do not directly represent splenic macrophages, and we agree that any conclusions drawn from PBMCs must be considered with caution when discussing the behavior of splenic macrophages.

      The primary rationale for incorporating PBMCs into our study was to investigate the potential correspondence between their gene expression changes and those observed in the spleen after HH exposure. This was posited as a working hypothesis for further exploration rather than a conclusive statement. The gene expression in PBMCs was congruous with changes in the spleen's gene expression, demonstrating an iron deficiency phenotype, ostensibly due to the mobilization of intracellular iron for hemoglobin synthesis. Thus, it is plausible that NCOA4 may facilitate iron mobilization through the degradation of ferritin, the iron-storage protein.

      It remains ambiguous whether ferroptosis was initiated in the PBMCs during our study. Ferroptosis primarily occurs as a response to an increase in Fe2+ rather than an overall increase in intracellular iron. Our preliminary proposition was that relative changes in gene expression in PBMCs could potentially mirror corresponding changes in protein expression in the spleen, thereby potentially indicating alterations in iron processing capacity post-HH exposure. However, we fully acknowledge that this is a conjecture requiring further empirical substantiation or clinical validation.

      7) Tfr1 increase is typically correlated with cellular iron deficiency while ferroptosis consistent with iron loading. The direction of the changes in multiple elements relevant to iron trafficking is somewhat confusing and without additional evidence, there is little confidence that the authors have reached the correct conclusion. Furthermore, the results here are analyses of total spleen samples rather than specific cells in the spleen.

      We appreciate your astute comments and agree that the observed increase in transferrin receptor (TfR) expression, typically associated with cellular iron deficiency, appears contradictory to the expected iron-loading state associated with ferroptosis. We understand that this apparent contradiction might engender some uncertainty about our conclusions. In our investigation, we evaluated total spleen samples as opposed to distinct cell types within the spleen, a factor that could have contributed to the seemingly discordant findings. An important consideration is the presence of immature RBCs in the spleen, particularly within the hematopoietic islands, where these immature RBCs cluster around nurse macrophages. These immature RBCs contain abundant TfR, which is needed for iron uptake and hemoglobin synthesis. These cells, which are difficult to remove by perfusion, might have contributed to the observed upregulation of TfR expression, especially after HH exposure. Our further research revealed that the expression of TfR in macrophages diminished following hypoxic conditions, suggesting that the elevated TfR expression in tissue samples may predominantly originate from other cell types, especially immature RBCs (refer to Author response image 5).

      Author response image 5.

      Reviewer #2 (Public Review):

      The authors aimed at elucidating the development of high altitude polycythemia which affects mice and men staying in the hypoxic atmosphere at high altitude (hypobaric hypoxia; HH). HH causes increased erythropoietin production which stimulates the production of red blood cells. The authors hypothesize that increased production is only partially responsible for exaggerated red blood cell production, i.e. polycythemia, but that decreased erythrophagocytosis in the spleen contributes to high red blood cells counts.

      The main strength of the study is the use of a mouse model exposed to HH in a hypobaric chamber. However, not all of the reported results are convincing due to some smaller effects which one may doubt to result in the overall increase in red blood cells as claimed by the authors. Moreover, direct proof for reduced erythrophagocytosis is compromised due to a strong spontaneous loss of labelled red blood cells, although effects of labelled E. coli phagocytosis are shown. Their discussion addresses some of the unexpected results, such as the reduced expression of HO-1 under hypoxia but due to the above-mentioned limitations much of the discussion remains hypothetical.

      Thank you for your valuable feedback and insight. We appreciate the recognition of the strength of our study model, the exposure of mice to hypobaric hypoxia (HH) in a hypobaric animal chamber. We also understand your concerns about the smaller effects and their potential impact on the overall increase in red blood cells (RBCs), as well as the apparent reduced erythrophagocytosis due to the loss of labelled RBCs.

      The increase in RBCs under HH conditions has been predominantly attributed to amplified erythropoiesis. The focus of our research was to underscore the potential acceleration of high altitude polycythemia (HAPC) as a result of compromised erythrophagocytosis. Considering the spontaneous loss of labelled RBCs in vivo, we first established that erythrophagocytosis was impaired at both 7 and 14 days of HH exposure, and then compared the RBC clearance rate over the interval from 7 to 14 days. This approach was designed to negate the effects of spontaneous loss of labelled RBCs in both NN and HH conditions. Correspondingly, the results derived from blood and spleen analyses corroborated a decline in the RBC clearance rate under HH compared with NN conditions.
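The interval-based comparison described above can be illustrated with a minimal sketch. It assumes that the clearance rate is expressed as the fraction of labelled RBCs present at day 7 that are no longer detected at day 14; this definition, the function name `clearance_fraction`, and the input values are hypothetical placeholders chosen for illustration and are not taken from our measurements. Normalizing to the day-7 value within each group is what removes label loss that occurred before day 7 from the comparison.

```python
def clearance_fraction(labeled_day7, labeled_day14):
    """Fraction of the labelled RBCs present at day 7 that are gone by day 14.

    Inputs are percentages of labelled RBCs in the sample (blood or spleen).
    The definition itself is an assumption made for this sketch.
    """
    if labeled_day7 <= 0:
        raise ValueError("day-7 labelled percentage must be positive")
    return (labeled_day7 - labeled_day14) / labeled_day7


# Placeholder numbers only, NOT experimental data.
nn = clearance_fraction(labeled_day7=10.0, labeled_day14=4.0)
hh = clearance_fraction(labeled_day7=10.0, labeled_day14=7.0)

# A lower fraction under HH than under NN would indicate slower clearance.
print(f"NN: {nn:.2f}  HH: {hh:.2f}")
```

Under this definition, a group that starts the interval with fewer labelled cells is not penalized, because each group is compared only with its own day-7 baseline.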

      Apart from the E. coli phagocytosis and the labeled RBCs experiment (this part of the results was removed in the revision), the injection of Tuftsin further substantiated the impairment of erythrophagocytosis in the HH spleen, as evidenced by the observed decrease in iron within the red pulp of the spleen post-perfusion. Furthermore, to validate our findings, we incorporated RBCs staining in splenic cells at 7 and 14 days of HH exposure, which provided concrete confirmation of impaired erythrophagocytosis (new Figure 4E).

      Author response image 6.

      As for the reduced expression of heme oxygenase-1 (HO-1) under hypoxia, we agree that this was an unexpected result, and we are in the process of further exploring the underlying mechanisms. It is possible that there are other regulatory pathways at play that are yet to be identified. However, we believe that by offering possible interpretations of our data and potential directions for future research, we contribute to the ongoing scientific discourse in this area.

      Reviewer #3 (Public Review):

      The manuscript by Yang et al. investigated in mice how hypobaric hypoxia can modify the RBC clearance function of the spleen, a concept that is of interest. Via interpretation of their data, the authors proposed a model that hypoxia causes an increase in cellular iron levels, possibly in RPMs, leading to ferroptosis, and downregulates their erythrophagocytic capacity. However, most of the data is generated on total splenocytes/total spleen, and the conclusions are not always supported by the presented data. The model of the authors could be questioned by the paper by Youssef et al. (which the authors cite, but in an unclear context) that the ferroptosis in RPMs could be mediated by augmented erythrophagocytosis. As such, the loss of RPMs in vivo which is indeed clear in the histological section shown (and is a strong and interesting finding) can be not directly caused by hypoxia, but by enhanced RBC clearance. Such a possibility should be taken into account.

      Thank you for your insightful comments and constructive feedback. In their research, Youssef et al. (2018) discerned that elevated erythrophagocytosis of stressed red blood cells (RBCs) instigates ferroptosis in red pulp macrophages (RPMs) within the spleen, as evidenced in a mouse model of transfusion. This augmentation of erythrophagocytosis was conspicuous five hours post-injection of RBCs. Conversely, our study elucidated the decrease in erythrophagocytosis in the spleen after both 7 and 14 days.

      Typically, macrophages exhibit an enhanced phagocytic capacity in the immediate aftermath of stress or stimulation. However, the observation time points in our study were considerably later (7 and 14 days). It is currently unclear whether the phagocytic capacity is amplified during the acute phase of HH exposure, especially on the first day. Given that spleen contraction after one day of HH releases stored RBCs into the bloodstream, it also remains to be determined whether this initial reaction leads to ferroptosis, with the capacity to phagocytose RBCs subsequently weakening after 7 or 14 days under sustained HH conditions.

      Major points:

      1) The authors present data from total splenocytes and then relate the obtained data to RPMs, which are quantitatively a minor population in the spleen. Eg, labile iron is increased in the splenocytes upon HH, but the manuscript does not show that this occurs in the red pulp or RPMs. They also measure gene/protein expression changes in the total spleen and connect them to changes in macrophages, as indicated in the model Figure (Fig. 7). HO-1 and levels of Ferritin (L and H) can be attributed to the drop in RPMs in the spleen. Are any of these changes preserved cell-intrinsically in cultured macrophages? This should be shown to support the model (relates also to lines 487-88, where the authors again speculate that hypoxia decreases HO-1 which was not demonstrated). In the current stage, for example, we do not know if the labile iron increase in cultured cells and in the spleen in vivo upon hypoxia is the same phenomenon, and why labile iron is increased. To improve the manuscript, the authors should study specifically RPMs.

      We express our gratitude for your perceptive remarks. In our initial manuscript, we did not evaluate labile iron within the red pulp and red pulp macrophages (RPMs). To address this oversight, we utilized the Lillie staining method, in accordance with the protocol outlined by Liu et al., (Chemosphere, 2021, 264(Pt 1):128413), to discern Fe2+ presence within these regions. The outcomes were consistent with our antecedent Western blot and flow cytometry findings in the spleen, corroborating an increment in labile iron specifically within the red pulp of the spleen.

      Author response image 7.

      However, we acknowledge the necessity for other supplementary experimental efforts to further validate these findings. Additionally, we scrutinized the expression of heme oxygenase-1 (HO-1) and iron-related proteins, including transferrin receptor (TfR), ferroportin (Fpn), ferritin (Ft), and nuclear receptor coactivator 4 (NCOA4) in primary macrophages subjected to 1% hypoxic conditions, both with and without hemoglobin treatment. Our results indicated that the expression of ferroptosis-related proteins was consistent with the in vivo studies, whereas the expression of iron-related proteins differed between the in vitro and in vivo settings. This suggests that the increases in labile iron in cultured cells and in the spleen in vivo upon hypoxia are not identical phenomena. However, the precise mechanism remains elusive.

      In our study, we observed a decrease in HO-1 protein expression following 7 and 14 days of HH exposure, as shown in Figure 3U, 5A, and S1A. This finding contradicts previous research that identified HO-1 as a hypoxia-inducible factor (HIF) target under hypoxic conditions (P J Lee et al., 1997). Our discussion, therefore, addressed the potential discrepancy in HO-1 expression under HH. According to our findings, HO-1 regulation under HH appears to be predominantly influenced by macrophage numbers and the RBCs to be processed in the spleen or macrophages, rather than by hypoxia alone.

      It is challenging to discern whether the increased labile iron observed in vitro accurately reflects the in vivo phenomenon, as replicating the iron requirements for RBC production induced by HH in vitro is inherently difficult. However, by integrating our in vivo and in vitro studies, we determined that the elevated Fe2+ levels were not dependent on HO-1 protein expression, as HO-1 levels increased in vitro under hypoxia but decreased in vivo under HH exposure.

      Author response image 8.

      2) The paper uses flow cytometry, but how this method was applied is suboptimal: there are no gating strategies, no indication if single events were determined, and how cell viability was assessed, which are the parent populations when % of cells is shown on the graphs. How RBCs in the spleen could be analyzed without dedicated cell surface markers? A drop in splenic RPMs is presented as the key finding of the manuscript but Fig. 3M shows gating (suboptimal) for monocytes, not RPMs. RPMs are typically F4/80-high, CD11-low (again no gating strategy is shown for RPMs). Also, the authors used single-cell RNAseq to detect a drop in splenic macrophages upon HH, but they do not indicate in Fig. A-C which cluster of cells relates to macrophages. Cell clusters are not identified in these panels, hence the data is not interpretable).

      Thank you for your comments and constructive critique regarding our flow cytometry methodology and presentation. We understand the need for greater transparency and detailed explanation of our procedures, and we acknowledge that the lack of gating strategies and other pertinent information in our initial manuscript may have affected the clarity of our findings.

      In our initial report, we provided an overview of the decline in migrated macrophages (F4/80hiCD11bhi), including both M1 and M2 expression in migrated macrophages, as illustrated in Figure 3, but did not specifically address the changes in red pulp macrophages (RPMs). In the previous results, it was difficult to distinguish CD11b- from CD11blo cells. We therefore reanalyzed the data to identify F4/80hiCD11blo cells, and the results of this reanalysis are now included in the revised manuscript (Figure 3M). However, single-cell in vivo analysis studies may more accurately identify specific cell types that decrease after exposure to HH.

      Author response image 9.

      Furthermore, we substantiated the reduction in red pulp, as evidenced by Figure 4J, given that iron processing primarily occurs within the red pulp. In Figure 3, our initial objective was merely to illustrate the reduction in total macrophages in the spleen following HH exposure.

      To further clarify the characterization of various cell types, we conducted a single-cell analysis. Our findings indicated that clusters 0,1,3,4,14,18, and 29 represented B cells, clusters 2, 10, 12, and 28 represented T cells, clusters 15 and 22 corresponded to NK cells, clusters 5, 11, 13, and 19 represented NKT cells, clusters 6, 9, and 24 represented cell cycle cells, clusters 26 and 17 represented plasma cells, clusters 21 and 23 represented neutrophils, cluster 30 represented erythrocytes, and clusters 7, 8, 16, 20, 24, and 27 represented dendritic cells (DCs) and macrophages, as depicted in Figure 3E.
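As a convenience for readers who wish to cross-check the annotation above, the cluster assignments can be written as a lookup table. The following is a minimal sketch: the cluster numbers and labels are copied from the preceding paragraph, while the helper names (`invert`, `audit`) and the assumption that clusters are numbered 0 to 30 are introduced here for illustration only. As listed in the text, cluster 24 appears under two labels and cluster 25 is not mentioned; the sketch simply reports both rather than resolving them.

```python
# Cluster-to-cell-type assignments exactly as listed in the text above.
CELL_TYPE_CLUSTERS = {
    "B cells": [0, 1, 3, 4, 14, 18, 29],
    "T cells": [2, 10, 12, 28],
    "NK cells": [15, 22],
    "NKT cells": [5, 11, 13, 19],
    "cell cycle cells": [6, 9, 24],
    "plasma cells": [26, 17],
    "neutrophils": [21, 23],
    "erythrocytes": [30],
    "DCs and macrophages": [7, 8, 16, 20, 24, 27],
}


def invert(groups):
    """Map each cluster id to every label that lists it."""
    labels_by_cluster = {}
    for label, clusters in groups.items():
        for cluster in clusters:
            labels_by_cluster.setdefault(cluster, []).append(label)
    return labels_by_cluster


def audit(groups, n_clusters):
    """Return clusters listed under several labels and clusters never listed."""
    by_cluster = invert(groups)
    ambiguous = {c: names for c, names in by_cluster.items() if len(names) > 1}
    unassigned = [c for c in range(n_clusters) if c not in by_cluster]
    return ambiguous, unassigned


LABELS_BY_CLUSTER = invert(CELL_TYPE_CLUSTERS)
# n_clusters=31 assumes clusters are numbered 0..30 (an assumption, see above).
AMBIGUOUS, UNASSIGNED = audit(CELL_TYPE_CLUSTERS, n_clusters=31)
print("listed under more than one label:", AMBIGUOUS)
print("not listed:", UNASSIGNED)
```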

      3) The authors draw conclusions that are not supported by the data, some examples: a) they cannot exclude eg the compensatory involvement of the liver in the RBCs clearance (the differences between HH sham and HH splenectomy is mild in Fig. 2 E, F and G).

      Thank you for your insightful comments and for pointing out the potential involvement of other organs, such as the liver, in the RBC clearance under HH conditions. We concur with your observation that the differences between the HH sham and HH splenectomy conditions in Fig. 2 E, F, and G are modest. This could indeed suggest a compensatory role of other organs in RBC clearance when splenectomy is performed. Our intent, however, was to underscore the primary role of the spleen in this process under HH exposure.

      In fact, after our initial investigations, we conducted a more extensive study examining the role of the liver in RBC clearance under HH conditions. Our findings, as illustrated in the figures submitted with this response, indeed support a compensatory role for the liver. Specifically, we observed an increase in macrophage numbers and phagocytic activity in the liver under HH conditions. Although the differences in RBC count between the HH sham and HH splenectomy conditions may seem minor, it is essential to consider the unit of this measurement, which is value × 10^12/ml. Even a small numerical difference can represent a significant biological variation at this scale.

      Author response image 10.

      b) splenomegaly is typically caused by increased extramedullary erythropoiesis, not RBC retention. Why do the authors support the second possibility? Related to this, why do the authors conclude that data in Fig. 4 G,H support the model of RBC retention? A significant drop in splenic RBCs (poorly gated) was observed at 7 days, between NN and HH groups, which could actually indicate increased RBC clearance capacity = less retention.

      Prior investigations have predominantly suggested that spleen enlargement under hypoxic conditions stems from the spleen's extramedullary hematopoiesis. Nevertheless, an intriguing study conducted in 1994 by the General Hospital of Xizang Military Region reported substantial exaggeration and congestion of splenic sinuses in high altitude polycythemia (HAPC) patients. This finding was based on the dissection of spleens from 12 patients with HAPC (Zou Xunda, et al., Southwest Defense Medicine, 1994;5:294-296). Moreover, a recent study indicated that extramedullary erythropoiesis reaches its zenith between 3 and 7 days (Wang H et al., 2021).

      Considering these findings, the present study postulates that hypoxia-induced inhibition of erythrophagocytosis may lead to RBC retention. However, we acknowledge that the manuscript in its current preprint form does not offer conclusive evidence to substantiate this hypothesis. To bridge this gap, we further conducted experiments where the spleen was perfused, and total cells were collected post HH exposure. These cells were then smeared onto slides and subjected to Wright staining. Our results unequivocally demonstrate an evident increase in deformation and retention of RBCs in the spleen following 7 and 14 days of HH exposure. This finding strengthens our initial hypothesis and contributes a novel perspective to the understanding of splenic responses under hypoxic conditions.

      Author response image 11.

      c) lines 452-54: there is no data for decreased phagocytosis in vivo, especially in the context of erythrophagocytosis. This should be done with stressed RBCs transfusion assays, very good examples, like from Youssef et al. or Theurl et al. are available in the literature.

      Thank you. In their seminal work, Youssef and colleagues demonstrated that the transfusion of stressed RBCs triggers erythrophagocytosis and subsequently incites ferroptosis in red pulp macrophages (RPMs) within a span of five hours. Given these observations, the applicability of this model to evaluate macrophage phagocytosis in the spleen or RPMs under HH conditions may be limited, as HH has already induced erythropoiesis in vivo. In addition, it is unclear whether the membrane characteristics of stress-induced RBCs are similar to those of HH-induced RBCs, which matters because these characteristics are an important signal for phagocytosis in vivo. The ambiguity arises from the fact that we currently lack sufficient knowledge to discern whether the changes in phagocytosis are instigated by the presence of stressed RBCs or by changes in macrophages induced by HH in vivo. Nonetheless, we appreciate the potential value of this approach and intend to explore its utility in our future investigations. The prospect of distinguishing the effects of stressed RBCs from those of HH on macrophage phagocytosis is an intriguing line of inquiry that could yield significant insights into the mechanisms governing these physiological processes.

      d) Line 475 - ferritinophagy was not shown in response to hypoxia by the manuscript, especially that NCOA4 is decreased, at least in the total spleen.

      Research published in eLife in 2015 established that ferritinophagy, facilitated by Nuclear Receptor Coactivator 4 (NCOA4), is indispensable for erythropoiesis. This process is modulated by iron-dependent HECT and RLD domain containing E3 ubiquitin protein ligase 2 (HERC2)-mediated proteolysis (Joseph D Mancias et al., eLife. 2015; 4: e10308). As is widely recognized, NCOA4 plays a critical role in directing ferritin (Ft) to the lysosome, where both NCOA4 and Ft undergo coordinated degradation. In our study, we provide evidence that exposure to HH stimulates erythropoiesis (Figure 1). We propose that this, in turn, could promote ferritinophagy via NCOA4, resulting in a decrease in NCOA4 protein levels post-HH exposure. We will conduct further experiments to address this concern. This interpretation aligns with the established understanding of ferritinophagy and erythropoiesis and adds a further dimension to the understanding of cellular responses to hypoxic conditions.

      4) In a few cases, the authors show only representative dot plots or histograms, without quantification for n>1. In Fig. 4B the authors write about a significant decrease (although with n=1 no statistics could be applied here; of note, it is not clear what kind of samples were analyzed here). Another example is Fig. 6I. In this case, it is even more important as the data are conflicting the cited article and the new one: PMCID: PMC9908853 which shows that hypoxia stimulates efferocytosis. Sometimes the manuscript claim that some changes are observed, although they are not visible in representative figures (eg for M1 and M2 macrophages in Fig. 3M)

      We recognize that our initial portrayal of Figure 4B was lacking in precision, given that it did not include the corresponding statistical graph. While our results demonstrated a significant reduction in the ability to phagocytose E. coli, in line with the recommendations of other reviewers, we have opted to remove the results pertaining to E. coli phagocytosis in this revision, as they primarily reflected immune function.

      In relation to PMC9908853, which reported metabolic adaptation facilitating enhanced macrophage efferocytosis in limited-oxygen environments, it is worth noting that the macrophages investigated in this study were derived from ER-Hoxb8 macrophage progenitors following the removal of β-estradiol. Consequently, questions arise regarding the comparability between these cultured macrophages and primary macrophages obtained fresh from the spleen post HH exposure. The characteristics and functions of these two different macrophage sources may not align precisely, and this distinction necessitates further investigation.

      5) There are several unclear issues in methodology:

      • what is the purity of primary RPMs in the culture? RPMs are quantitatively poorly represented in splenocyte single-cell suspensions. This reviewer is quite skeptical that the processing of splenocytes from approx 1 mm3 of tissue was sufficient to establish primary RPM cultures. The authors should prove that the cultured cells were indeed RPMs, not monocyte-derived macrophages or other splenic macrophage subtypes.

      Thank you for your thoughtful comments and inquiries. First, we apologize that this was not made clear in the original manuscript. The purity of the primary RPMs in our culture was found to be approximately 40%, as identified by F4/80hiCD11blo markers using flow cytometry. We recognize that RPMs are typically underrepresented in splenocyte single-cell suspensions, and the concern you raise about the potential for contamination by other cell types is valid.

      We apologize for any ambiguities in the methodological description that may have led to misunderstandings during the review. The entire spleen is typically used for splenic macrophage culture. The size of the spleen varies with the species and age of the animal, but in mice it is commonly about 1 cm in length. The spleen is dissected into small fragments, each approximately 1 mm³ in volume, to aid enzymatic digestion. The procedure therefore does not rely on a single 1 mm³ tissue fragment for RPM cultures. Although the isolation and culture of spleen macrophages can present considerable challenges, our method has been optimized to enhance the yield of this specific cell population.

      • (around line 183) In the description of flow cytometry, there are several missing issues. In 1) it is unclear which type of samples were analyzed. In 2) it is not clear how splenocyte cell suspension was prepared.

      1) Whole blood was extracted from the mice and collected into an anticoagulant tube, which was then set aside for subsequent thiazole orange (TO) staining.

      2) Splenic tissue was procured from the mice and subsequently processed into a single-cell suspension using a 40 μm filter. The erythrocytes within the entire sample were subsequently lysed and eliminated, and the remaining cell suspension was resuspended in phosphate-buffered saline (PBS) in preparation for ensuing analyses.

      We have meticulously revised these methodological details in the corresponding section of the manuscript to ensure clarity and precision.

      • In line 192: what does it mean: 'This step can be omitted from cell samples'?

      The methodology employed for the quantification of intracellular divalent iron content and lipid peroxidation level was executed as follows: Splenic tissue was first processed into a single-cell suspension, followed by the lysis of RBCs. It should be noted that this particular stage is superfluous when dealing with isolated cell samples. Subsequently, a total of 1 × 10⁶ cells were incubated with 100 μL of BioTracker Far-red Labile Fe²⁺ Dye (1 mM, Sigma, SCT037, USA) for a duration of 1 hour, or alternatively, C11-Bodipy 581/591 (10 μM, Thermo Fisher, D3861, USA) for a span of 30 minutes. Post incubation, cells were thoroughly washed twice with PBS. Flow cytometric analysis was subsequently performed, utilizing the FL6 (638 nm/660 nm) channel for the determination of intracellular divalent iron content, and the FL1 (488 nm/525 nm) channel for the quantification of the lipid peroxidation level.

      • 'TO method' is not commonly used anymore and hence it was unclear to this Reviewer. Reticulocytes should be analyzed with proper gating, using cell surface markers.

      We are appreciative of your astute observation pertaining to the methodology we employed to analyze reticulocytes in our study. We value your recommendation to utilize cell surface markers for effective gating, which indeed represents a more modern and accurate approach. However, as reticulocyte identification is not the central focus of our investigation, we opted for the TO staining method—due to its simplicity and credibility of results. In our initial exploration, we adopted the TO staining method in accordance with the protocol outlined (Sci Rep, 2018, 8(1):12793), primarily owing to its established use and demonstrated efficacy in reticulocyte identification.

      • The description of 'phagocytosis of E. coli and RBCs' in the Methods section is unclear and incomplete. The Results section suggests that for the biotinylated RBCs, phagocytosis (or retention?) of RBCs was quantified in vivo, upon transfusion. However, the Methods section suggests an in vitro/ex vivo approach. It is vague what was actually performed and how. If RBC transfusion was done, this should be properly described. Of note, biotinylation of RBCs is typically done in vivo only, being a first step in an RBC lifespan assay. Such an assay is missing in the manuscript. Also, it is not clear if the detection of biotinylated RBCs was performed in permeabilized cells (this would be required).

      Thanks for the comments. In our initial methodology, we employed Cy5.5-labeled Escherichia coli to probe phagocytic function, albeit with the understanding that this may not constitute the most ideal model for phagocytosis detection within this context (in light of recommendations from other reviewers, we have removed the E. coli phagocytosis results from this revision, as they predominantly mirror immune function). Our fundamental aim was to ascertain whether HH compromises the erythrophagocytic potential of splenic macrophages. In pursuit of this, we subsequently analyzed the clearance of biotinylated RBCs in both the bloodstream and spleen to assess phagocytic functionality in vivo.

      In the present study, instead of transfusing biotinylated RBCs into mice, we opted to inject N-Hydroxysuccinimide (NHS)-biotin into the bloodstream. NHS-biotin is capable of binding with cell membranes in vivo and can be recognized by streptavidin-fluorescein isothiocyanate (FITC) after cells are extracted from the blood or spleen in vitro. Consequently, biotin-labeled RBCs were detectable in both the blood and spleen following NHS-biotin injection for a duration of 21 days. Ultimately, we employed flow cytometry to analyze the NHS-biotin labeled RBCs in the blood or spleen. This method facilitates the detection of live cells and is not applicable to permeabilized cells. We believe this approach better aligns with our investigative goals and offers a more robust evaluation of erythrophagocytic function under hypoxic conditions.

      Recommendations for the authors: please note that you control which, if any, revisions, to undertake.

      Thank you for your comments and recommendations. We appreciate your understanding that the choice of implementing revisions ultimately rests with us. However, we also value your expertise and will seriously consider your suggestions as they can provide additional perspectives to our work and contribute to the overall quality and robustness of our study.

      We strive to produce research that meets the highest scientific standards and we believe that constructive criticism, such as yours, helps us to achieve this objective. We will carefully review your comments and consider the appropriate changes to make in order to address your concerns and improve our manuscript.

      Reviewer #1 (Recommendations For The Authors):

      Minor:

      1) HCV in text is a typo, should be HCT. Please edit.

      Thanks for the correction. We’ve revised it.

      1. Fig 2D is not useful beyond the more accurate measure of HCT in Fig 2G and should be removed.

      Thank you for your feedback and suggestion about Fig. 2D. We understand your point regarding the comparative accuracy of HCT in Fig. 2G. However, our intention in including Fig. 2D was to provide a more intuitive visual representation of the erythrocyte position levels, which we believe complements the more precise HCT data. We have observed that the erythrocyte positions significantly increased for 14 days after HH splenectomy, and this trend is visually depicted in Fig. 2D. While HCT provides a more accurate measure, Fig. 2D provides a snapshot that can be more immediately graspable, especially for readers who may prefer visual data. Nevertheless, we appreciate your perspective and will reassess whether the inclusion of Fig. 2D adds enough value to the overall understanding of our findings. If we find that it indeed does not contribute significantly, we will consider removing it in line with your suggestion.

      1. What is the purpose of performing splenectomy? It is well established that reticuloendothelial cells of the liver perform a redundant function to splenic macrophages and since these cells are not being evaluated, data following splenectomy is of limited value. Please remove or move to supplement. Alternatively, evaluate what happens in the liver in response to hypoxia. Is there an increase in erythroblasts? Is there a decrease in liver macrophages in the same way as in the spleen in non-splenectomized mice? The minimally increased HCT in hypoxic splenectomized mice (relative to non-splenectomized mice) suggests that the spleen does the primary work of clearance but not exclusively since there is still a major increase in response to hypoxia in splenectomized mice. The sentence (page 16, line 292) states that the spleen is essential which is not the case based on this data.

      Thank you for your comments and recommendations. In fact, we have also been studying the liver's response to hypobaric hypoxia (HH) exposure. The changes observed in the liver are opposite to those in the spleen, including an increase in macrophage count and in the capacity for erythrophagocytosis and heme iron processing (refer to the above figure for details).

      It is widely accepted that HH exposure predominantly induces erythropoiesis by stimulating bone marrow production. The primary objective of this study was not to refute this central mechanism behind erythrocytosis. Instead, our intent was to supplement this understanding by proposing that impaired clearance of red blood cells (RBCs) could potentially exacerbate erythrocytosis. We believe this additional perspective could significantly enhance our understanding of the complex dynamics involved in RBC production and clearance under hypoxic conditions.

      Reviewer #2 (Recommendations For The Authors):

      The following questions and remarks should be considered by the authors:

      1). The methods should clearly state whether the HH was discontinued during the 7- or 14-day exposure for cleaning, fresh water etc. Moreover, how was CO2 controlled? The procedure for splenectomy needs to be described in the methods.

      Thank you for your insightful comments and questions. We apologize for any lack of clarity in our original description. To address your questions:

      During the 7- or 14-day HH exposure, the HH was not discontinued for cleaning or providing fresh water. We ensured that the cage was thoroughly cleaned, and food and water were sufficiently stocked before placing the mice into the HH chamber. The design of the cage and the HH chamber allowed the mice to have continuous access to food and water during the entire exposure period.

      Regarding the control of CO2, the HH chamber was equipped with a CO2 scrubbing system. The system utilized soda lime to absorb excess CO2 produced by the mice, and the air inside the chamber was exchanged with the air outside 25 times per hour to maintain a stable atmospheric concentration and ensure adequate oxygen supply.

      As for the procedure for splenectomy, we apologize for the omission in the original manuscript. The mice were anesthetized using isoflurane, and a small incision was made in the left flank to expose the spleen. The spleen was then gently exteriorized, ligated, and excised. The incision was sutured, and the mice were allowed to recover under close monitoring. We ensured that all procedures were performed in accordance with our institution's guidelines for animal care.

      2) The lack of changes in MCH needs explanation? During stress erythropoiesis some limit in iron availability should cause MCH decrease particularly if the authors claim that macrophages for rapid iron recycling are decreased. Fig 1A is dispensable. Fig 1G NN control 14 days does not make sense since it is higher than 7 days of HH.

      Thank you for your insightful comments and queries. Regarding the lack of changes in Mean Corpuscular Hemoglobin (MCH), our hypothesis is that the decrease in iron recycling in the spleen following HH is potentially compensated by the increased iron absorption or supply from the liver, thus maintaining the iron requirement for erythropoiesis. This may explain why MCH levels did not significantly change after HH exposure. We have indeed observed an increase in macrophage numbers and their erythrophagocytosis/heme iron processing ability after HH exposure for 7 or 14 days in liver (please refer to the above figure for details), suggesting a compensatory mechanism to ensure adequate iron for erythropoiesis.

      Regarding your comment on Fig 1A, we included this figure to provide a baseline of the experimental condition before any treatment. However, we understand your point and will consider removing it if it does not contribute significantly to the interpretation of our results. As for Fig 1G, we agree that the control at 14 days being higher than 7 days of HH may seem counterintuitive. We believe this could be due to individual variations among the mice or potential experimental errors. However, considering recommendations from other reviewers, we have removed this result from the revised manuscript.

      3) Fig 2, the difference between sham and splenectomy is really marginal and not convincing. Is there also a difference at 7 days? Why does the spleen size decrease between 7 and 14 days?

      We understand your concerns regarding the observed differences in Fig. 2 between sham and splenectomy groups. We acknowledge that while the absolute numerical differences may appear marginal, it is important to consider the unit of measurement. In the case of RBC count, the unit is 10¹²/L, hence even slight numerical differences can translate to significant variations in the actual count of RBCs.

      We did not examine alterations occurring 7 days post-splenectomy in our study. The discernible trend of spleen size diminution between the 7th and 14th days is indeed compelling. It is plausible that this might be attributable to the body's adaptive response to hypobaric hypoxia (HH) exposure, wherein spleen size initially enlarges (at day 7) in response to compensatory erythropoiesis, followed by a reduction (at day 14) as the body acclimatizes to the HH conditions. Nevertheless, we did not identify a statistically significant difference between the measurements at day 7 and day 14, suggesting that this observation warrants further scrutiny.

      4) Fig 3B, the clusters should be explained in detail. If the decrease in macrophages in Fig 3K/L is responsible for the effect, why does splenectomy not have a much stronger effect? How do the authors know which cells died in the calcein stained population in Fig 3D?

      Thank you for your insightful queries and comments. Regarding Fig. 3B, we apologize for not providing sufficient detail on the clusters in the original manuscript. We will ensure that we include a comprehensive explanation of the clusters, including the specific cell types and their respective markers, in our revision. The assignments are as follows: clusters 0, 1, 3, 4, 14, 18, and 29 represented B cells; clusters 2, 10, 12, and 28 represented T cells; clusters 15 and 22 corresponded to NK cells; clusters 5, 11, 13, and 19 represented NKT cells; clusters 6, 9, and 24 represented cell cycle cells; clusters 26 and 17 represented plasma cells; clusters 21 and 23 represented neutrophils; cluster 30 represented erythrocytes; and clusters 7, 8, 16, 20, 24, and 27 represented dendritic cells (DCs) and macrophages.
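
      The cluster assignments listed above can be written down as a lookup table, which makes it easy to check that every cluster receives exactly one label. The following is a minimal sketch only: the cluster numbers are transcribed from the response text, while the names `CLUSTERS_BY_CELL_TYPE`, `invert`, and `audit`, and the assumption that clusters are numbered 0 through 30 (the highest index mentioned), are illustrative and not taken from the manuscript.

```python
from collections import defaultdict

# Cluster-to-cell-type assignments transcribed from the response text.
CLUSTERS_BY_CELL_TYPE = {
    "B cells": [0, 1, 3, 4, 14, 18, 29],
    "T cells": [2, 10, 12, 28],
    "NK cells": [15, 22],
    "NKT cells": [5, 11, 13, 19],
    "cell cycle cells": [6, 9, 24],
    "plasma cells": [26, 17],
    "neutrophils": [21, 23],
    "erythrocytes": [30],
    "DCs and macrophages": [7, 8, 16, 20, 24, 27],
}


def invert(groups):
    """Map each cluster id to the list of cell-type labels that mention it."""
    labels = defaultdict(list)
    for cell_type, clusters in groups.items():
        for cluster in clusters:
            labels[cluster].append(cell_type)
    return dict(labels)


def audit(groups, n_clusters):
    """Return (unassigned cluster ids, cluster ids carrying more than one label)."""
    labels = invert(groups)
    missing = [c for c in range(n_clusters) if c not in labels]
    ambiguous = {c: names for c, names in labels.items() if len(names) > 1}
    return missing, ambiguous


CLUSTER_LABELS = invert(CLUSTERS_BY_CELL_TYPE)
# Assumption: clusters are numbered 0..30, i.e. 31 clusters in total.
MISSING, AMBIGUOUS = audit(CLUSTERS_BY_CELL_TYPE, n_clusters=31)

if __name__ == "__main__":
    print("unassigned clusters:", MISSING)
    print("clusters with more than one label:", AMBIGUOUS)
```

      Run on the list as transcribed, the check reports cluster 24 under two groups and cluster 25 under none, so the assignment of those two clusters may be worth confirming against the figure legend.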

      As for the decrease in macrophages observed in Fig. 3K/L, it's important to note that the spleen is a complex organ comprising numerous cell types, all of which can contribute to its overall function. While macrophages play a crucial role in iron recycling and erythropoiesis, other cell types and factors may also influence these processes. Therefore, while splenectomy results in the removal of all splenic cells, the overall impact on these processes may not be as pronounced as the specific reduction in macrophages due to compensatory mechanisms from other tissues and cells.

      Concerning Fig. 3D, we acknowledge the ambiguity in the initial interpretation. The calcein staining was utilized to determine cell viability, but it doesn't identify the specific cell types that have died. To address this, we performed a single-cell analysis, which can provide a more accurate identification of the specific cell types affected.

      5) Is the reduced phagocytic capacity in Fig 4B significant? Erythrophagocytosis is compromised due to the considerable spontaneous loss of labelled erythrocytes; could other assays help (potentially a modified Chromium release assay)? Is it necessary to stimulate phagocytosis to see a significant effect?

      We express our gratitude for your insightful queries and recommendations. In response to your initial question, the observed reduction in phagocytic capacity illustrated in Fig. 4B was indeed statistically significant. However, in alignment with feedback from other reviewers, we have elected to exclude the phagocytic results from this revised manuscript, as they predominantly reflect immune function rather than erythrophagocytosis of macrophages.

      With respect to your proposal of potential alternatives to the erythrophagocytosis assay, we concur that the spontaneous loss of labeled erythrocytes could have influenced our results. Your suggestion of implementing a modified Chromium release assay is indeed an intriguing possibility that warrants further exploration.

      Regarding the requirement for stimulating phagocytosis, we employed stimulation as a mechanism to investigate the potential for augmenting erythrophagocytosis and iron processing within the red pulp. Our findings suggest that increased phagocytosis in the spleen contributes positively to these processes. As part of the Tuftsin injection experiment, we assessed the RBC count and hemoglobin content. Despite a downward trend, there were no statistically significant alterations, possibly because the observation period was too short. Nevertheless, we concur that it would be worthwhile to explore inherent changes without external stimulation, and we will take this into consideration in our future research.

      6) Can the observed ferroptosis be influenced by bi- and not trivalent iron chelators?

      Thank you for your insightful question. Indeed, the role of iron chelators in the observed ferroptosis is an important aspect to explore. Ferroptosis is a form of regulated cell death characterized by an iron-dependent accumulation of lipid peroxides, and the role of different iron chelators could potentially influence this process.

      In the case of bi- versus trivalent iron chelators, their influence on ferroptosis could be distinct due to their specificities for different forms of iron. However, we have not yet investigated this in our current study.

      Your suggestion has highlighted a valuable direction for our future research. We agree that examining the influence of bi- and trivalent iron chelators on the observed ferroptosis would provide a deeper understanding of the iron-dependent mechanisms involved in this process. We will consider this important aspect in our subsequent investigations.

      Reviewer #3 (Recommendations For The Authors):

      Methodology:

      1) Several syntax and grammatical errors, and unclear phrasing. Some factual errors as well: eg, line 380-81 the authors wrote that hypoxia increased viable cell numbers and phagocytosis ability, although their data suggest the opposite. Lines in Discussion 454-55 and in the Results 346-47 convey opposite messages.

      We appreciate your attention to detail and your feedback on the language and factual discrepancies within the manuscript.

      Upon revisiting lines 380-381, we would like to clarify that we had made a mistake. Our data indeed suggest that hypoxia led to a reduction in viable cell numbers and phagocytosis ability, not an increase as originally stated. We sincerely apologize for the confusion and will correct this statement in our revised manuscript.

      As for the opposing messages between lines 454-455 in the Discussion and 346-347 in the Results, we apologize for any confusion caused. We understand that it is crucial to maintain consistent interpretation of our data throughout the manuscript. We will carefully reevaluate these sections and adjust our phrasing to ensure that our interpretations accurately reflect our results.

      2) It is not clear why the authors investigated CD47 expression.

      Thank you for your question regarding our investigation of CD47 expression. CD47, also known as integrin-associated protein, is ubiquitously expressed on many cell types, including red blood cells (RBCs). In the context of our study, we used CD47 expression as an indicator of young RBCs, as CD47 is known to be highly expressed on newly produced RBCs. Our intention was to use CD47-positive cells as a proxy for new RBC production, which would give us insights into erythropoiesis under hypobaric hypoxia conditions. This marker thus provides valuable information about the rate and effectiveness of the erythropoietic response to hypoxic stress. However, following other reviewers' suggestions, we removed this part of the results from the revised manuscript.

      Minor:

      1) Y axis is often labeled without sufficient detail.

      2) The legends do not specify the exact statistical tests.

      3) Some in vivo exp contain n=3 which is relatively low for mouse-based studies.

      Some suggestions for the text:

      Line 60: is the main cause of erythrocytosis which in turn alleviates..

      62-66 - argumentation is not clear/grammatically correct and should be rephrased (e.g., "RBC homeostasis is disturbed and never formed into a homeostasis status": "homeostasis ... is never formed into a homeostasis status" sounds incorrect).

      Ref # 8 - does not fit, I assume this was a mistake and the authors aimed to cite a Review article by Slusarczyk and Mleczko-Sanecka in Genes. However, this reference seems appropriate to be discussed in the Discussion section as it is very directly connected to the content of the present manuscript

      76-78 - unclear/incomplete sentence (binding of iron to Tf and Tf-Fe delivery to the erythroid compartment is missing in this sentence, please, rephrase)

      80 - iron is not stored ON FtL

      90 - should be written: important role in iron recycling from RBCs

      94 - phrasing 'damage of erythrophagocytosis' is incorrect

      96-97 - should be written, for example: 'followed by eryptosis and iron recycling defects in the spleen'

      282 - the sentence is grammatically incorrect and unclear.

      292-94 - the statement is completely unclear, what can 'inhibit the excessive proliferation of RBCs'? What does it mean?

      Reference to tuftsin was not provided (Am J Gastroenterol, 1999;94:391-397; PLoS One. 2012;7(4):e34933)

      How was quantification of microscopy images for F4/80 signal performed?

      In Figure 5, more explanation is required for the readers regarding the measured genes/proteins - why does the pattern of gene expression changes suggest ferroptosis?

      Writing that ferroptosis INHIBITS phagocytosis is incorrect

      Line 460 is unclear

      468 - erythrocytophagy is not a commonly used term.

      We are grateful for your keen eye and the time you have taken to provide such thorough feedback. It will undoubtedly help us to significantly enhance the clarity and completeness of our research. We have taken all your comments into consideration and modified the corresponding sections of our manuscript to include these details, so that our methodology is transparent and our findings are presented clearly. We have also revised our manuscript to discuss these alternative interpretations more clearly and to acknowledge the potential limitations of our data.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank both reviewers and the editor for their time and effort in carefully reviewing and comprehending our manuscript. We are grateful for their thorough assessment, as well as the insightful questions and suggestions they have provided. We have taken into account the questions and comments raised by the reviewers, and we have incorporated the necessary revisions accordingly. In the following pages the reviewers’ comments are italicized. Our replies are in normal script.

      In addition to the revisions suggested by the reviewers, we added a new summary schematic (Fig 8) and made minor changes to the acknowledgments.

      Reviewer 1

      This is a very strong study with few concerns. Regarding DN1+ T cell function, the authors assessed IFN-γ and activation markers, but it is unclear if the cells are polyfunctional (produced high levels of other cytokines at 6 weeks) or if there were changes in the humoral response (serum Ab titers or size/ number of germinal centers.)

      Thank you for your thorough assessment of our work and your kind comments.

      a. We observed decreased IFN-γ and TNF-α production in antigen-experienced DN1 T cells compared to naïve DN1 T cells, which is consistent with findings in Tfh cells.

      b. We tested for anti-MA IgM and IgG production but did not observe an increase in these antibodies in the vaccinated setting. It is possible that additional inflammatory stimulation, such as from an adjuvant or infection, may be necessary to trigger antibody levels sufficient for detection by ELISA.

      c. We did not measure the number or size of germinal centers in this study, but future investigations could explore this aspect.

      Reviewer 2

      1. Authors elaborate the introduction solely highlighting the relevance of antigen persistence in the context of vaccination. However, it is well known that several mycobacterial antigens (Lipids and proteins) can cause detrimental responses when overexposed to the immune system. In this regard, it would be appropriate to introduce the possibility of the occurrence of exhaustion when prolonged exposure to antigens is happening, which is the main theme of this paper.

      Thank you for bringing these points to our attention. We have added a paragraph in the discussion section (page 15-16, line 372-386), addressing the implications of our findings in relation to exhaustion in the context of antigen persistence during chronic viral infections. We have also provided an example involving the lipid trehalose 6,6’-dibehenateled (TDM), a known virulence factor for Mtb, which has been utilized in several subunit vaccines without demonstrating significant toxicity.

      1. Authors need to provide more information about the source of MA. It is briefly mentioned in the materials and methods section that it was obtained from Sigma. If that is the case, it would be ideal to show the integrity of the polysaccharide in terms of balance and abundance between different MA species.

      We obtained M. tuberculosis MA from Sigma, which comprises α-, keto-, and methoxy MA forms with an average combined lipid tail length of 80 carbons. MA-specific T cells that preferentially recognize these three forms of MA have been identified in humans. We have provided more detailed information regarding the MA in the Materials and Methods section (page 17, line 429-431).

      1. Building on the previous comment, MA is a complex mixture of polysaccharides including multiple lengths of fatty acids and modifications. Could the authors comment on the potential variability of MA structure and potential impact on immune responses?

      The binding capacities of Group 1 CD1-restricted T cells can be influenced by various factors, including specific head groups, lipid tail length, and structure of the lipid tail. Notably, DN1 T cells have been shown to have higher binding affinities towards keto and methoxy MA, while displaying weaker binding to α-MA (Van Rhijn et al., 2017, Eur. J. Immunol. 47:1525). In our study, we successfully utilized a mixture of MA to activate DN1 T cells, indicating that the required subtypes of MA were present in sufficient quantities to elicit this activation. In future investigations focusing on the polyclonal immune response, incorporating a mixture of MA and possibly other Mtb lipid antigens will enable a broader spectrum of T cell activation. This, in turn, is expected to enhance the overall effectiveness and robustness of protection in challenge experiments.

      1. How do the authors explain the lack of stimulation of cell proliferation induced by MA-PLGA formulation? Does this result contradict previous findings?

      This study represents the first instance of utilizing PLGA as a delivery system for a lipid antigen via a pulmonary vaccine route, despite its previous applications in numerous other vaccine formulations. Therefore, we do not think our findings contradict any existing research in the field. It is worth noting that the immunogenicity of PLGA can be influenced by the specific polymer chemistry and formulation, which may account for potential variations in the observed effects. We have added additional text to the discussion (page 13, line 310 – 313) to address this point.

      1. Fig 3. Authors switch to IT administration simply arguing against the limitation of IN delivery regarding its low volume. However, administration via IN could be done in an iterative manner. According to this change, this reviewer asks whether the performance of MA-PLGA could now be comparable to BCN-MA using IT instead.

      PLGA possesses an inherent background adjuvant effect, which may not be ideal for precisely stimulating group 1 CD1-restricted T cells, as a considerable proportion of these T cells exhibit some level of autoreactivity (Li, et al, 2011, Blood 118:3870, De Lalla et al., 2011, Eur. J. Immunol. 41:602; de Jong et al, 2010, Nat. Immunol. 11:1102). Notably, our observations revealed that blank PLGA-NP exerted a significant stimulatory effect on both mouse (DN1) and human (M11) MA-specific T cells (Fig. 2A-D). This underscores the advantage of the BCN system, which lacks detectable adjuvant effects and enables a more controlled, dose-dependent augmentation of T cell responses with increasing concentrations of loaded MA. Therefore, we did not further evaluate the impact of PLGA-MA using the IT route of vaccination.

      1. What would explain the apparent lack of a role for NP encapsulation in the persistence of MA?

      In this study, we have provided evidence to support the notion that encapsulation plays a role in antigen persistence, as demonstrated in Fig. 5A-C. Specifically, we directly compared the persistence of MA when delivered encapsulated in BCNs versus without encapsulation in BCNs, using DC pulsing and IT vaccination as the delivery methods. Our results indicate that at 6 weeks post-vaccination, MA encapsulated in BCNs can activate DN1 T cells, while free MA does not. These findings may initially appear to be contradictory to those depicted in Fig. 5D-F, where antigen persistence is observed following vaccination with attenuated Mtb. However, we propose that the attenuated Mtb bacteria may function similarly to nanoparticles by encapsulating and containing MA, thereby facilitating its persistence within the host. We appreciate the opportunity to clarify these points (page 15, line 364-367). Encapsulation within PEG-PPS NPs may also contribute through two additional mechanisms. First, we have demonstrated that PEG-PPS NPs target myeloid cell populations (Burke et al., 2022, Nat. Nano. 17:319), such as alveolar macrophages, that can serve as antigen persistence depots as well as present CD1b/MA complexes on their surfaces. NPs allow more efficient delivery to these cells, whereas otherwise the lipid would bind to albumin, HDL, LDL, and other lipid carriers in blood for a broader, non-specific biodistribution, which would include cells less efficient at antigen persistence or presentation. Second, we previously demonstrated that the BCN nanostructure is highly stable within cells, supporting a slow intracellular release (Bobbala et al., 2020, Nanoscale 12:5332). This could assist with a more sustained presentation of lipid antigen by targeted cells in contrast to free form lipid or NPs (like PLGA) that rapidly degrade within cells. Indeed, low levels of fluorescently tagged BCNs were still detectable 6 weeks post-vaccination (Fig. 6B). Our future studies will further investigate this hypothesis.

      1. Authors need to discuss to what extent the MA location into AM is route dependent.

      The localization of MA within alveolar macrophages (AMs) in the lung is likely specific to intratracheal (IT) vaccination. Therefore, mice vaccinated subcutaneously (SC) or intravenously (IV) may possess distinct antigen persistence depots. We have made modifications to the discussion section to further emphasize this point (page 15, line 359-364).

      1. Also, AM are programmed to sustain low immune responses because of their unique location in the lung. In fact, Mtb uses this to replicate while immune response is mounted. In this regard, accumulation of MA into this compartment may not be relevant for the overall immune response. In other words, what would be the contribution of this population to the T cell activation?

      It is likely that AMs primarily function as antigen depots and do not directly contribute to the activation of DN1 T cells. This assertion is supported by our findings, as co-culturing AMs with DN1 T cells alone did not result in T cell activation (Fig. 6E). However, we observed that the presence of hCD1Tg-expressing bone marrow-derived dendritic cells was necessary for DN1 T cell activation in vitro, which likely reflects a similar phenomenon occurring in vivo.

      1. Could the T cell responses measured be due to the reduced fraction of DCs loaded with BCN-MA at initial time points?

      Regarding the T cell response observed in Fig. 5A-C, where we used DCs to deliver either free MA or MA-BCN, we took steps to address potential differences in loading capacity between the two at initial time points. Specifically, DCs were pulsed with a concentration of 10 μg/mL for free MA and 5 μg/mL of MA-BCN (the figure legend has been modified to clarify this point, page 37, line 962-963). To ensure approximate equivalence in loading, we examined the immune response one week after vaccination and found no statistically significant difference between the two methods.

    1. Author response

      We appreciate the responses from the editors and reviewers and will submit a revised manuscript addressing all of the main points raised. We are glad to see broad agreement that we took a careful approach and addressed a clear question.

      There were questions raised about the framing of the study vis-à-vis prior literature. One question was whether low frequency signals always have larger point spread functions, thereby making our result unsurprising. A second question was whether the notion of alpha oscillations as having wide-spread coherence and relating to system-general states was out-of-date. We appreciate these comments and agree that they could use further discussion. Our view is that neither of these points weakens the study, but our framing could be clearer regarding these two important issues. We will improve discussion of these topics in the revision.

      A second criticism mentioned by two reviewers is the lack of null-hypothesis testing. The value of null hypothesis statistical testing (NHST) in biomedicine is hotly debated, with many statisticians and scientists arguing that NHSTs add little to no value (Gigerenzer & Marewski, 2015; McShane et al., 2019; Meehl, 1978). Others of course disagree (Mogie, 2004). Our goal was not to try to rule out null hypotheses, but rather to make systematic measurements and to report the reliable patterns. We generally focused on observations where the results were well above the noise, obviating the need to test the null. Nonetheless, we can (and will) improve the clarity of our arguments in terms of how we rely on specific statistical analyses to support particular conclusions, as well as how to deal with the issue of multiple electrodes coming from small numbers of subjects, an important point raised by R3. We will clarify these issues in the revision.

      Reviewer 1 also made an interesting point about visual maps having an oculomotor component. We will do our best to incorporate this interesting issue into our revision.

      In addition to the public review, the reviewers made a number of useful recommendations for the revision. We appreciate these recommendations and will carefully consider each of them.

      Gigerenzer, G., & Marewski, J. N. (2015). Surrogate science: The idol of a universal method for scientific inference. Journal of Management, 41(2), 421-440.

      McShane, B. B., Gal, D., Gelman, A., Robert, C., & Tackett, J. L. (2019). Abandon statistical significance. The American Statistician, 73(sup1), 235-245.

      Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46(4), 806-834. https://doi.org/10.1037/0022-006X.46.4.806

      Mogie, M. (2004). In support of null hypothesis significance testing. Proc Biol Sci, 271 Suppl 3(Suppl 3), S82-84. https://doi.org/10.1098/rsbl.2003.0105

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study provides potentially important, new information about the combination of information from the two eyes in humans. The data included frequency tagging of each eye's inputs and measures reflecting both cortical (EEG) and sub-cortical processes (pupillometry). Binocular combination is of potentially general interest because it provides, in essence, a case study of how the brain combines information from different sources and through different circuits. The strength of supporting evidence appears to be solid, showing that temporal modulations are combined differently than spatial modulations, with additional differences between subcortical and cortical pathways. However, the manuscript's clarity could be improved, including by adding more convincing motivations for the approaches used.

      We thank the editor and reviewers for their detailed comments and suggestions regarding our paper. We have implemented most of the suggested changes. In doing so we noticed a minor error in our analysis code that affected the functions shown in Figure 2e (previously Figure 1e), and have fixed this and rerun the modelling. Our main results and conclusions are unaffected by this change. We have also added a replication data set to the Appendix, as this bears on one of the points raised by a reviewer, and included a co-author who helped run this experiment.

      Reviewer #1 (Public Review):

      In this paper, the interocular/binocular combination of temporal luminance modulations is studied. Binocular combination is of broad interest because it provides a remarkable case study of how the brain combines information from different sources. In addition, the mechanisms of binocular combination are of interest to vision scientists because they provide insight into when/where/how information from two eyes is combined.

      This study focuses on how luminance flicker is combined across two eyes, extending previous work that focused mainly on spatial modulations. The results appear to show that temporal modulations are combined in different ways, with additional differences between subcortical and cortical pathways.

      1. Main concern: subcortical and cortical pathways are assessed in quite different ways. On the one hand, this is a strength of the study (as it relies on unique ways of interrogating each pathway). However, this is also a problem when the results from two approaches are combined, leading to a sort of attribution problem: Are the differences due to actual differences between the cortical and subcortical binocular combinations, or are they perhaps due to the different methods? For example, the results suggest that the subcortical binocular combination is nonlinear, but it is not clear where this nonlinearity occurs. If this occurs in the final phase that controls pupillary responses, it has quite different implications.

      At the very least, this work should clearly discuss the limitations of using different methods to assess subcortical and cortical pathways.

      The modelling asserts that the nonlinearity is primarily interocular suppression, and that this is stronger in the subcortical pathway. Moreover, the suppression acts before binocular combination. So this is quite a specific location. We now say more about this in the Discussion, and also suggest that fMRI might avoid the limits on the conclusions we can draw from different methods.

      1. Adding to the previous point, the paper needs to do a better job of justifying not only the specific methods but also other details of the study (e.g., why certain parameters were chosen). To illustrate, a semi-positive example: Only page 7 explains why 2Hz modulation was used, while the methods for 2Hz modulation are described in detail on page 3. No justifications are provided for most of the other experimental choices. The paper should be expanded to better explain this area of research to non-experts. A notable strength of this paper is that it should be of interest to those not working in this particular field, but this goal is not achieved if the paper is written for a specialist audience. In particular, the introduction should be expanded to better explain this area of research, the methods should include justifications for important empirical decisions, and the discussion should make the work more accessible again (in addition to addressing the issues raised in point 1 above). The results also need more context. For example, why do EEG data have overtones but pupillometry does not?

      We now explain the choice of frequency in the final paragraph of the introduction as follows:

      ‘We chose a primary flicker frequency of 2Hz as a compromise between the low-pass pupil response (see Barrionuevo et al., 2014; Spitschan et al., 2014), and the relatively higher-pass EEG response (Regan, 1966).’

      We also mention why the pupil response is low-pass:

      ‘The pupil response can be modulated by periodic changes in luminance, and is temporally low-pass (Barrionuevo et al., 2014; Spitschan et al. 2014), most likely due to the mechanical limitations of the iris sphincter and dilator muscles’.

      Reviewer #2 (Public Review):

      Previous studies have extensively explored the rules by which patterned inputs from the two eyes are combined in the visual cortex. Here the authors explore these rules for un-patterned inputs (luminance flicker) at both the level of the cortex, using Steady-State Visual Evoked Potentials (SSVEPs) and at the sub-cortical level using pupillary responses. They find that the pattern of binocular combination differs between cortical and sub-cortical levels with the cortex showing less dichoptic masking and somewhat more binocular facilitation.

      Importantly, the present results with flicker differ markedly from those with gratings (Hou et al., 2020, J Neurosci; Baker and Wade, 2017, Cerebral Cortex; Norcia et al., 2000, Neuroreport; Brown et al., 1999, IOVS). When SSVEP responses are measured under dichoptic conditions where each eye is driven with a unique temporal frequency, in the case of grating stimuli, the magnitude of the response in the fixed contrast eye decreases as a function of contrast in the variable contrast eye. Here the response increases by varying (small) magnitudes. The authors favor a view that cortex and perception pool binocular flicker inputs approximately linearly using cells that are largely monocular. The lack of a decrease below the monocular level when modulation strength increases is taken to indicate that the previously observed normalization mechanism in pattern vision does not play a substantial role in the processing of flicker. The authors present a computational model of binocular combination that captures features of the data when fit separately to each data set. Because the model has no frequency dependence and is based on scalar quantities, it cannot make joint predictions for the multiple experimental conditions, which is one of its limitations.

      A strength of the current work is the use of frequency-tagging of both pupil and EEG responses to measure responses for flicker stimuli at two anatomical levels of processing. Flicker responses are interesting but have been relatively neglected. The tagging approach allows one to access responses driven by each eye, even when the other eye is stimulated which is a great strength. The tagging approach can be applied at both levels of processing at the same time when stimulus frequencies are low, which is an advantage as they can be directly compared. The authors demonstrate the versatility of frequency tagging in a novel experimental design which may inspire other uses, both within the present context and others. A disadvantage of the tagging approach for studying sub-cortical dynamics via pupil responses is that it is restricted to low temporal frequencies given the temporal bandwidth of the pupil. The inclusion of a behavioral measure and a model is also a strength, but there are some limitations in the modeling (see below).

      The authors suggest in the discussion that luminance flicker may preferentially drive cortical mechanisms that are largely monocular and in the results that they are approximately linear in the dichoptic cross condition (no effect of the fixed contrast stimulus in the other eye). By contrast, prior research using dichoptic dual frequency flickering stimuli has found robust intermodulation (IM) components in the VEP response spectrum (Baitch and Levi, 1988, Vision Res; Stevens et al., 1994 J Ped Ophthal Strab; France and Ver Hoeve, 1994, J Ped Ophthal Strab; Suter et al., 1996 Vis Neurosci). The presence of IM is a direct signature of binocular interaction and suggests that at least under some measurement conditions, binocular luminance combination is "essentially" non-linear, where essential implies a point-like non-linearity such as squaring of excitatory inputs. The two views are in striking contrast. It would thus be useful if the authors could show spectra for the dichoptic, two-frequency conditions to see if non-linear binocular IM components are present.

      This is an excellent point, and one that we had not previously appreciated the importance of. We have generated a figure (Fig 8) showing the IM response in the cross frequency conditions. There is a clear response at 0.4Hz in the pupillometry data (2-1.6Hz), and at 3.6Hz in the EEG data (2+1.6Hz). We therefore agree that this shows the system is essentially nonlinear, despite the binocular combination appearing approximately linear. We now say in the Discussion:

      ‘In the steady-state literature, one hallmark of a nonlinear system is the presence of intermodulation responses at the sums and differences of fundamental flicker frequencies (Baitch & Levi, 1988; Tsai et al., 2012). In Figure 8 we plot the amplitude spectra of conditions from Experiment 1 in which the two eyes were stimulated at different frequencies (2Hz and 1.6Hz) but at the same contrast (48%; these correspond to the binocular cross and dichoptic cross conditions in Figures 2d,e and 3d,e). Consistent with the temporal properties of pupil responses and EEG, Figure 8a reveals a strong intermodulation difference response at 0.4Hz (red dashed line), and Figure 8b reveals an intermodulation sum response at 3.6Hz (red dashed line). The presence of these intermodulation terms is predicted by nonlinear gain control models of the type considered here (Baker and Wade, 2017; Tsai et al., 2012), and indicates that the processing of monocular flicker signals is not fully linear prior to the point at which they are combined across the eyes.’

      If the IM components are indeed absent, then there is a question of the generality of the conclusions, given that several previous studies have found them with dichoptic flicker. The previous studies differ from the authors' in terms of larger stimuli and in their use of higher temporal frequencies (e.g. 18/20 Hz, 17/21 Hz, 6/8 Hz). Either retinal area stimulated (periphery vs central field) or stimulus frequency (high vs low) could affect the results and thus the conclusions about the nature of dichoptic flicker processing in cortex. It would be interesting to sort this out as it may point the research in new directions.

      This is a great suggestion about retinal area. As chance would have it, we had already collected a replication data set where we stimulated the periphery, and we now include a summary of this data set as an Appendix. In general the results are similar, though we obtain a measurable (though still small) second harmonic response in the pupillometry data with this configuration, which is a further indication of nonlinear processing.

      Whether these components are present or absent is of interest in terms of the authors' computational model of binocular combination. It appears that the present model is based on scalar magnitudes, rather than vectors as in Baker and Wade (2017), so it would be silent on this point. The final summation of the separate eye inputs is linear in the model. In the first stage of the model, each eye's input is divided by a weighted input from the other eye. If we take this input as inhibitory, then IM would not emerge from this stage either.

      We have performed the modelling using scalar values here for simplicity and transparency, and to make the fitting process computationally feasible (it took several days even done this way). This type of model is quite capable of processing sine waves as inputs, and producing a complex output waveform which is Fourier transformed and then analysed in the same way as the experimental data (see e.g. Tsai, Wade & Norcia, 2012, J Neurosci; Baker & Wade, 2017, Cereb Cortex). However our primary aim here was to fit the model, and make inferences about the parameter values, rather than to use a specific set of parameter values to make predictions. We now say more about this family of models and how they can be applied in the methods section:

      “Models from this family can handle both scalar contrast values and continuous waveforms (Tsai et al., 2012) or images (Meese and Summers, 2007) as inputs. For time-varying inputs, the calculations are performed at each time point, and the output waveform can then be analysed using Fourier analysis in the same way as for empirical data. This means that the model can make predictions for the entire Fourier spectrum, including harmonic and intermodulation responses that arise as a consequence of nonlinearities in the model (Baker and Wade, 2017). However for computational tractability, we performed fitting here using scalar contrast values.”

      As a side point, there are quite a lot of ways to produce intermodulation terms, meaning they are not as diagnostic as one might suppose. We demonstrate this in Author response image 1, which shows the Fourier spectra produced by a toy model that multiplies its two inputs together (for an interactive python notebook that allows various nonlinearities to be explored, see here). Intermodulation terms also arise when two inputs of different frequencies are summed, followed by exponentiation. So it would be possible to have an entirely linear binocular summation process, followed by squaring, and have this generate IM terms (not that we think this is necessarily what is happening in our experiments).

      Author response image 1
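
      The point about intermodulation can be illustrated with a minimal Python sketch. It is not the notebook referred to above: the sampling rate, the unit-amplitude sine inputs, and the helper `amplitude_at` are assumptions chosen for illustration. Only the two flicker frequencies (2Hz and 1.6Hz) and the 10-second trial length are taken from the response. The sketch compares three toy combinations of the two inputs: multiplication, a purely linear sum, and a linear sum followed by squaring.

```python
import cmath
import math

DURATION_S = 10.0        # trial length quoted in the response
SAMPLE_RATE_HZ = 100     # assumed sampling rate for this sketch
F1_HZ, F2_HZ = 2.0, 1.6  # the two flicker frequencies


def amplitude_at(signal, freq_hz):
    """Amplitude of one Fourier component, computed with a single-bin DFT.

    Assumes freq_hz is a non-zero multiple of the frequency resolution
    (1 / DURATION_S), so that it falls exactly on a bin.
    """
    n = len(signal)
    k = round(freq_hz * DURATION_S)  # bin index of the requested frequency
    total = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(signal))
    return 2 * abs(total) / n


n_samples = int(DURATION_S * SAMPLE_RATE_HZ)
times = [i / SAMPLE_RATE_HZ for i in range(n_samples)]
left = [math.sin(2 * math.pi * F1_HZ * t) for t in times]
right = [math.sin(2 * math.pi * F2_HZ * t) for t in times]

# Three toy ways of combining the two monocular inputs.
product = [l * r for l, r in zip(left, right)]
linear_sum = [l + r for l, r in zip(left, right)]
summed_squared = [(l + r) ** 2 for l, r in zip(left, right)]

for name, output in [("product", product),
                     ("linear sum", linear_sum),
                     ("sum then square", summed_squared)]:
    difference_term = amplitude_at(output, F1_HZ - F2_HZ)  # 0.4 Hz
    sum_term = amplitude_at(output, F1_HZ + F2_HZ)          # 3.6 Hz
    print(f"{name}: 0.4 Hz = {difference_term:.3f}, 3.6 Hz = {sum_term:.3f}")
```

      Under these assumptions, multiplication produces components at 0.4Hz and 3.6Hz and nothing at the fundamentals, the purely linear sum contains only the two fundamentals, and squaring the linear sum reintroduces the intermodulation terms. This is why the presence of such terms does not, on its own, identify where the nonlinearity sits.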

      Related to the model: One of the more striking results is the substantial difference between the dichoptic and dichoptic-cross conditions. They differ in that the latter has two different frequencies in the two eyes while the former has the same frequency in each eye. As it stands, if fit jointly on the two conditions, the model would make the same prediction for the dichoptic and dichoptic-cross conditions. It would also make the same prediction whether the two eyes were in-phase temporally or in anti-phase temporally. There is no frequency/phase-dependence in the model to explain differences in these cases or to potentially explain different patterns at the different VEP response harmonics. The model also fits independently to each data set which weakens its generality. An interpretation outside of the model framework would thus be helpful for the specific case of differences between the dichoptic and dichoptic-cross conditions.

      As mentioned above, the limitations the reviewer highlights are features of the specific implementation, rather than the model architecture in general. Furthermore, although this particular implementation of the model does not have separate channels for different phases, these can be added (see e.g. Georgeson et al., 2016, Vis Res, for an example in the spatial domain). In future work we intend to explore the phase relationship of flicker, but do not have space to do this here.

      Prior work has defined several regimes of binocular summation in the VEP (Apkarian et al., 1981, EEG Journal). It would be useful for the authors to relate the use of their terms "facilitation" and "suppression" to these regimes and to justify/clarify differences in usage, when present. Experiment 1, Fig. 3 shows cases where the binocular response is more than twice the monocular response. Here the interpretation is clear: the responses are super-additive and would be classed as involving facilitation in the Apkarian et al. framework. In the Apkarian et al. framework, a ratio of 2 indicates independence/linearity. Ratios between 1 and 2 indicate sub-additivity and are diagnostic of the presence of binocular interaction but are noted by them to be difficult to interpret mechanistically. This should be discussed. A ratio of <1 indicates frank suppression which is not observed here with flicker.

      Operationally, we use facilitation to mean an increase in response relative to a monocular baseline, and suppression to mean a decrease in response. We now state this explicitly in the Introduction. Facilitation greater than a factor of 2 indicates some form of super-additive summation. In the context of the model, we also use the term suppression to indicate divisive suppression between channels, however this feature does not always result in empirical suppression (it depends on the condition, and the inhibitory weight). We think that interpretation of results such as these is greatly aided by the use of a computational modelling framework, which is why we take this approach here. The broad applicability of the model we use in the domain of spatial contrast lends it credibility for our stimuli here.

      Can the model explore the full range of binocular/monocular ratios in the Apkarian et al framework? I believe much of the data lies in the "partial summation" regime of Apkarian et al and that the model is mainly exploring this regime and is a way of quantifying varying degrees of partial summation.

      Yes, in principle the model can produce the full range of behaviours. When the weight of suppression is 1, binocular and monocular responses are equal. When the weight is zero, the model produces linear summation. When the weight is greater than 1, suppression occurs. It is also possible to produce super-additive summation effects, most straightforwardly by changing the model exponents. However this was not required for our data here, and so we kept these parameters fixed. We agree that the model is a good way to unify the results across disparate experimental paradigms, and that is our main intention with Figure 7i.
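
      The dependence on the suppression weight can be sketched with a deliberately simplified divisive model. This is an illustration only, not the fitted model from the paper: the functional form, the saturation constant `z`, and all function names are assumptions, and the exponents are omitted. Each eye's signal is divided by its own contrast plus a weighted contribution from the other eye, and the two results are summed linearly.

```python
def suppressed_signal(own, other, weight, z=1.0):
    """One eye's signal after divisive suppression from the other eye.

    Hypothetical simplified form: own / (z + own + weight * other).
    """
    return own / (z + own + weight * other)


def combined_response(left, right, weight, z=1.0):
    """Linear sum of the two suppressed monocular signals."""
    return (suppressed_signal(left, right, weight, z)
            + suppressed_signal(right, left, weight, z))


def binocular_to_monocular_ratio(contrast, weight, z=1.0):
    """Ratio of the binocular response to the monocular response."""
    monocular = combined_response(contrast, 0.0, weight, z)
    binocular = combined_response(contrast, contrast, weight, z)
    return binocular / monocular


# 48% is the contrast quoted for the cross-frequency conditions.
for weight in (0.0, 1.0, 2.0):
    ratio = binocular_to_monocular_ratio(48.0, weight)
    print(f"weight = {weight}: binocular/monocular = {ratio:.3f}")
```

      In this sketch a weight of zero gives a ratio of exactly 2 (linear summation), a weight of 1 gives a ratio close to 1 once the contrast is well above `z`, and a weight greater than 1 pushes the ratio below 1 (suppression). Ratios above 2 would require changing the exponents, which this simplified form leaves out.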

      Reviewer #3 (Public Review):

      This manuscript describes interesting experiments on how information from the two eyes is combined in cortical areas, sub-cortical areas, and perception. The experimental techniques are strong and the results are potentially quite interesting. But the manuscript is poorly written and tries to do too much in too little space. I had a lot of difficulty understanding the various experimental conditions, the complicated results, and the interpretations of those results. I think this is an interesting and useful project so I hope the authors will put in the time to revise the manuscript so that regular readers like myself can better understand what it all means.

      Now for my concerns and suggestions:

      The experimental conditions are novel and complicated, so readers will not readily grasp what the various conditions are and why they were chosen. For example, in one condition different flicker frequencies were presented to the two eyes (2Hz to one and 1.6Hz to the other) with the flicker amplitude fixed in the eye presented with the lower frequency and the flicker amplitude varied in the eye presented with the higher frequency. This is just one of several conditions that the reader has to understand in order to follow the experimental design. I have a few suggestions to make it easier to follow. First, create a figure showing graphically the various conditions. Second, come up with better names for the various conditions and use those names in clear labels in the data figures and in the appropriate captions. Third, combine the specific methods and results sections for each experiment so that one will have just gone through the relevant methods before moving forward into the results. The authors can keep a general methods section separate, but only for the methods that are general to the whole set of experiments.

      We have created a new figure (now Fig 1) that illustrates the conditions from Experiment 1, and is referenced throughout the paper. We have kept the names constant, as they are rooted in a substantial existing literature, and it will be confusing to readers familiar with that work if we diverge from these conventions. We did consider separating out the methods section, but feel it helps the flow of the results section to keep it as a single section.

      I wondered why the authors chose the temporal frequencies they did. Barrionuevo et al (2014) showed that the human pupil response is greatest at 1Hz and is nearly a log unit lower at 2Hz (i.e., the change in diameter is nearly a log unit lower; the change in area is nearly 2 log units lower). So why did the authors choose 2Hz for their primary frequency? And why did the authors choose 1.6Hz which is quite close to 2Hz for their off frequency? The rationale behind these important decisions should be made explicit.

      We now explain this in the Introduction as follows:

      ‘We chose a primary flicker frequency of 2Hz as a compromise between the low-pass pupil response (see Barrionuevo et al., 2014; Spitschan et al., 2014), and the relatively higher-pass EEG response (Regan, 1966).’

      It is a compromise frequency that is not optimal for either modality, but generates a measurable signal for both. The choice of 1.6 Hz was for similar reasons - for a 10-second trial it is four frequency bins away from the primary frequency, so can be unambiguously isolated in the spectrum.
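
      The bin arithmetic behind this choice can be checked with a short Python sketch. The trial length and the two frequencies come from the response; the function name `bin_index` is a hypothetical helper introduced for illustration. For a trial of duration T, the Fourier spectrum has a resolution of 1/T, so a frequency f falls in bin f × T.

```python
TRIAL_DURATION_S = 10.0                 # trial length quoted above
RESOLUTION_HZ = 1.0 / TRIAL_DURATION_S  # spacing between Fourier bins (0.1 Hz)


def bin_index(freq_hz, duration_s=TRIAL_DURATION_S):
    """Index of the Fourier bin containing freq_hz for a trial of duration_s."""
    return round(freq_hz * duration_s)


primary_bin = bin_index(2.0)  # primary flicker frequency
off_bin = bin_index(1.6)      # off frequency
separation = primary_bin - off_bin

print(f"resolution: {RESOLUTION_HZ} Hz")
print(f"2 Hz bin: {primary_bin}, 1.6 Hz bin: {off_bin}, separation: {separation}")
```

      The same calculation places the intermodulation terms at 0.4Hz and 3.6Hz on exact bins as well, so they do not leak into the bins of either fundamental.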

      By the way, I wondered if we know what happens when you present the same flicker frequencies to the two eyes but in counter-phase. The average luminance seen binocularly would always be the same, so if the pupil system is linear, there should be no pupil response to this stimulus. An experiment like this has been done by Flitcroft et al (1992) on accommodation where the two eyes are presented stimuli moving oppositely in optical distance and indeed there was no accommodative response, which strongly suggests linearity.

      We have not tried this yet, but it’s on our to-do list for future work. The accommodation work is very interesting, and we now cite it in the manuscript as follows:

      ‘Work on the accommodative response indicates that binocular combination there is approximately linear (Flitcroft et al. 1992), and can even cancel when signals are in antiphase (we did not try this configuration here).’

      Figures 1 and 2 are important figures because they show the pupil and EEG results, respectively. But it's really hard to get your head around what's being shown in the lower row of each figure. The labeling for the conditions is one problem. You have to remember how "binocular" in panel c differs from "binocular cross" in panel d. And how "monocular" in panel d is different than "monocular 1.6Hz" in panel e. Additionally, the colors of the data symbols are not very distinct so it makes it hard to determine which one is which condition. These results are interesting. But they are difficult to digest.

      We hope that the new Figure 1 outlining the conditions has helped with interpretation here.

      The authors make a strong claim that they have found substantial differences in binocular interaction between cortical and sub-cortical circuits. But when I look at Figures 1 and 2, which are meant to convey this conclusion, I'm struck by how similar the results are. If the authors want to continue to make their claim, they need to spend more time making the case.

      Indeed, it is hard to make direct comparisons across figures - this is why Figure 4 plots the ratio of binocular to monocular conditions, and shows a clear divergence between the EEG and pupillometry results at high contrasts.

      Figure 5 is thankfully easy to understand and shows a very clear result. These perceptual results deviate dramatically from the essentially winner-take-all results for spatial sinewaves shown by Legge & Rubin (1981), whom they should cite, by the way. Thus, very interestingly, the binocular combination of temporal variation is quite different than the binocular combination of spatial variation. Can the pupil and EEG results also be plotted in the fashion of Figure 5? You'd pick a criterion pupil (or EEG) change and use it to make such plots.

      We now cite Legge & Rubin. We see what you mean about plotting the EEG and pupillometry results in the same coordinates as the matching data, but we don’t think this is especially informative as we would end up only with data points along the axes and diagonal of the plot, without the points at other angles. This is a consequence of how the experiments were conducted.

      My main suggestion is that the authors need to devote more space to explaining what they've done, what they've found, and how they interpret the data. I suggest therefore that they drop the computational model altogether so that they can concentrate on the experiments. The model could be presented in a future paper.

      We feel that the model is central to the understanding and interpretation of our results, and have retained it in the revised version of the paper.

      Reviewer #2 (Recommendations For The Authors):

      I found the terms for the stimulus conditions confusing. I think a simple schematic diagram of the conditions would help the reader.

      Now added (the new Fig 1).

      In reporting the binocular to monocular ratio, please clarify whether the monocular data was from one eye alone (and how that eye was chosen) or from both eyes and then averaged, or something else. It would be useful to plot the results from the dichoptic condition in this form, as well.

      These were averaged across both eyes. We now say in the Methods section:

      ‘We confirmed in additional analyses that the monocular consensual pupil response was complete, justifying our pooling of data across the eyes.’

      Also, clarify whether the term facilitation is used as above throughout (facilitation being > 2 times monocular response under binocular condition) or if a different criterion is being used. If we take facilitation to mean a ratio > 2, then facilitation depends on temporal frequency in Figure 4.

      We now explain our use of these terms in the final paragraph of the Introduction:

      ‘Relative to the response to a monocular signal, adding a signal in the other eye can either increase the response (facilitation) or reduce it (suppression).’

      The magnitude of explicit facilitation attained is interesting, but not without precedent. Ratios of binocular to mean monocular responses > 2 have been reported previously, and values of summation depend strongly on the stimulus used (see, for example, Apkarian et al., EEG Journal, 1981; Nicol et al., Doc Ophthal, 2011).

      We now mention this in the Discussion as follows:

      ‘(however we note that facilitation as substantial as ours has been reported in previous EEG work by Apkarian et al. (1981))’

      In Experiment 3, the authors say that the psychophysical matching results are consistent with the approximately linear summation effects observed in the EEG data of Experiment 1. In describing Fig. 3, the claim is that the EEG is non-linear, e.g. super-additive - at least at high contrasts. Please reconcile these statements.

      We think that the ‘superadditive’ effects are close enough to linear that we don’t want to make too much of a big deal about them - this could be measurement error, for example. So we use terms such as near-linear, or approximately linear, when referring to them throughout.

      Reviewer #3 (Recommendations For The Authors):

      Let me make some more specific comments using a page/paragraph/line format to indicate where in the text they're relevant.

      1/2 (middle)/3 from end. "In addition" seems out of place here.

      Removed.

      1/3/4. By "intensities" do you mean "contrasts"?

      Fixed.

      1/3/last. "... eyes'...".

      Fixed.

      2/5/3. By "one binocular disc", you mean into "one perceptually fused disc".

      Rewritten as: ‘to help with their perceptual fusion, giving the appearance of a single binocular disc’

      3/1/1. "calibrated" seems like the wrong word here. I think you're just changing the vergence angle to enable fusion, right?

      Now rewritten as: ‘Before each experiment, participants adjusted the angle of the stereoscope mirrors to achieve binocular fusion’

      3/1/1. "adjusting the angles...". And didn't changing the mirror angles affect the shapes of the discs in the retinal images?

      Perhaps very slightly, but this is well within the tolerance of the visual system to compensate for in the fused image, especially for such high contrast edges.

      3/3/5. "fixed contrast" is confusing here because it's still a flickering stimulus if I follow the text here. Reword.

      Now ‘fixed temporal contrast’

      3/4/1. It would be clearer to say "pupil tracker" rather than "eye tracker" because you're not really doing eye tracking.

      True, but the device is a commercial eye tracker, so this is the appropriate term regardless of what we are using it for.

      3/5/6. I'm getting lost here. "varying contrast levels" applies to the dichoptic stimulus, right?

      Yes, now reworded as ‘In the other interval, a target disc was displayed, flickering at different contrast levels on each trial, but with a fixed interocular contrast ratio across the block.’

      3/5/7. Understanding the "ratio of flicker amplitudes" is key to understanding what's going on here. More explanation would be helpful.

      Addressed in the above point.

      4/3/near end. Provide some explanation about why the Fourier approach is more robust to noise.

      Added ‘(which can make the phase and amplitude of a fitted sine wave unstable)’
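
      As a rough illustration of the Fourier approach referred to here, the sketch below estimates the amplitude and phase at a single known flicker frequency by projecting the signal onto a complex exponential, which needs no iterative fit and no starting values. It is a minimal sketch under stated assumptions, not the authors' analysis code: the function name, the 120 Hz sampling rate, and the 10 s window are all hypothetical, and the window is assumed to contain a whole number of flicker cycles.

```python
import cmath
import math


def fourier_amplitude(samples, sample_rate_hz, target_hz):
    """Return (amplitude, phase) of the component at target_hz.

    Single-bin discrete Fourier projection. Assumes the recording
    spans a whole number of cycles of target_hz (hypothetical setup).
    """
    n = len(samples)
    acc = 0j
    for k, x in enumerate(samples):
        t = k / sample_rate_hz
        acc += x * cmath.exp(-2j * math.pi * target_hz * t)
    coeff = 2 * acc / n  # scale so a cosine of amplitude A yields |coeff| = A
    return abs(coeff), cmath.phase(coeff)


def make_signal(amplitude, freq_hz, phase, sample_rate_hz, duration_s):
    """Synthetic cosine used only to exercise the estimator."""
    n = int(round(sample_rate_hz * duration_s))
    return [
        amplitude * math.cos(2 * math.pi * freq_hz * k / sample_rate_hz + phase)
        for k in range(n)
    ]
```

      Because the estimate is a weighted sum of the samples, zero-mean noise tends to average out over the window, whereas a least-squares sine fit has to search over phase and amplitude and can settle on unstable values when the signal is weak.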

      Figure 1. In panel a, explain what the numbers on the ordinate mean. What's zero, for example? Which direction is dilation? Same question for panel b. It's interesting in panel c that the response in one eye to 2Hz increases when the other eye sees 1.6Hz. Would be good to point that out in the text.

      Good idea about panel (a) - we have changed the y-axis to ‘Relative amplitude’ for clarity, and now note in the figure caption that ‘Negative values indicate constriction relative to baseline, and positive values indicate dilation.’ Panel (b) is absolute amplitude, so is unsigned. Panel (c) only contains 2Hz conditions, but there is some dichoptic suppression across the two frequencies in panels (d,e) - we now cover this in the text and include statistics.

      6/2/1. Make clear in the text that Figure 1c shows contrast response functions for the pupil.

      Now noted in the caption.

      Figure 3. I'm lost here. I feel like I should be able to construct this figure from Figures 1 and 2, but don't know how. More explanation is needed at least in the caption.

      Done. The caption now reads:

      ‘Ratio of binocular to monocular response for three data types. These were calculated by dividing the binocular response by the monocular response at each contrast level, using the data underlying Figures 2c, 3c and 3f. Each value is the average ratio across N=30 participants, and error bars indicate bootstrapped standard errors.’
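
      The calculation described in this caption can be sketched as follows. This is an illustrative sketch only: the caption does not state how the bootstrap was carried out, so resampling participants with replacement and taking the standard deviation of the resampled means is an assumption, and the function names, the number of resamples, and the seed are hypothetical.

```python
import random
import statistics


def binocular_ratio(binocular, monocular):
    """Per-participant ratio of binocular to monocular response
    at one contrast level (inputs are paired by participant)."""
    return [b / m for b, m in zip(binocular, monocular)]


def bootstrap_se(values, n_resamples=2000, seed=0):
    """Bootstrapped standard error of the mean.

    Resamples participants with replacement (assumed procedure) and
    returns the standard deviation of the resampled means.
    """
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(n_resamples):
        resample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(resample))
    return statistics.stdev(means)
```

      For one contrast level, the plotted point would then be `statistics.fmean(binocular_ratio(bin_resp, mon_resp))` with an error bar of `bootstrap_se(...)` on the same ratios, where `bin_resp` and `mon_resp` are placeholder names for the per-participant responses.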

      9/1/1-2. I didn't find the evidence supporting this statement compelling.

      We now point the reader to Figure 4 as a reminder of the evidence for this difference.

      9/1/6-9. You said this. But this kind of problem can be fixed by moving the methods sections as I suggested above.

      As mentioned, we feel that the results section flows better with the current structure.

      Figure 4. Make clear that this is EEG data.

      Now added to caption.

      Figure 5 caption. Infinite exponent in what equation?

      Now clarified as: ‘models involving linear combination (dotted) or a winner-take-all rule (dashed)’

      Figure 6. I hope this gets dropped. No one will understand how the model predictions were derived. And those who look at the data and model predictions will surely note (as the authors do) that they are rather different from one another.

      As noted above, we feel that the model is central to the paper and have retained this figure. We have also worked out how to correct the noise parameter in the model for the number of participants included in the coherent averaging, which fixes the discrepancy at low contrasts. The correspondence between the data and model is now very good, and we have plotted the data points and curves in the same panels, which makes the figure less busy.

      12/1. Make clear in this paragraph that "visual cortex" is referring to EEG and perception results and that "subcortical" is referring to pupil. Explain clearly what "linear" would be and what the evidence for "non-linear" is.

      Good suggestion, we have added qualifiers linking to both methods. Also tidied up the language to make it clearer that we are talking about binocular combination specifically in terms of linearity, and spelled out the evidence for each point.

      12/2/6-9. Explain the Quaia et al results enough for the reader to know what reflexive eye movements were studied and how.

      We now specify that these eye movements are also known as the ‘ocular following response’ and were measured using scleral search coils.

      12/2/9-10. Same for Spitschan and Cajochen: more explanation.

      Added:

      “(melatonin is a hormone released by the pineal gland that regulates sleep; its production is suppressed by light exposure and can be measured from saliva assays)”

      12/3/2-3. Intriguing statements about optimally combining noisy signals, but explain this more. It won't be obvious to most readers.

      We have added some more explanation to this section.

      13/1. This is an interesting paragraph where the authors have a chance to discuss what would be most advantageous to the organism. They make the standard argument for perception, but basically punt on having an argument for the pupil.

      Indeed, we agree that this point is necessarily speculative; however, we think it is interesting for the reader to consider.

      13/2/1. "Pupil size affects the ..." is more accurate.

      Fixed.

      13/2/2 from end. Which "two pathways"? Be clear.

      Changed to ‘the pupil and perceptual pathways’

    1. Author Response:

      The following is the authors’ response to the previous reviews.

      Reviewer #3 (Recommendations For The Authors):

      In response to my comment about Col10a1 expression in the dermal SFCs (Fig 3B, I), the authors provide additional text to clarify but also state that "Col2 genes were not detected robustly". I think this comment on the absence of Col2 transcripts should be explicitly included in that paragraph as it is a reasonable and expected question given the cartilage angle the authors begin the paragraph with. Including this in no way weakens their point, rather adds clarity.

      This version includes some modifications to fix typos and add a sentence in response to the concern above.

    2. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In their study, Aman et al. utilized single cell transcriptome analysis to investigate wild-type and mutant zebrafish skin tissues during the post-embryonic growth period. They identified new epidermal cell types, such as ameloblasts, and shed light on the effects of TH on skin morphogenesis. Additionally, they revealed the important role of the hypodermis in supporting pigment cells and adult stripe formation. Overall, I find their figures to be of high quality, their analyses to be appropriate and compelling, and their major claims to be well-supported by additional experiments. Therefore, this study will be an important contribution to the field of vertebrate skin research. Although I have no major concerns, I would like to offer a few minor comments for the authors to consider.

      1) The discovery of ameloblasts in the zebrafish skin is a fascinating finding that could potentially provide a new research model for understanding the development and regeneration of vertebrate teeth. It would be beneficial if the authors could provide further elaboration on this aspect and discuss how the zebrafish scale model could be utilized by researchers to better understand the morphogenesis of vertebrate teeth and/or hair.

      We have provided additional discussion points regarding epidermal EMP+ cells with ameloblast-like transcriptional profiles. We believe that further studies of scale matrix composition and the material properties endowed by various collagenous and non-collagen matrix proteins will be useful for understanding fundamental mechanisms of biomineralization. This section of the discussion now reads:

      “We systematically assessed the expression of genes encoding non-collagen calcified matrix proteins throughout the skin during squamation, leading to the discovery of a transcriptionally distinct population of basal epidermal cells that express EMP transcripts, likely corresponding to epidermal secretory cells proposed to participate in scale matrix formation based on ultrastructure (Sire et al., 1997). These cells also express dlx3a, dlx4a, runx2b and msx2a but not sp7, a transcription factor suite that is shared with ameloblasts that form tooth enamel. While these transcription factors are not exclusive to ameloblasts and have been reported in osteoblasts and odontoblasts, in addition to cell types that do not produce calcified matrix, such as neurons, their co-expression along with EMP-encoding transcripts in basal epidermal cells is consistent with a common origin of ameloblasts and the EMP+ epidermal cells reported here. One alternative hypothesis is that co-expression of these gene products arose convergently and can be explained by mechanistic linkages among them. Future work aimed at functionally dissecting the regulatory mechanisms that govern EMP gene expression in a variety of organisms may clarify these issues either by providing evidence of additional commonalities, supporting a shared ancestor, or by revealing diverse, lineage-specific regulatory architectures, supporting convergent evolution of superficial enamel deposition in teeth and fish skin appendages.”

      2) While the overexpression-rescue experiments (i.e., fgf20a and pdgfaa) provide crucial evidence to support the authors' conclusions, it is important to note that overexpression driven by the heat-shock promoter is not spatially regulated. Therefore, it should be acknowledged that the rescue effects may not be cell-autonomous, as suggested in the current version.

      The reviewer is correct that the hsp70l promoter is not spatially regulated and F0 transgenics have random mosaic expression. Importantly, since we were testing specific hypotheses regarding signaling interactions between basal epidermal cells and dermal cells, we applied stringent selection and only analyzed individuals with transgene expression in basal epidermal cells. This approach enabled us to assay the results of basal cell expression of signaling ligands in eda mutant and hypo-thyroid backgrounds. The original manuscript omitted this crucial aspect of our experimental design, and we thank the reviewer for noticing this omission. We have revised the following parts of the results section.

      “Indeed, heatshock-driven expression in F0 mosaics stringently selected for basal epidermal expression of Fgf20a in the skin of Eda mutants led to localized rescue of scales where transgene expression was detectable (Figure 5D).”

      “When we forced expression of Pdgfaa in basal cells of epidermis by heatshock induction and stringent selection of basal epidermal expression in F0 mosaics, we found, as predicted, a recruitment of dermal cells in hypoTH skin, leading to a locally stratified dermis (Figure 6E) similar to that of the wild-type (Figure 4C).”

      We additionally revised the legends for Figure 5 and Figure 6 to mention stringent selection of basal epidermal expression of fgf20a and pdgfaa, respectively.

      3) Figure 7D. The authors used the ET37:EGFP lines to visualize hypodermis. Based on the absence of EGFP signal in the deep dermis of bnc2 mutants, the authors concluded that the hypodermis may be missing, suggesting the importance of the hypodermis in pigment cell formation. However, since the EGFP evidence is indirect, it is crucial to confirm the absence of the hypodermis structure with histology.

      It is indeed conceivable that hypodermal cells physically persist in bnc2 mutants yet have sufficiently altered gene expression that they neither cluster with wild-type hypodermal cells in single cell RNA-seq analyses nor initiate or maintain the broadly expressed dermal reporter ET37:GFP that we used to assess the presence or absence of such cells in a defined anatomical position. Though we believe this to be somewhat unlikely (hence our original interpretation), we have added a caveat referencing this formal possibility in the revised manuscript:

      “It is possible that hypodermal cells physically persist in bnc2 mutants but have sufficiently altered transcriptional profiles such that they no longer cluster together with wild-type hypodermal cells or express the ET37:EGFP transgene. Nevertheless, these analyses suggest that ET37:EGFP+ hypodermal cells likely play a role in pigment pattern development.”

      We believe this issue raises interesting philosophical questions about the definition of a “cell-type.” If cells constituting the deep surface of the dermis physically persist, but have a profoundly altered transcriptional profile and no longer perform the biological functions of their wild-type counterparts, are they still the original cell type, or was the wild-type cell type lost? As researchers continue to discover new cell types and deepen our understanding of cell-state plasticity in normal and pathological conditions, the community will need to articulate new rubrics of categorization to ensure that “cell-type” remains a rigorous and useful concept (if, indeed, it has been one).

      4) As the dataset is expected to be a valuable asset to the field, please provide Excel tables summarizing the key genes and their corresponding expression levels for each major cluster that has been identified.

      This table has been provided in the revised manuscript (Supplementary file 2 – Table 5.)

      Reviewer #2 (Public Review):

      The authors used single cell transcriptome analysis of zebrafish skin cells and characterized various types of cells that are involved in scale formation and stripe patterning. The methods employed in this study are highly powerful in providing mechanistic explanations of these fundamental biological issues and will be a good example for many researchers studying other biological issues. Furthermore, the results characterizing differences in gene expression patterns among various types of cells will be informative for other researchers in the field.

      For scale formation, it is known that mineralized tissues may significantly differ in rayfins and lobefins since sox9, col2a1, and col10a1 are all expressed in osteoblasts, in addition to chondrocytes, in zebrafish and gar (Eames et al., 2012, BMC Evol. Biol.). Furthermore, in mammals, Col10 is expressed in chondrocytes in mature cartilage that undergoes ossification. Thus, unlike the authors argue, col10a1 expression is not apparently relevant to the elasticity of scales.

      The authors also state that the expression of dlx4a, msx2a, and runx2b characterize cells homologous to mammalian ameloblasts. However, dlx4, runx2, and msx2 are all duplicated in zebrafish, and the function of duplicated genes in teleost fishes may differ from that of single ancestral gene. Moreover, none of Dlx4, Msx2, and Runx2 is expressed specifically by ameloblasts in mammals. Indeed, both Msx2 and Runx2 are expressed in osteoblasts, while the expression of Dlx4 in ameloblasts is not reported. These results, together with the expression of an enamel gene, enam, in dermal cells (SFC), do not appear to support the homology of the surface tissue of mammalian teeth and zebrafish scales.

      We appreciate the reviewers’ comments and have provided caveats to our interpretation in the revised manuscript (see our response to Reviewer #1, item 1, above). In the revised manuscript, we also display results for an additional Dlx gene, dlx3b, that is coexpressed in EMP+ basal epidermal cells (Figure 3C), although dlx4 has been reported in mammalian tooth germs and elasmobranch tooth and odontode epithelia (Pemberton et al., 2007; Debiais-Thibaud et al., 2011; Woodruff et al., 2022).

      More generally, expression of specific genes can be useful characters for testing hypotheses of homology. The operant inference depends on a parsimony assumption: if a transcriptional profile is shared between cell types in disparate organisms, one explanation is that this transcriptional profile was inherited from a common ancestor. This inference is not impacted by the teleost whole genome duplication. If the common ancestor had one ortholog and a subset of modern animals have two, the homology hypothesis predicts that at least one ortholog will be expressed in common in the tissue that descended from the common ancestor. This interpretation is entirely compatible with our understanding of the mechanisms that underlie retention of duplicated genes in animal genomes. Additionally, exclusivity is not necessarily predicted by homology hypotheses. Indeed, all the transcription factors used here as characters for evaluating homology have pleiotropic roles in many cell types.

      In this specific case, we found two EMP genes, ambn and enam, co-expressed with a complement of transcription factors that is also co-expressed in ameloblasts. These findings are consistent with a model in which both ameloblasts and EMP+ epidermal cells associated with zebrafish scales inherited this transcriptional profile from a common ancestral cell type. Given the temporal and phylogenetic continuity of superficial enameling in the fossil record of skin appendages, and the dual origin of mineralized matrices in extant skin appendages and teeth, we continue to favor the model where these traits are shared and conserved among vertebrates. Nevertheless, we have acknowledged in the revised manuscript the limitations of homology testing by analyses of gene expression and the possibility that these traits might have evolved convergently; we suggest additional research avenues for testing this hypothesis further (response to Reviewer #1, item 1, above).

      Reviewer #3 (Public Review):

      This work describes transcriptome profiling of dissected skin of zebrafish at post-embryonic stages, at a time when adult structures and patterns are forming. The authors have used the state-of-the-art combinatorial indexing RNA-seq approach to generate single cell (nucleus) resolution. The data appears robust and is coherent across the four different genotypes used by the authors.

      The authors present the data in a logical and accessible manner, with appropriate reference to the anatomy. They include helpful images of the biology and schematics to illustrate their interpretations.

      The datasets are then interrogated to define cell and signalling relationships between skin compartments in six diverse contexts. The hypotheses generated from the datasets are then tested experimentally. Overall, the experiments are appropriate and rigorously performed. They ask very interesting questions of interactions in the skin and identify novel and specific mechanisms. They validate these well.

      The authors use their datasets to define lineage relationships in the dermal scales and also in the epidermis. They show that circumferential pre-scale forming cells are precursors of focal scale forming cells while there appeared a more discontinuous relationship between lineages in the epidermis.

      The authors present transcriptome evidence for enamel deposition function in epidermal subdomains. This is convincingly confirmed with an ameloblastin in situ. They further demonstrate distinct expression of SCPP and collagen genes in the SFC regions.

      The authors then demonstrate that Eda and TH signalling to the basal epidermal cells generates FGF and PDGF ligands to signal to surrounding mesenchyme, regulating SFC differentiation and dermal stratification respectively.

      Finally they exploit RNA-seq data performed in parallel in the bnc2 mutants to identify the hypodermal cells as critical regulators of pigment patterning and define the signalling systems used.

      Whilst these six interactions in the skin are disparate, the stories are unified by use of the sci-RNA-seq data to define interactions. Overall, it's an assembly of work which identifies novel and interesting cell interactions and cross-talk mechanisms. There are some aspects that require clarification:

      With respect to the discontinuous relationship noted in Figure 2I in the epidermis, the authors did not make mention of the fact that there are in fact two independent sources of periderm in the zebrafish. The first periderm derives from the EVL, is segregated at gastrulation, and gradually replaced from the basal epidermis during post-embryonic stages. Could this residual EVL-derived periderm have reduced sensitivity of the trajectory mapping from basal to periderm? The authors should comment whether their transcriptome dataset likely had residual EVL-derived periderm and if this might have impacted their trajectory continuity interpretation.

      While dual origin of periderm may impact the single cell analysis, this should not be an issue for suprabasal cells, which also show no continuity with their basal cell progenitors in UMAP space. We thank the reviewer for bringing this issue up and comment on the dual origin of periderm in the revised manuscript.

      “During this stage of development, basal epidermal cells are the stem cell population that differentiate into both suprabasal and periderm cells, and each of the three major epidermal cell types are well represented in our dataset (Figure 2H,I; Figure 1—figure supplement 3)(Guzman et al., 2013; Lee et al., 2014). While periderm cells at the sampled stage are likely of dual origin, representing a mixture of early embryonic and stem cell derived cells, suprabasal cells are entirely derived from basal cells (Kimmel et al., 1990; Guzman et al., 2013; Lee et al., 2014).”

      The authors ask if dermal SFCs express proteins associated with cartilage formation and use Col10a1 orthologues as markers (Fig 3B, I). I wonder if these are the best transcripts to answer this question as this has also been described to label osteoblasts in certain contexts in the fish and the authors might want to refer to Li et al 2009 or Avaron et al 2005. Were other markers of cartilage formation present such as collagen2 genes? These may be more definitive. The authors might want to reinterrogate their datasets for true cartilage markers or reframe their question.

      In the revised manuscript, we have clarified and moderated inferences from col10a1 ortholog expression. Col2 genes were not detected robustly in our dataset. This section now reads:

      “Scale elasmoidin is a flexible, collagenous ECM, material properties that are similar to cartilage (Quan et al., 2020). We therefore wondered whether dermal SFCs express matrix proteins associated with cartilage formation. Col10a1 is a major structural molecule in cartilage, although its expression has also been documented in osteoblasts (Gu et al., 2014; Yang et al., 2014; Kawasaki et al., 2021). The zebrafish genome harbors genes encoding two Col10a1 orthologs (col10a1a and col10a1b) and we found both transcripts in SFCs representing distinct steps of maturation (Figure 3B,I; Figure 2—figure supplement 1F,I).”

      Finally, of interest, were there any clear clusters on the UMAP plots (Fig 1 Supp3A) of unassigned identity? Even comment on these and how they were dealt with would be of significant interest to the field, as it is highly unlikely all cell types in the skin have been defined. This dataset promises to be a critical reference for defining these in the future.

      Thanks for raising this issue. We provide a new figure (Figure 1 – supplement 4) displaying the unsupervised clustering of the wild-type dataset and a new table (Supplementary file 2 – table 5) with gene expression information for these clusters.

      Minor clarification:

      Fig 2E top. The authors interpret that fate-mapped SFCs at the posterior margin are progressively displaced towards the scale focus. This is confusing as the margin SFC in Fig 2E seems to show them staying largely at the margin. Please clarify if this is what you meant.

      In Figure 2E, a new row of newly differentiated, non-photoconverted SFC was added, displacing the existing row of cells towards the scale focus. Since these cells are all very thin, the net displacement was not as dramatic as the displacement found for sub-marginal SFCs. This point has been clarified in the figure legend in the revised manuscript. This figure legend now reads:

      “Figure 2. Postembryonic skin cell lineage relationships are not reflected in UMAP space. (A) UMAP visualization showing distribution of differentiated SFC expressing sp7 and pre-SFC progenitors expressing runx2b. (B) In-situ hybridization of sp7 and runx2b shows that a halo of pre-SFC progenitors surrounds the growing scale (arrows). (C) sp7:nEOS expressing differentiated SFC (magenta) were labelled by photoconversion on Day 1. Over the following two days, newly differentiated, un-photoconverted SFC appeared at the scale margin (arrows; n = 5 fish). (D) Schematic representation of differentiated SFC (purple) and the associated halo of pre-SFC (blue). (E) Photoconversion of small groups of SFC in the scale margin and sub-margin; and single-cell photoconversion of focus SFCs (arrows) showed that SFC are progressively displaced toward the scale focus and that SFC in all these regions are capable of cell division (arrows, n ≥ 4 fish for each region tested). Margin SFCs were displaced towards the posterior by newly differentiated, un-photoconverted SFCs (arrowheads). (F) SFCs in UMAP space colored by “pseudotime” rooted in the SFCs. (G) SFCs in UMAP space colored by the ratio of a mesenchymal (migratory) signature to an epithelial signature (Supplementary file 2—Table 3). (H) Schematic representation of epidermis with major substrata. (I) UMAP visualization of wild-type epidermis, subclustered independently of other cell types and displaying expression of the epidermal basal cell marker tp63 (blue) and the periderm marker krt4 (red). Scale bars, 50 μm (B,C,E); 25 μm (C, lower). (J) The fraction of cells from panel H that pass a minimum threshold for expression of tp63, krt4 or both genes.”

      References

      Debiais-Thibaud M, Oulion S, Bourrat F, Laurenti P, Casane D, Borday-Birraux V. 2011. The homology of odontodes in gnathostomes: insights from Dlx gene expression in the dogfish, Scyliorhinus canicula. BMC Evolutionary Biology 11:307. doi: 10.1186/1471-2148-11-307

      Pemberton TJ, Li FY, Oka S, Mendoza-Fandino GA, Hsu YH, Bringas P, Jr., Chai Y, Snead ML, Mehrian-Shai R, Patel PI. 2007. Identification of novel genes expressed during mouse tooth development by microarray gene expression analysis. Dev Dyn 236:2245-57. doi: 10.1002/dvdy.21226, PMID: 17626284

      Woodruff ED, Kircher BK, Armfield BA, Levy JK, Bloch JI, Cohn MJ. 2022. Domestic cat embryos reveal unique transcriptomes of developing incisor, canine, and premolar teeth. Journal of Experimental Zoology Part B: Molecular and Developmental Evolution 338:516-31. doi: 10.1002/jez.b.23168

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This important study used a battery of cutting-edge technologies including whole-exome sequencing, knockout/knockdown animal models and comparative proteomics to define the physiological roles of ZMYND12 in the regulation of sperm flagellar development and male fertility. The data supporting the conclusion are solid, although inclusion of more patients and ultrastructural studies would have further strengthened the study. This work will be of interest to clinicians and researchers who work on male fertility, but also those working on organs/systems containing motile cilia (e.g., trachea, oviduct, ventricular ependymal cells).

      We thank the eLife editorial board for these very positive comments.

      The MMAF sperm phenotype is rare and, as for all rare diseases, the number of affected patients remains low. Moreover, the most prevalent genes have already been identified. In such a case, the identification of four unrelated patients with pathogenic mutations in the same new gene is significant, especially as compared to most studies on the same phenotype. We agree that ultrastructural studies could provide valuable information. However, the amount of sperm cells available did not allow us to consider such experiments at this time. The production and study of the Trypanosoma model enabled us to overcome these limitations.

      Reviewer #1 (Public Review):

      The goal of the authors is to use whole-exome sequencing to identify genomic factors contributing to asthenoteratozoospermia and male infertility. Using whole-exome sequencing, they discovered homozygous ZMYND12 variants in four unrelated patients. They examined the localization of key sperm tail components in sperm from the patients. To validate the findings, they knocked down the ortholog in Trypanosoma brucei. They further dissected the complex using coimmunoprecipitation and comparative proteomics with samples from Trypanosoma and Ttc29 KO mice. They concluded that ZMYND12 is a new asthenoteratozoospermia-associated gene, biallelic variants of which cause severe flagellum malformations and primary male infertility.

      The major strengths are that the authors used the cutting-edge technique, whole-exome sequencing, to identify genes associated with male infertility, and used a new model organism, Trypanosoma brucei to validate the findings; together with other high-throughput tools, including comparative proteomics to dissect the protein complex essential for normal sperm formation/function. The major weakness is that limited samples could be collected from the patients for further characterization by other approaches, including western blotting and TEM. In general, the authors achieved their goal and the conclusion is supported by their results. The findings not only provide another genetic marker for the diagnosis of asthenoteratozoospermia but also enrich the knowledge in cilia/flagella.

      We thank the reviewer for these positive comments that are helpful for improving our paper. Concerning the remark about the low amount of sperm cells available, most patients allowed us to use excess sperm samples not used for ART treatment but are generally reluctant to perform a new sperm collection. Therefore, we often have to prioritize the most relevant and suitable experiments with the amount of sperm cells available.

      Reviewer #2 (Public Review):

      The manuscript "Novel axonemal protein ZMYND12 interacts with TTC29 and DNAH1, and is required for male fertility and flagellum function" by Dacheux et al. interestingly reported homozygous deleterious variants of ZMYND12 in four unrelated men with asthenoteratozoospermia. Based on the immunofluorescence assays in human sperm cells, it was shown that ZMYND12 deficiency altered the localization of DNAH1, DNALI1, WDR66 and TTC29 (four of the known key proteins involved in sperm flagellar formation). Trypanosoma brucei and mouse models were further employed for mechanistic studies, which revealed that ZMYND12 is part of the same axonemal complex as TTC29 and DNAH1. Their findings are solid, and this manuscript will be very informative for clinicians and basic researchers in the field of human infertility.

      We thank the reviewer for these positive comments that are helpful for improving our paper.

      Reviewer #3 (Public Review):

      In this study, the authors identified homozygous ZMYND12 variants in four unrelated patients. In sperm cells from these individuals, immunofluorescence revealed altered localization of DNAH1, DNALI1, WDR66, and TTC29. Axonemal localization of ZMYND12 ortholog TbTAX-1 was confirmed using the Trypanosoma brucei model. RNAi knock-down of TbTAX-1 dramatically affected flagellar motility, with a phenotype similar to ZMYND12-variant-bearing human sperm. Co-immunoprecipitation and ultrastructure expansion microscopy in T. brucei revealed TbTAX-1 to form a complex with TTC29. Comparative proteomics with samples from Trypanosoma and Ttc29 KO mice identified a third member of this complex: DNAH1. The data presented revealed that ZMYND12 is part of the same axonemal complex as TTC29 and DNAH1, which is critical for flagellum function and assembly in humans, and Trypanosoma. The manuscript is informative for the clinical and basic research in the field of spermatogenesis and male infertility.

      We thank the reviewer for these positive comments that are helpful for improving our paper.

      Reviewer #1 (Recommendations For The Authors):

      The manuscript was very well written, and very easy to follow. Most data were presented in high quality. I only have a few minor issues with some figures.

      1. The signals in some IF images (Fig. 1E, Fig. 2B) are too weak.

      The figures were improved and modified accordingly.

      1. In some IF images, strong dot-like signals are observed (Fig. 1B, Fig. 2D, Fig. 2F). Are they specific signals or non-specific? Please specify. If they are non-specific, please replace these images.

      These figures were improved and modified accordingly. Indeed, the dot-like signals were non-specific.

      Reviewer #2 (Recommendations For The Authors):

      Here further revisions are suggested.

      1) Description of ZMYND12 genotypes of the patients and the sperm cell samples: In the title of Table 1, it is suggested to mention "homozygous" for ZMYND12 variants in the patients, since the heterozygous carriers should be unaffected.

      It was done as suggested

      In the Abstract ("with a phenotype similar to ZMYND12-variant-bearing human sperm"), it is suggested to use "with a phenotype similar to the sperm from men bearing homozygous ZMYND12 variants", since the sperm phenotypes are dependent on the biallelic genotypes of human individuals (not the monoallelic genotype of the sperm cells). Please check the whole manuscript and revise the similar points.

      It was done as suggested

      2) The database accession number for ZMYND12: There are three different numbers (NM_032257.5 vs NM_032257 vs ENSTxxxx) on Page 5 and Figure 1B. Please use NM_032257.5 for consistency.

      It was done as suggested

      3) For the exonic deletion variant, is it possible to predict the coding consequence of ZMYND12 protein?

      No reliable in silico prediction could be performed because the exact breakpoints of the exon deletion are unknown. mRNA (or WB) studies could clarify this point; however, no additional sperm sample from this patient was available.

      4) Please italicize the gene symbols. For example, TTC29 on Page 8 and Figure S4, Ttc29-/- KO on Page 13.

      It was done as suggested

      5) In Figure 2, there are too many panels that cannot be merged into one page. Some of the data can be shown as supplemental data.

      We modified Figure 2 as suggested. The new Figure 2 now includes only four panels (A, B, C and D), and we added a new Figure S4 with the two remaining panels. We modified the text, figure legends and numbering accordingly.

      6) Some of the references are duplicated. Please delete one of them. For example, there are two Broadhead et al., two Coutton et al. (Nat Commun), and two Dacheux et al.

      Sorry for the duplicates. It was corrected

      7) The information on some references is incomplete (missing volume and/or page numbers). For example, Touré et al and Wang et al. (2010).

      It was corrected

      Reviewer #3 (Recommendations For The Authors):

      However, I have several points as the following:

      1. The sperm concentrations of ZMYND12_3 in patient 3 and patient 4 are significantly different from the other two patients. Do you think it is just due to phenotype heterogeneity?

      We have no formal explanation for these observations, but we think that such differences in sperm concentration are more likely due to patient heterogeneity.

      1. There is no record for detailed semen parameters of ZMYND12_ 4, and readers cannot see that the proportion of short flagella in Table 1 is 70%. Please provide complete semen routine information for this case.

      Unfortunately, no additional information about the semen parameters of this patient is available at this time.

      1. In this study, no immunostaining for DNAH1, DNALI1, or WDR66 was detected in sperm from individual ZMYND12_3, and subsequent validation found that TTC29 interacted with ZMYND12 in Trypanosoma brucei. DNAH1 and DNALI1 both interact with TTC29 in mice. The author concluded that ZMYND12 is part of the same axonemal complex as TTC29 and DNAH1 and plays a critical role in flagellum function and assembly. If it is possible, the author can add an experiment on the interaction between ZMYND12 and DNAH1 to make this theory more complete.

      Our study focuses on characterizing protein-protein interactions using IPs (Immunoprecipitations). We were able to demonstrate that the protein ZMYND12, along with TTC29, DNAH1, and DNALI1, belongs to the same complex, IAD-4. However, this technique does not allow us to draw conclusions about direct interactions for any of the identified proteins.

      Our Co-IP results in T. brucei indicate that the orthologue of DNAH1 (Tb927.11.8160 orthologs) and TTC29 co-immunoprecipitate with TAX-1 (ZMYND12), thereby complementing the study conducted in Chlamydomonas by Yamamoto et al., 2008. As suggested by Reviewer 3, direct interactions between each protein could provide valuable insights into the organization of the intracomplex protein interactome. This aspect will be addressed in a separate study, as it requires the use of direct interaction techniques such as Y2H (Yeast Two-Hybrid) or DuoLink.

      1. Please check the reference section. Some references have duplication, and the content of the literature also needs to be standardized. For example,

      Broadhead R., Dawe HR, Farr H, Griffiths S, Hart SR, Portman N, Shaw MK, Ginger ML, Gaskell SJ, McKean PG, Gull K. 2006. Flagellar motility is required for the viability of the bloodstream trypanosome. Nature 440:224-7.

      Broadhead Richard, Dawe HR, Farr H, Griffiths S, Hart SR, Portman N, Shaw MK, Ginger ML, Gaskell SJ, McKean PG, Gull K. 2006. Flagellar motility is required for the viability of the bloodstream trypanosome. Nature 440:224-227. doi:10.1038/nature04541

      Ersfeld K, Gull K. 2001a. Targeting of cytoskeletal proteins to the flagellum of Trypanosoma brucei. J Cell Sci 114:141-148.

      Ersfeld K, Gull K. 2001b. Targeting of cytoskeletal proteins to the flagellum of Trypanosoma brucei. J Cell Sci 114:141-148. doi:10.1242/jcs.114.1.141

      Sorry for the duplicates, it was corrected.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Combined Public Review:

      It has been shown previously that maternal aging in mice is associated with an increase in accumulation of damaged mitochondria and activation of parkin-mediated autophagy (see DOI: 10.1080/15548627.2021.1946739). It has also been shown that C-natriuretic peptide (CNP) regulates oocyte meiotic arrest and that its use during in vitro oocyte maturation can improve parameters associated with decreased oocyte quality. Here the authors tested whether use of CNP treatment in vivo could improve oocyte quality and fertility of aged mice, for which they provided convincing evidence. They also attempted to determine how CNP improves oocyte developmental competence. They showed a correlation between CNP use in vivo and the appearance (and some functional qualities) of cytoplasmic organelles more closely approximating those of oocytes from young mice. However, this correlation could not be interpreted to imply causation. Additional experiments performed using CNP during in vitro maturation were not properly controlled and so are not possible to interpret.

      A strength of the manuscript is that the authors use an in vivo treatment to improve oocyte quality rather than just using CNP during oocyte maturation in vitro as has been done previously. This strategy provides more potential for improving oocyte quality - over the course of oocyte growth and maturation - rather than just the final few hours of maturation alone. This strategy also has the potential to be translated into a more generally useful clinical therapeutic method than using CNP during in vitro maturation. However, it is difficult to glean information regarding how CNP might have its effects in vivo. A range of models are used in the manuscript with a mix of in vivo studies with in vitro experiments, which results in some disconnect between systemic CNP and its reported intrafollicular action as well as in the short-term versus longer-term actions of CNP on oocyte quality. Specifically, CNP was shown to be reduced in the plasma of aged mice, but this was not shown in the granulosa cells, which are the reported source of CNP that acts on oocytes. Whether the ovarian source of CNP is reduced in aged females was not demonstrated, and CNP is not known to act on oocytes through an endocrine effect. In vivo treatments with CNP by i.p. injection were performed, but the dose (120 ug/kg) and time (14 days) of treatment were not validated by any prior experiments to give them physiological relevance.

      Thank you for the summary and for highlighting our manuscript’s strengths and weaknesses.

      Weaknesses:

      1. There are errors in the manuscript writing that make the Results difficult to follow. Reference to the Figures in the Results section does not match what is shown in the Figure panels. For example, the Results text reports differences in CNP levels in aged and young mice shown in Figure 1C, but the relevant panel is actually shown in Figure 1F. Other Figures have the same problem.

      Thanks for the valuable suggestion. All the mistakes have been corrected in the revised manuscript.

      1. The Results section is not always clear regarding what CNP treatment was done - in vivo injections or in vitro maturation. For example, what is the difference, if any, between Figures 2C-D and Figures S2A-B?

      Thank you for pointing out the potential confusion regarding the experimental procedures in Figures 2C-D and Figures S2A-B. In the revised manuscript, we have included additional explanations to clarify that Figures 2C-D represent in vivo injections, while Figures S2A-B depict in vitro maturation. In brief, the results presented in the Supplementary Material (Figures S1-S7) are derived from in vitro CNP treatment.

      1. Immature oocytes from aged females (~1 year) were treated with a two-step culture system with a pre-IVM step with CNP. Controls included oocytes from young (6-8 weeks) females or oocytes from aged females treated by conventional IVM. The description of these methods suggests that control oocytes did not receive an equivalent pre-IVM culture, hence the relevance of comparisons of CNP-treated versus control oocyte is questionable. It was observed that aged oocytes pre-cultured in CNP improved polar body extrusion rates and meiotic spindle morphology compared to oocytes in conventional IVM, as has been well established. The description of statistical methods does not make clear whether the PBE rate in CNP-treated old oocytes remained significantly lower than young controls.

      Statistical analyses were performed using GraphPad Prism 8.00 software (GraphPad, CA, United States). Differences between two groups were assessed using the t-test. Indeed, CNP is unlikely to fully restore the PB1 rate in aged mice to the same level as in the young group. PB1 rate in CNP-treated aged oocytes remained significantly lower than young controls (P<0.05).

      1. The main effect of the CNP 2-week treatment appears to be increasing the number of follicles that grow into secondary and antral stages, but there is no attempt made to discover the mechanism by which this occurs and therefore to understand why there might be an increase in the number of ovulated eggs, quality of the eggs, and litter size. It is also not clear how an intraperitoneal injection can guarantee its effectiveness because the half-life of CNP is very short, only a few minutes.

      The 2-week treatment of CNP had a significant impact, leading to an increase in the number of follicles progressing to secondary and antral stages, as well as an increase in the number of ovulated eggs, improved egg quality, and enhanced litter size. Previous studies (references: 10.1530/REP-18-0470; 10.1210/me.2012-1027) have demonstrated the crucial role of CNP as an upstream regulator in stimulating preantral follicle growth and promoting the ovulation rate. These studies have also identified the influence of CNP on the expression of key ovarian genes involved in cell growth and steroidogenic enzymes. Consistent with these findings, our study provides further evidence supporting CNP as a critical regulator of preantral follicle growth and oocyte quality. Furthermore, it is important to note that oocyte-derived paracrine factors play essential roles in follicular development. CNP may regulate the communication between oocytes and somatic cells, contributing to folliculogenesis and follicular development. We are considering this aspect for further investigation in another ongoing study.

      To ensure the effectiveness of CNP, given its short half-life (a few minutes), aged mice (58 weeks old) received daily intraperitoneal injections of CNP (120 μg/kg body weight; Cat#B5441, ApexBio) for a duration of 14 days.

      1. Meiotic spindle morphology, as well as a number of putative markers of cytoplasmic maturation are also suggested to be improved after pre-culture with CNP. In each case a subjective interpretation of "normal" morphology of these markers is derived from observations of the young controls and the proportions of oocytes with normal or abnormal appearance is evaluated. However, parameters that define abnormal patterns of these markers appear to be subjective judgements, and whether these morphological patterns can be mechanistically attributed to the differences in developmental potential cannot be concluded.

      Oocyte cytoplasmic maturation involves a remarkable reorganization of the oocyte cytoplasm, encompassing the movement of vesicles, mitochondria, Golgi apparatus, and endoplasmic reticulum. This dynamic process occurs during the transitions from the germinal vesicle breakdown (GVBD) stage to the metaphase I (MI), polar body extrusion (PBE), and metaphase II (MII) stages (reference: 10.1093/humupd/dmx040). In our study, we observed that CNP treatment partially rescued cytoplasmic maturation events in aged oocytes by maintaining normal distribution patterns of cortical granules (CG), endoplasmic reticulum (ER), and Golgi apparatus. However, further experiments are needed to investigate the specific action of CNP on the function of CG, ER, and Golgi apparatus. These experiments are beyond the scope of this manuscript, but we acknowledge the importance of this aspect and will consider it for future research. In this study, our main focus was to examine the effects of CNP on mitochondria distribution and function. Therefore, we analyzed the localization patterns of mitochondria, mitochondrial membrane potential, oocyte ATP content, and ROS levels. These experiments were aimed at elucidating the impact of CNP on mitochondrial dynamics and metabolism, which are crucial for oocyte quality and development.

      1. In addition to the localization patterns of mitochondria, the mitochondrial membrane potential, oocyte ATP content and ROS levels were assessed through more objective quantitative methods. These are well known to be defective in oocytes of aged females and CNP treatment improved these measures. Mitochondrial dysfunction is the most obvious link between oocyte apoptosis, autophagy, cytoplasmic organelle mislocalization and aberrant spindle morphology. Among the most intriguing results is the finding that CNP mediated a cAMP-dependent protein kinase (PKA) dependent reduction in mitochondrial autophagy mediators PINK and Parkin and reduced the recruitment of Parkin to mitochondria in oocytes. However, it may not be possible to directly link this observation to the improvements in IVM oocyte quality, since PINK/Parkin assessments were performed in oocytes from cultured follicles treated with CNP for 6 days.

      The beneficial effects of CNP on oocyte quality have been extensively demonstrated through in vivo experiments (Figure 1 and 4) and “two-step” in vitro culture experiments (Figure S1 and S7). In this study, our primary focus is to analyze the signaling pathway and mechanism by which CNP inhibits mitophagy in oocytes. Previous studies have highlighted the significant role of cAMP-PKA activity in reducing mitochondrial recruitment of Parkin and mitophagy (reference: 10.1038/s42003-020-01311-7). Consistent with these findings, our study revealed that aged oocytes exhibited lower concentrations of cAMP compared to young oocytes. However, upon administration of CNP, we observed a substantial increase in intraoocyte cAMP levels. To investigate the involvement of PKA in CNP-mediated oocyte mitophagy, we conducted further experiments. We isolated preantral follicles (80-100 µm diameter) from the ovaries of aged mice and subjected them to in vitro culture with either 100 nM CNP or a combination of 100 nM CNP and 10 µM H89, a PKA inhibitor. Monitoring the growth dynamics of the follicles revealed that treatment with 100 nM CNP significantly increased follicle diameter, while H89 treatment inhibited the promotive effect of CNP on preantral follicle growth (Figure 6 K and L). Western blot analysis demonstrated that CNP supplementation led to a significant decrease in PINK1 and Parkin expression levels, which were abrogated by H89 treatment (Figure 6 M-O). It is well-established that the cAMP-PKA pathway plays a crucial role in inhibiting Parkin recruitment to damaged mitochondria (Akabane et al., 2016). Therefore, we aimed to investigate whether PKA inhibition regulates Parkin recruitment. To assess the effects of CNP on mitochondria, we performed double staining for Parkin and translocase of outer mitochondrial membrane 20 (TOMM20). The results clearly demonstrated that CNP inhibited the mitochondrial localization of Parkin, while PKA inhibition with H89 led to Parkin translocation to mitochondria, as indicated by the overlap of the two staining signals (Figure 6 P and Q). Collectively, our data suggest that the suppression of Parkin recruitment through the cAMP-PKA axis represents an important mechanism underlying the protective effect of CNP against oxidative injury in maternally aged mouse oocytes.

      1. The gold standard assay for oocyte quality is embryo transfer and live birth. The authors assessed the impact of maturing oocytes in vitro in the presence of CNP on oocyte quality by less robust assays (e.g., preimplantation embryo development in vitro), so the impact on oocyte quality is less certain.

      We appreciate the Reviewer’s suggestion to assay live birth rates by transferring embryos obtained from IVM oocytes. However, we decided not to pursue this option for this revision because current technical challenges make it difficult to obtain precise live birth rates from IVM oocytes. Thank you for this valuable suggestion, which has highlighted a shortcoming of the current work; we will follow it in future work.

      1. The terminology used to describe many of the Results exaggerates the findings. For example, the authors claim that many of their immunofluorescent markers of the various organelles have a pattern that is "restored" by CNP. However, in most cases the pattern is "improved" toward the control condition but is not fully restored.

      We acknowledge the confusion caused by the wording of the mechanism of action of CNP in the original version. In the resubmission, we have made significant improvements by providing critical information that clarifies the action of CNP. We believe that these revisions will enhance the understanding of the mechanism of CNP and its implications. Thank you for pointing out this issue, and we appreciate your feedback in helping us improve the clarity of our work.

      1. The numbers of embryos should have been corrected for the number of eggs fertilized as a starting point so that the percentage that developed to each stage could be expressed as a percentage of successfully fertilized eggs rather than overall percentages. As currently shown in the Figures and described in the Legend, there is no information regarding what the percentage on the y-axis means. For example, does Figure 4B show the number of 2C embryos divided by the number of eggs inseminated? Or is it divided by the number of successfully fertilized eggs, and if so, how was that assessed?

      The embryonic development rates (Figure 4 B-F) were calculated based on the total number of oocytes, and the percentages of oocytes that developed to each stage were expressed as overall percentages.
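      The choice of denominator discussed here can be illustrated with a minimal sketch. It assumes a simple percentage calculation; the helper name `development_rate` and all counts below are hypothetical and are not data from the study. It only shows how the same number of 2-cell embryos yields different percentages depending on whether the total number of oocytes or the number of fertilized eggs is used.

```python
def development_rate(n_at_stage: int, denominator: int) -> float:
    """Percentage of `denominator` units that reached a given stage.

    Hypothetical helper for illustration; not the authors' code.
    """
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    if not 0 <= n_at_stage <= denominator:
        raise ValueError("n_at_stage must lie between 0 and denominator")
    return 100.0 * n_at_stage / denominator


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
total_oocytes = 50   # all oocytes inseminated in a group
fertilized = 40      # oocytes scored as successfully fertilized
two_cell = 30        # embryos that reached the 2-cell stage

# Denominator used in the response: total number of oocytes.
rate_per_oocyte = development_rate(two_cell, total_oocytes)

# Denominator requested by the reviewers: successfully fertilized eggs.
rate_per_fertilized = development_rate(two_cell, fertilized)
```

      With these made-up counts the two conventions give 60% and 75%, which is why the figure legends need to state which denominator was used.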

      1. When fewer eggs are fertilized, the numbers of embryos per group are lower and so the impact of culturing multiple embryos together is lost. As a result, it is possible that culture conditions rather than oocyte quality drove the differences in the numbers of embryos that achieved each stage of development.

      The embryonic development rate was calculated based on the total number of oocytes. Each group included a minimum of 50 oocytes with three replicates (Young: 51, aged: 53, CNP+aged: 50). The embryo culture conditions were consistent across all groups.

      1. Not all claims in the Discussion are supported by the evidence provided. For example, "In addition, the findings demonstrated that CNP improved cytoplasmic maturation events by maintaining normal CG, ER and Golgi apparatus distribution and function in aged oocytes" but it was never demonstrated that the altered distribution had any functional impact.

      Oocyte cytoplasmic maturation involves a remarkable reorganization of the oocyte cytoplasm, including the movement of vesicles, mitochondria, Golgi apparatus, and endoplasmic reticulum. Extensive remodeling and repositioning of intracellular organelles occur during the transitions from GVBD to MI, PBE, and MII stages (10.1093/humupd/dmx040). Our findings indicate that CNP partially rescued cytoplasmic maturation events in aged oocytes by preserving normal distribution of CG, ER, and Golgi apparatus, as well as maintaining mitochondrial function. We acknowledge the importance of considering the impact of CNP on the function of CG, ER, and Golgi apparatus for future research. In summary, these findings demonstrate that CNP improves cytoplasmic maturation events in aged oocytes by facilitating the reorganization of CG, ER, and Golgi apparatus.

      1. Incompleteness and errors in the Methods section reduce confidence in many of the results reported.

      We will enhance the readability of the entire Methods section for the resubmission.

      1. The methods used for Statistical Analysis are never explained in either the Methods or the Figure legends. It is unclear whether appropriate analyses were done, and it is frequently unclear what was the sample size and how many times a particular experiment was repeated. These weaknesses detract from confidence in the data.

      Statistical analyses were performed using GraphPad Prism 8.00 software (GraphPad, CA, United States). Differences between two groups were assessed using the t-test. Data were reported as means ± SEM. Statistically significant differences were denoted by asterisks (P < 0.05 denoted by *, P < 0.01 denoted by **, P < 0.001 denoted by ***, and P < 0.0001 denoted by ****).
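      As a minimal sketch of this notation, assuming the conventional one-to-four-asterisk scheme for the four thresholds listed (0.05, 0.01, 0.001, 0.0001), the mapping from a P value to its label could be written as below. The function name `significance_stars` and the "ns" label for non-significant results are assumptions made for illustration; this is not GraphPad Prism code.

```python
def significance_stars(p_value: float) -> str:
    """Return the asterisk label for a P value (hypothetical helper)."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    # Check thresholds from most to least stringent so the
    # strongest applicable label is returned.
    thresholds = (
        (0.0001, "****"),
        (0.001, "***"),
        (0.01, "**"),
        (0.05, "*"),
    )
    for threshold, label in thresholds:
        if p_value < threshold:
            return label
    return "ns"  # assumed label for P >= 0.05
```

      Note that the comparisons are strict, so a P value of exactly 0.05 is labelled as not significant under this sketch.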

      Recommendations for the authors: please note that you control which revisions to undertake from the public reviews and recommendations for the authors

      1. The introduction does not provide critical information regarding what is already known about the mechanism of action of CNP, what other tissues are impacted by CNP treatment, and how it might affect oocyte growth. Providing this information would make it much easier to understand what is novel about the current manuscript.

      We acknowledge that the mechanism of action of CNP was unclear in the original version. We have now included essential information to clarify the action of CNP.

      1. Comparison of the RNAseq dataset to robust datasets from young vs aged mice would strengthen the analysis (e.g., the dataset in DOI: 10.1111/acel.13482).

      Thank you for this suggestion. We will compare our RNA-seq dataset with robust datasets from young versus aged mice in future work.

      1. Please explain what is "Dr. Tom" that was used for RNA sequencing analysis, in the Methods.

      Dr. Tom is a web-based solution that offers convenient analysis, visualization, and interpretation of various types of RNA data, including mRNA, miRNA, and lncRNA. It also supports the interpretation of single-cell RNA-seq data and WGBS data. Developed by a team of expert scientists and bioinformaticians at BGI, who have extensive experience in numerous research projects, Dr. Tom provides a wide range of intuitive and interactive data visualization tools tailored to save time in conducting differential expression or pathway analysis research. Moreover, its powerful analysis tools and advanced algorithms enable users to extract new insights and derive additional value from their data beyond what is available through standard RNA analysis services. The integration of data from leading databases worldwide allows users to reference and cross-check their results and findings. Dr. Tom is already trusted by tens of thousands of scientists and researchers, serving as a valuable and essential tool alongside their own internal data curation and analysis efforts. To learn more, please visit: Dr. Tom website https://www.bgi.com/global/service/dr-tom.

      1. The Results state that single-cell transcriptomics was performed, but the Methods state that 5 oocytes were collected from each mouse. The actual Method used should be clarified.

      Single-cell RNA-seq is a powerful technique that enables digital transcriptome analysis at the single-cell level using deep-sequencing methods. With this approach, even a single cell can be isolated and processed through various steps to generate sequencing libraries. Given the limited availability of oocyte samples, we employed a single-cell RNA-seq library construction protocol, allowing us to analyze the transcriptomes of individual oocytes. As a result, we collected and analyzed five oocytes from each mouse in our study.

      1. The raw RNAseq data should be deposited into a publicly accessible database and reported by an accession number. It is not sufficient to state that the data is included in the manuscript and supporting information.

      The RNA-seq data has been submitted as supporting information and is now accessible to all readers.

      1. The image in Figure 1G is not very clear.

      Thank you for bringing this to our attention. We will enhance the readability of all our figures for the resubmission.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors report a study, where they have sequenced whole genomes of four individuals of an extinct species of butterfly from western North America (Glaucopsyche xerces), along with seven genomes of a closely related species (Glaucopsyche lygdamus), mainly from museum specimens, several to many decades old. They then compare these fragmented genomes to a high-quality, chromosome-level assembly of a genome of a European species in the same genus (Glaucopsyche alexis). They find that the extinct species shows clear signs of declining population sizes since the last glacial period and an increase in inbreeding, perhaps exacerbating the low viability of the populations and contributing to the extinction of the species.

      The study really highlights how museum specimens can be used to understand the genetic variability of populations and species in the past, up to a century or more ago. This is an incredibly valuable tool, and can potentially help us to quickly identify whether current populations of rare and declining species are in danger due to inbreeding, or whether at least their genetic integrity is in good condition and other factors need to be prioritised in their conservation. In the case of extinct species, sequencing museum specimens is really our only window into the dynamics of genomic variability prior to extinction, and such information can help us understand how genetic variation is related to extinction.

      I think the authors have achieved their goal admirably, they have used a careful approach to mapping their genomic reads to a related species with a high-quality genome assembly. They might miss out on some interesting genetic information in the unmapped reads, but by and large, they have captured the essential information on genetic variability within their mapped reads. Their conclusions on the lower genetic variability in the extinct species are sound, and they convincingly show that Glaucopsyche xerces is a separate species to Glaucopsyche lygdamus (this has been debated in the past).

      We thank the reviewer for his/her positive assessment and we hope to have contributed to both the knowledge of this iconic extinct species and also the possibility of applying our observations to other, endangered insects.

      Reviewer #2 (Public Review):

      The Xerces Blue is an iconic species, now extinct, that is a symbol for invertebrate conservation. Using genomic sequencing of century-old specimens of the Xerces Blue and its closest living relatives, the authors hypothesize about possible genetic indicators of the species' demise. Although the limited range and habitat destruction are the most likely culprits, it is possible that some natural reasons have been brewing to bring this species closer to extinction.

      The importance of this study is in its generality and applicability to any other invertebrate species. The authors find that low effective population size, high inbreeding (for tens of thousands of years), and higher fraction of deleterious alleles characterize the Xerces colonies prior to extinction. These signatures can be captured from comparative genomic analysis of any target species to evaluate its population health.

      It should be noted that it remains unclear if these genomic signatures are indeed predictive of extinction, or populations can bounce back given certain conditions and increase their genetic diversity somehow.

      Methods are detailed and explained well, and the study could be replicated. I think this is a solid piece of work. Interested researchers can apply these methods to their chosen species and eventually, we will assemble datasets to study extinction process in many species to learn some general rules.

      We thank the reviewer for his/her observations and suggestions for improvement. We agree that endangered species sometimes show conflicting signals associated with decreasing genetic diversity (some species are very low in numbers and yet keep reasonably high diversity levels compared to others); however, this aspect remains to be explored in detail in insects, whose demographic dynamics are to a large extent impossible to compare to those observed in vertebrates. We agree there is a full range of cases and circumstances in declining insects to be explored in the future.

      Several small questions/suggestions:

      1) The authors reference a study concluding that Shijimiaeoides is Glaucopsyche. Their tree shows the same, confirming previous publications. And yet they still use Shijimiaeoides, which is confusing. Why not use Glaucopsyche for all these blues?

      We have decided, for the sake of clarity, to change it to Glaucopsyche divina in Figure 1, as suggested by the reviewer.

      2) Plebejus argus is a species much more distant from P. melissa than Plebejus anna (anna and melissa are really very close to each other), and yet their tree shows the opposite. What is the problem? Misidentification? Errors in phylogenetic analyses?

      The reviewer is right and we think there is a mixture of potential problems here that deserves a more in-depth analysis of this genus. We used MN974526 as a proxy for P. argus and we now suspect this is probably a case of misidentification (but we cannot verify it without a morphological examination of the original specimen and likely additional genomic data). MN974526 shows a 99.33% identity to the sequence by Vila et al. (2011) code NGK02C411, defined as P. melissa; as the true status of this mitogenome cannot be totally clarified (it is likely that it is in fact P. idas), we have decided to attribute it to “Plebejus sp” in Figure 1 and have explained this in the text.

      3) Wouldn't it be nicer to show the underside of butterfly pictures that reveals the differences between xerces and others? Now, they all look blue and like one species, no real difference.

      This is a good suggestion, and we have now included the underside of different species, including Xerces Blue.

      4) The authors stated that one of five xerces specimens failed to sequence, and yet they show 5 specimens in the tree. Was the extra specimen taken from GenBank?

      Yes, the extra specimen is the one reported in Grewe et al. 2021; we have marked in Figure 1 with an * this specific mitogenome (and mentioned in the legend), which clusters nicely within the set of Xerces Blue mtDNA diversity we have generated.

      Reviewer #1 (Recommendations For The Authors):

      I am curious why the authors did not attempt to do a de novo assembly of the extinct species' genomes. In our work on museum specimen genomes, we have successfully used a de novo approach to extract protein coding genes from such highly fragmented genomes. We used SPAdes to assemble the museum genomes and then assessed BUSCO completeness, finding anything from 50% to 90% BUSCO completeness. The genome assemblies themselves are pretty poor with N50s around a few thousand bp at best, but the information we can extract from such highly fragmented genomes is very useful, especially with regard to protein coding gene exons. Perhaps worth trying?

      Thanks for the comment. In our approach, and considering the expected low quality from some museum specimens in the lower part of the conservation spectrum, we used the standard approach based on the variant calling of short read data mapped to a close assembly. This method has been shown to be precise enough in cross species mapping (Kuderna et al. Science 2023). Local assemblies of exons and genes, while potentially informative, particularly for structural preservation, were not the priority in our objectives, where only the base pair mutations were explored. Nevertheless, we are planning to generate in the near future an assembly for the closest living relative of Xerces, Glaucopsyche lygdamus, and once we get it, we will consider the possibility of undertaking the suggested approach with this new reference to explore the genomic architecture of Xerces Blue in more detail.
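
      For readers unfamiliar with the N50 statistic the reviewer uses to describe assembly contiguity, the following is a minimal sketch of how it is computed. The function name and the contig lengths are hypothetical and purely illustrative; they are not taken from either the reviewer's or the authors' data.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the contig length L such that contigs of length >= L
    together contain at least half of all assembled bases.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in bp (illustrative only).
example_contigs = [5000, 3000, 2000, 1500, 1000, 500]
print(n50(example_contigs))  # 3000
```

      An N50 of "a few thousand bp" therefore means that half of the assembled bases sit in contigs no longer than a typical gene, which is why such assemblies are mainly useful for exon-level information.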

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #1 Public Review

      “First, I agree with the authors of this manuscript that conformational changes in the XFEL structures with 2.8 Å resolution are not reliable enough for demonstrating the subtle changes in the electron transfer events in this bacterial photosynthesis system. Actually, the data statistics in the paper by Dods et al. showed that the high-resolution range of some of the XFEL datasets may include pretty high noise (low CC1/2 and high Rsplit) so the comparison of the subtle conformational changes of the structures is problematic.

      The manuscript by Gai Nishikawa investigated time-dependent changes in the energetics of the electron transfer pathway based on the structures by Dods et al. by calculating the redox potential of the active and inactive branches in the structures and found no clear link between the time-dependent structural changes and the electron transfer events in the XFEL structures published by Dods, R. et al. (2021). This study provided validation for the interpretation of the structures of those electron-transferring proteins.

      The paper was well prepared.”

      Thank you very much for your positive and insightful comment. We greatly appreciate your suggestion regarding the high noise levels of the XFEL structures. Including this information in the Introduction section will draw readers’ attention to the concerns about the reliability of these XFEL structures. We have incorporated it into the Introduction section.

      Reviewer #2 Public Review

      “The manuscript by Nishikawa et al. addresses time-dependent changes in the electron transfer energetics in the photosynthetic reaction center from Blastochloris viridis, whose time-dependent structural changes upon light illumination were recently demonstrated by time-resolved serial femtosecond crystallography (SFX) using X-ray free-electron laser (XFEL) (Dods et al., Nature, 2021). Based on the redox potential Em values of bacteriopheophytin in the electron transfer active branch (BL) by solving the linear Poisson-Boltzmann equation, the authors found that Em(HL) values in the charge-separated 5-ps structure obtained by XFEL are not clearly changed, suggesting that the P+HL- state is not stabilized owing to protein reorganization. Furthermore, chlorin ring deformation upon HL- formation, which was expected from their QM/MM calculation, is not recognized in the 5-ps XFEL structure. Then the authors concluded that the structural changes in the XFEL structures are not related to the actual time course of charge separation. They argued that their calculated changes in Em and chlorin ring deformations using the XFEL structures may reflect the experimental errors rather than the real structural changes; they mentioned this problem is due to the fact that the XFEL structures were not obtained at high resolution (mostly at 2.8 Å). I consider that their systematic calculations may suggest a useful theoretical interpretation of the XFEL study. However, the present manuscript as a whole insists negatively that the experimental errors may hamper determination of the actual structural changes relevant to the electron transfer events.”

      Thank you for your feedback on our manuscript. We appreciate your positive assessment of our systematic calculations and theoretical interpretation of the XFEL study. We have carefully considered your comments and made the necessary revisions to address your concerns.

      Reviewer #2 Recommendations for the authors

      “The authors have satisfied my concerns mostly, in particular by providing the Em(QA) changes, which seem to be more attractive in the present form. However, the Em(QA) value(s), at least in the dark structure, should be provided, and the procedure of the calculation for the Em(QA) value(s) should be described in METHODS "Calculation of Em".”

      The calculated Em(QA) values for dataset a and dataset b in the dark structure are –223 mV and –209 mV, respectively, using the reference Em value of –256 mV versus NHE for menaquinone-2 in water [Photosynth. Res. 134 (2017) 193]. These calculated values are comparable to experimentally measured values of –150 mV for PbRC from Blastochloris viridis (naphthoquinone) [Biochim. Biophys. Acta 440 (1976) 622] and –180 mV for PbRC from Rhodobacter sphaeroides (ubiquinone) [Arch. Biochem. Biophys 172 (1976) 329].

      We have now provided this information in the Method (“Calculation of Em”) and Results and Discussion (“Relevance of structural changes observed in XFEL structures”) sections.
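
      As a small worked example of the numbers quoted above, the sketch below computes how far each calculated Em(QA) lies from the reference value in water. It only restates the arithmetic of the quoted values; reading the difference as the shift contributed by the protein environment is an assumption about how such electrostatic calculations are usually framed, and the variable and function names are hypothetical.

```python
# Reference Em for menaquinone-2 in water versus NHE, as quoted in the response (mV).
REFERENCE_EM_MV = -256

# Calculated Em(QA) values for the dark structure, as quoted in the response (mV).
calculated_em_mv = {"dataset a": -223, "dataset b": -209}


def shift_from_reference(em_mv, reference_mv=REFERENCE_EM_MV):
    """Difference between a calculated Em and the reference Em in water (mV)."""
    return em_mv - reference_mv


shifts = {name: shift_from_reference(em) for name, em in calculated_em_mv.items()}
for name, shift in shifts.items():
    print(f"{name}: {shift:+d} mV relative to the reference")
# dataset a: +33 mV relative to the reference
# dataset b: +47 mV relative to the reference
```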

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This manuscript by Neininger-Castro and colleagues presents a novel automatic image analysis method for assessing sarcomeres, the basic units of myofibrils, and validates this tool in a couple of experimental approaches that interfere with sarcomere assembly in iPSC-cardiomyocytes (iPSC-CM).

      Automatic quantification of sarcomeres is definitely something that is useful to the field. I am surprised that there is no reference in the manuscript to SarcTrack, published by Toepfer and colleagues in 2019 (PMID 30700234), which has exactly the same purpose. The advantage of the image analysis software presented in the current manuscript appears to me to be that it can cover both mature sarcomeres and nascent sarcomeres in premyofibrils effectively.

      We whole-heartedly disagree that SarcTrack has the exact same purpose as sarcApp. sarcApp measures more than the frequency of actinin2 images, and can measure real-space quantifications of actinin, myomesin, and titin, which has not been done before in this way. However, SarcTrack is an interesting method that we hope many researchers find helpful in their research. SarcTrack is a particle tracker that outputs the dimensions of the objects found, but does not distinguish between Z-Lines and other actinin2-positive structures (Z-Bodies, adhesions). It also does not group these structures into higher order structures such as myofibrils and muscle stress fibers.

      When going through the manuscript there were a few issues that should be addressed in a revised version of the manuscript:

      1) I am a bit puzzled that they took 1.4 um length as a cutoff length for a mature A-band in their quantifications, since the consensus in the field for thick filament length seems to be 1.6 um?

      We use 1.4 µm as a cutoff length for the length of a Z-Line rather than the A-Band. We believe the reviewer is referring to the width of the A-Band perpendicular to the Z-lines, which is indeed 1.6 µm. However, we are referring to the length of the Z-Lines, which can span anywhere from 1.4 µm to up to 10 or more µm. Thank you for allowing us to make the clarification.
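
      The length cutoff described in this response can be illustrated with a minimal sketch. This is not sarcApp's code: the function name, the inclusive comparison at exactly 1.4 µm, and the use of length as the only criterion are assumptions made for illustration.

```python
# Cutoff quoted in the response: structures at least this long count as Z-Lines.
Z_LINE_MIN_LENGTH_UM = 1.4


def classify_actinin_structure(length_um, cutoff_um=Z_LINE_MIN_LENGTH_UM):
    """Label an actinin2-positive structure by its length in micrometres.

    Assumption: the cutoff is inclusive, and length is the only criterion used.
    """
    if length_um < 0:
        raise ValueError("length must be non-negative")
    return "Z-Line" if length_um >= cutoff_um else "Z-Body"


# Hypothetical measured lengths (µm), illustrative only.
for length in (0.4, 1.4, 2.3, 10.0):
    print(length, classify_actinin_structure(length))
```

      Exposing the cutoff as a parameter mirrors the authors' later statement that users can modify the thresholds used to identify Z-Lines and Z-Bodies.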

      2) When doing the knockdown for alpha and beta-myosin heavy chain, respectively, why did they not also do a Western blot for the "other" isoform as well (Figure 7)? We know that iPSC-CM express a mixture, so the relatively mild phenotype that they observe in single knockdown experiments may well be due to concomitant upregulation of the expression of the other isoform. In my point of view this should be checked.

      It is likely that in the single knockdown experiments the other isoform is upregulated, which is why we were careful in stating that neither muscle myosin alone is required for sarcomere formation. We do agree this would be an interesting experiment to check beyond the scope of this manuscript.

      3) There seems to be a disconnect between the images for myomesin knockdown shown in Figure 8H and the quantification shown in Figure 8I, which makes me wonder whether the image shown in H middle (MYOM1 (1) KD), where the beta-myosin doublets do not seem to be much affected is really representative?

      The image shown in the middle of H is representative of the mean length of beta-myosin doublets in MYOM1 (1) KD hiCMs. While the beta-myosin doublets are still present and organized, they are significantly shorter. In the zoomed out image, you can appreciate much shorter arrays of beta-myosin doublets that, while extending across the entire cell, are thinner than control cells.

      Reviewer #2 (Public Review):

      Neininger-Castro et al report on their original study entitled "Independent regulation of Z-lines and M-lines during sarcomere assembly in cardiac myocytes revealed by the automatic image analysis software sarcApp". In this study, the research team developed two software tools, yoU-Net and sarcApp, that provide new binarization and sarcomere quantification methods. The authors further utilized human induced pluripotent stem cell-derived cardiomyocytes (hiCMs) as their model to verify their software by staining multiple sarcomeric components with and without the treatment of Blebbistatin, a known myosin II activity inhibitor. With the treatment of different Blebbistatin concentrations, the morphology of sarcomeric proteins was disturbed. These disrupted sarcomeric structures were further quantified using sarcApp and the quantification data supported the phenotype. The authors further investigated the roles of muscle myosins in sarcomere assembly by knocking down MYH6, MYH7, or MYOM in hiCMs. The knockdown of these genes did not affect Z-line assembly yet the knockdown of MYOM affected M-line assembly. The authors demonstrated that different muscle myosins participate in sarcomere assembly in different manners.

      Reviewer #3 (Public Review):

      Neininger-Castro and colleagues developed software tools for the quantification of sarcomeres and sarcomere-precursor features in immunostained human induced pluripotent stem cell-derived cardiac myocytes (hiCMs). In the first part they used a deep-learning-based model called a U-Net to construct and train a network for binarization of immunostained cardiomyocyte images. They also wrote graphical user interface (GUI) software that will assist other labs in using this approach and made it publicly available. They did not compare their approach to existing ones, but an example from one image suggests their binarization tool outperforms Otsu thresholding binarization.

      In the second part they developed a software tool called sarcApp that classifies sarcomere structures in the binarized image as a Z-Line or Z-Body and assigns each to either a myofibril or to stress fibers. The tools can then automatically count and measure multiple features (33 per cell and 24 per myofibril) and report them on a per-cell, per-myofibril, and per-stress-fiber basis.

      To test the tools they used Blebbistatin to inhibit sarcomere assembly and showed that the sarcApp tool could capture changes in multiple features such as fewer myofibrils, fewer Z-Lines, decreased myofibril persistence, decreased Z-Line length and altered myofibril orientation in the Blebbistatin treated cells. With some changes the tool was also shown to quantify sarcomeres in titin and myomesin stained cardiomyocytes.

      Finally they used sarcApp to quantify the changes in sarcomere assembly after siRNA-mediated knockdown of MYH6, MYH7, or MYOM. The analysis indicates that neither MYH6 nor MYH7 knockdown perturbed the assembly of Z- or M-lines, and that knockdown of MYOM perturbed the A-band/M-Line but not the Z-Line assembly according to features captured by the sarcApp tool.

      Overall the authors developed and made publicly available an excellent software tool that will be very useful for labs that are interested in studying sarcomere assembly. Multiple features that are difficult to measure or count manually can be automatically measured by the software quickly and accurately.

      There are however some remaining questions about these tools:

      1) The binarization tool which is tailored to sarcomere image binarization appears promising but was not systematically compared with existing approaches.

      We compared it with the existing approach we used previously in the lab, which was Otsu’s method for binarization. We are not aware of several other binarization approaches to compare to, other than using other machine learning techniques that are less advanced than a U-Net, the current standard in image-to-image translation.
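
      For context on the baseline mentioned here, the following is a minimal, dependency-free sketch of Otsu's method, which picks the global threshold that maximizes the between-class variance of the intensity histogram. It is a textbook illustration, not the lab's code; the function names and the synthetic pixel values are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes between-class variance.

    Pixels with intensity <= t are treated as background, > t as foreground.
    `pixels` is a flat iterable of integer intensities in [0, levels).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_between = 0, -1.0
    w_bg = 0    # number of background pixels so far
    sum_bg = 0  # sum of background intensities so far
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_between:
            best_between, best_t = between, t
    return best_t


def binarize(pixels, threshold):
    """Map each pixel to 1 (foreground) or 0 (background)."""
    return [1 if p > threshold else 0 for p in pixels]


# Synthetic bimodal intensities: dim background and bright structures.
synthetic = [10] * 50 + [200] * 50
t = otsu_threshold(synthetic)
mask = binarize(synthetic, t)
print(t, sum(mask))
```

      Because a single global threshold is applied to the whole image, this kind of baseline has no notion of local structure, which is the limitation a learned image-to-image model is meant to address.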

      2) How robust is the tool? The tool was tested on images from one type of cardiomyocytes (hiCMs) taken from one lab using Nikon Spinning Disk confocal microscope equipped with Apo TIRF Oil 100X 1.49 NA objective or instant Structured Illumination Microscopy (iSIM), using deconvolution (Microvolution software) and in a specific magnification. It remains to be seen whether the tool would be equally effective with images taken with other microscopy systems, with other cardiomyocytes (chick or neonatal rat), with different magnifications, live imaging, etc.

      We tested the software with several magnifications, with live imaging, and with other tissues. We did not include the information in the manuscript because the data we tested the software with is for future manuscripts studying different aspects of sarcomere formation and maintenance. sarcApp reliably identifies Z-Lines and sarcomeres with deconvolved widefield fluorescence images of hiCMs and frozen human tissue, and we are currently using it to measure zebrafish data for another study. Further, it works for live imaging with an actinin2-GFP (or similar) label. For the titin quantification, we would recommend using only 60-100X magnification, as the titin structures (doublets and rings) are not resolvable at lower magnifications.

      3) The tool was developed for evaluation of sarcomere assembly. The authors show that for this application it can detect the perturbation by Blebbistatin, or knockdown of sarcomeric genes. It remains to be seen if this tool is also useful for assessment of sarcomere structure for other questions beside sarcomere assembly and in other sarcomere pathologies.

      While this is beyond the scope of this specific methods paper, we welcome other researchers to use our software for other questions in other pathologies. We are currently doing the same for other manuscripts from our lab.

      Reviewer #1 (Recommendations For The Authors):

      1) "alpha-actinin..., which border the sarcomeric contractile machinery (thin and thick filaments)"; Z-lines do NOT border thick filaments in a relaxed sarcomere.

      We have removed “(thin and thick filaments)” from the text.

      2) myomesin targeting siRNAs (gene name MYOM): there are actually three genes encoding for myomesin family members, specify, which one was targeted (I am assuming MYOM1).

      Thank you for the clarification: we do target MYOM1.

      3) I am not surprised that they found not many mature Z-lines in the absence of both sarcomeric myosins; a similar codependence of assembly of mature Z-discs and the presence of functional thick filaments was previously shown by Geach and colleagues in 2015 (PMID 25845369)

      Thank you for sharing this manuscript: we have added a reference to it in our study.

      Reviewer #2 (Recommendations For The Authors):

      This work offers the possibility to gain more insights into the process of sarcomere assembly through the advancement in sarcomeric or myofibril structure analyses. However, some clarifications are needed from the authors, please see below for the comments.

      1) It is recommended that the authors include the time points for replating and harvesting hiCMs. After replating, the cardiomyocytes require at least three to four days for sarcomeric structures to reform. If the hiCMs were fixed before sarcomere assembly had completed, the staining of sarcomeric proteins including ACTN2 and titin could be compromised and it is difficult to tell if the phenotypes observed were consequences of drug treatments or knockdown of sarcomeric genes or simply because the replating hiCMs were fixed before their sarcomeric structures had fully regrown. It is also recommended that the authors replate hiCMs at a fixed time point to avoid discrepancies in the data.

      Cardiomyocytes do not require three to four days for sarcomeric structures to re-form, and indeed only require 24 hours, with the first sarcomeres typically appearing at ~6 hours. We and others have published several studies demonstrating this (Fenix et al., eLife 2018, Taneja, Neininger and Burnette MBoC 2020, Chen et al. Nature Methods, 2022). While sarcomeres continue to develop and turn over after this time, our lab is interested in the beginning steps of sarcomerogenesis rather than the turnover of mature structures.

      2) The sarcApp automatically identifies Z-lines and Z-bodies; however, is there an option for the users to set their own thresholds? Some users may select different criteria when quantifying sarcomeres. Moreover, the Z-lines and Z-bodies identified by the software are not always accurate. Can the users modify the list manually in an unbiased way? If this function is not available, the authors may consider adding this function to their software. sarcApp measures Z-line and Z-body length but does not measure Z-line and Z-body width, but sometimes it is also necessary to measure the width.

      Absolutely, users can modify the thresholds to identify Z-Lines and Z-Bodies. There is not a way for users to modify the list in an unbiased way per se, as editing the list of Z-Lines and Z-Bodies based on non-mathematical measurements is inherently biased, but the user is free to add in other Z-Lines and Z-Bodies as they wish. In this context, “manually” and “unbiased” are mutually exclusive.

      3) It is recommended that the authors include the original images beside the sarcomeric structures identified by sarcApp (Figure 2A, 2C, 4C-F and more). It would be easier to compare the original Z-lines and Z-bodies with those identified by the software.

      We have added these in Author response image 1.

      Author response image 1.

      Uncropped images and merges from Figures 2, 4 and 6, respectively.

      4) The M-line length quantification data in Figure 3G, 5F, and 6H showed different colored-dots labeling n1 to n3, but the authors did not discuss the significance of these symbols.

      We are not sure what the reviewer means by this statement: there is no significance of the different colored dots other than to mark the biological replicate shown. These graphs were created using SuperPlots, which was not stated in the original methods. It has now been added to the Statistical Analysis section.
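
      To make the SuperPlots convention concrete: each dot is one measurement colored by its biological replicate, and the per-replicate means are overlaid and are typically what statistical tests are run on. The sketch below shows only the grouping step; the function name and the M-line lengths are hypothetical, not values from the manuscript.

```python
from collections import defaultdict
from statistics import mean


def replicate_means(measurements):
    """Group (replicate_label, value) pairs and return the mean per replicate."""
    by_replicate = defaultdict(list)
    for replicate, value in measurements:
        by_replicate[replicate].append(value)
    return {rep: mean(values) for rep, values in by_replicate.items()}


# Hypothetical per-cell M-line lengths (µm) from three biological replicates.
cells = [
    ("n1", 2.0), ("n1", 2.4),
    ("n2", 2.2),
    ("n3", 2.5), ("n3", 2.1),
]
summary = replicate_means(cells)
for replicate in sorted(summary):
    print(replicate, round(summary[replicate], 2))
```

      Comparing the three replicate means, rather than every individual cell, avoids treating cells from the same dish as independent samples.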

      5) Can the authors elaborate more on the reasons why they treated Blebbistatin at concentrations of 50µM and 100µM. Previous studies showed that 25µM of Blebbistatin was sufficient to delay the transformation of cardiomyocytes (PMID 27072942). Can the authors also comment on why they selected 6 hours, 12 hours, and 24 hours post replating for drug treatment. Moreover, the drug treatment at different time points was only done on ACTN2 but not titin or myomesin.

      We selected 6, 12, and 24 hours for actinin2 to show the time course of sarcomere formation and to show that sarcomeres are developed by 24 hours, as also mentioned above. We are interested in future studies of the time course of titin and myomesin over time, and are working on it in the lab.

      We chose 50 and 100 µM Blebbistatin as these completely blocked sarcomere assembly whereas treatment with 25 µM did not. This manuscript is a methods paper that aims to validate sarcApp and show how it could be used. We did not intend for it to be a comprehensive study of how different concentrations of blebbistatin affect sarcomere assembly.

      We are also unsure what the reviewer means by “transformation of cardiomyocytes”. The manuscript with the PMID of 27072942 does not address this issue; that paper's aim was to “review and analyze readmission data for patients who received a continuous flow left ventricular assist device (LVAD)”. We assume the reviewer is referring to differentiation. The model system we developed and published in eLife in 2018 does not use differentiating iPSC cardiac myocytes. The hiCMs we use are terminally differentiated but still immature, as they are more transcriptionally similar to primary fetal myocytes. As such, they do not maintain their sarcomeres when they are removed from the 96 well and plated onto a glass coverslip for high-resolution microscopy. These assemble sarcomeres within 24 hours, with the sarcomeres forming close to the dorsal membrane and then rearranging over time (e.g., moving from the top of the cell to the bottom) (Fenix et al., eLife 2018). With that said, we do agree with the reviewer that a study of sarcomere assembly in the context of cardiac myocyte differentiation would be a fascinating direction for future studies, and we think sarcApp could facilitate such studies.

      6) The authors mentioned that the myofibrils of Z-line, titin, and M-line were randomly oriented after Blebbistatin treatments. The myofibrils were randomly oriented for titin and M-line. However, the orientation of Z-line after 50µM Blebbistatin treatment was not necessarily random, only the orientation after 100µM Blebbistatin treatment was randomized. The authors might consider changing bar graph to other types of charts if the orientation was really randomized after quantification.

      We find that the bar chart is the most informative to us, but users can consider other types of charts in their analyses.

      7) It is recommended that the authors include images staining ACTN2 at lower magnifications (Figure 1A, 1C). With current images, it is true that yoU-Net can separate Z-lines from Z-bodies yet it is difficult to tell if yoU-Net can still distinguish Z-lines from Z-bodies with larger images or it only applies to a small portion of the image.

      The yoU-Net can distinguish Z-Lines from Z-Bodies with images of any size, as image size (height vs. width in pixels) does not affect how binarization occurs. During binarization, the only pixel requirement is that the width and height are divisible by 8 (for downsampling purposes). Usually this is not the case with raw images, so the image borders are slightly cropped to make them usable. In terms of resolution, we recommend using 60X-100X objectives on confocal or superresolution data for the clearest results. We have, however, successfully binarized deconvolved widefield images at 100X as well.
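
      The divisibility requirement described above can be sketched as follows. This is an illustration under stated assumptions, not yoU-Net's code: the function name is hypothetical, the image is represented as a plain list of rows, and the crop is centered, whereas the response only says that the borders are slightly cropped.

```python
def crop_to_multiple(image, multiple=8):
    """Center-crop a 2D image (list of rows) so height and width divide by `multiple`.

    Assumption: the removed border is split as evenly as possible between sides.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    new_height = height - height % multiple
    new_width = width - width % multiple
    top = (height - new_height) // 2
    left = (width - new_width) // 2
    return [row[left:left + new_width] for row in image[top:top + new_height]]


# Hypothetical 21 x 30 image filled with zeros.
raw = [[0] * 30 for _ in range(21)]
cropped = crop_to_multiple(raw)
print(len(cropped), len(cropped[0]))  # 16 24
```

      At most 7 pixels are lost along each axis, which is negligible next to the size of a typical micrograph.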

      8) The authors mentioned that the knockdown of MYH7 did not affect Z-lines and M-lines; however, the structures of ACTN2, myomesin, and titin appeared more organized as compared to those in control.

      We agree that the sarcomeres and myofibrils look slightly more organized, and did mean to state that the knockdown did not negatively affect Z-Lines and M-Lines and have updated the manuscript to be more accurate.

      9) Please provide the merge images for Fig. 4D, 4E, 6B

      The merge images for Fig. 4D, 4E, and 6B are included with the original images requested above (point 3)

      10) In the text, they described: "antibodies to the titin I-band localize to both MSFs and sarcomeres in hiCMs (Figure 4A). Titin forms ring-like structures around the Z-Bodies of MSFs that are closer to the apparent sarcomere transition point (Figure 4A)". However, based on the antibody information they provided, it is not explicitly stated whether the antibody recognizes the N- or C-terminus of TITIN. Please provide TTN N-terminus or TTN C-terminus co-stainings with ACTN2 antibody to understand which part of TTN together with ACTN2 forms a Z-Body.

      The TTN antibody is an N-terminal antibody localizing to the I-Band region of sarcomeres. We agree with the reviewer that a more thorough study of titin will be of interest and we are currently undertaking such a study. However, this is a methods paper presenting a tool. While some of the data we present does point to mechanistic hypotheses, it is beyond the scope of this study to fully characterize titin during sarcomere assembly.

      11) TITIN doublet was used to indicate a sarcomere in Fig. 4C-D. Moreover, they also used another combination (myomesin and F-ACTIN) to label a sarcomere in Fig. 6D. Can they compare the difference between these two methods or by using these two methods (TITIN doublet) and (myomesin and F-ACTIN), how is the average length of sarcomere? Will the sarcomere length be the same?

      We noted in the manuscript that due to the organization of titin doublets (wrapping around the ends of Z-Lines) the average titin doublet will be approximately 0.3 µm longer than the Z-Line. We did not expect to see a difference in lengths of myomesin M-Lines and mature actinin2 Z-Lines and indeed do not see major differences in the average lengths (between 2.0 and 2.5 µm in 24 hour control cells).

      12) They used the siRNA method to knock down MYH6, MYH7 and MYOM and concluded that the knockdown of these genes did not affect the Z-line assembly. Even though they showed very nice knockdown efficiency of these proteins, they should (1) co-stain MYH6/TITIN/actinin2 and MYH6/myomesin/actinin2 for Fig. 7C, (2) MYH7/TITIN/actinin2 and MYH7/myomesin/actinin2 for Fig. 7I, (3) MYOM1/TITIN/actinin2 and MYOM2/TITIN/actinin2 for Fig. 8A, and (4) MYH7/MYOM1 and MYH7/MYOM2 for Fig. 8H to make sure the cells they measured were truly knockdown-positive cells.

      The antibodies for alpha and beta myosin are not very efficient for immunofluorescence, and work best for western blots. We decided also to choose a random subset of the cells on the dish to be sure to eliminate any risk of cherry-picking. While imaging cells on the dish, we looked only at the DAPI nuclear channel and selected 50 cells minimum per dish with only this channel, then imaged the other channels.

      Minor comments:

      1) The sarcomere structure is well organized in DMSO-treated cells in Fig. 5A and Fig. 6A, but it is in disarray in Fig. S3M. Why?

      Figure S3 shows hiCMs that have only been allowed to spread for 6 hours, which have not formed mature sarcomeres yet, hence the disarray.

      2) Fig 1A, Fig2B: please label the name of the antibody, not the actin filament

      We used phalloidin labelling here, which marks actin filaments. We have updated the figure legends to be more clear. Thank you!

      3) Fig. 7I: actinin2 instead of actinin

      Thank you for catching this! We have fixed it.

      Reviewer #3 (Recommendations For The Authors):

      Testing the app using images shot by other microscopy systems, magnifications, and cardiomyocytes from other species, as noted in the public review above, should make the app even more widely useful.

      A more formal head-to-head comparison with other approaches will be more convincing in showing the new tool is superior

      I also think that a more detailed protocol for using the app will help other investigators.

      The app counts and measures many features, but it is not always clear how and using what algorithm these are measured. Including these details in a protocol or even as comments in the code will be very helpful for others.

      The protocol found on the public GitHub for the app will help other investigators to download, use, and understand the application. We have received contact from researchers who have been able to use the application without assistance from us, which is a good sign that the application is user-friendly and that the online protocol is sufficient.

    1. Author Response

      We thank the reviewers for their useful and constructive comments. In this provisional response, we will address a few of the major issues and plan to submit a detailed, point-by-point response along with the revised manuscript.

      1. Robustness of activated combination of neurons (the ‘activated ensemble’).

      The reviewers have asked for additional analyses and visualization of the group of neurons activated and a classification analysis to illustrate the point that the activated set of neurons would allow discrimination between different concentrations even after the spiking activity reduced significantly in the later trials. We relied on visualization using PCA (Manuscript Fig. 4) and quantification using correlation analysis (Manuscript Fig. 5a and Manuscript Supplementary Figure 2). But this point can be easily amplified further to support our conclusions and address a major concern raised by the reviewers.

      Visualization of neural responses across trials and odorants: As recommended, we followed the procedures used in Stopfer et al., 2003 (Fig. 6c) and Miura et al., 2012 (Fig. 3C) to image neural responses across recorded PNs as a matrix (Author response image 1).

      Author response image 1.

      Author response image 1: Spike counts averaged over the entire 4s odor presentation window across all recorded neurons are shown as a function of trial number (columns). The sorting of neurons is the same across panels. Note that there are 80 neurons whose responses to hexanol and octanol were monitored (Dataset 1; first row of panels), and 81 neurons whose responses to isoamyl acetate and benzaldehyde were monitored (Dataset 2; second row of panels). As can be noted, the pattern of activation remains consistent across the 25 trials. Also, the activated combination of neurons varied robustly with odor identity and intensity.

      Classification analysis: To illustrate that there is enough information to recognize an odorant and discriminate between different intensities, we performed a leave-one-trial-out classification analysis. The left-out trial was assigned the class label of its nearest neighbor (using correlation distance metric). The results from this classification analysis are shown below in Author response image 2. As a control, we shuffled the odor class labels and repeated the leave-one-trial-out classification analysis.

      Author response image 2.

      Author response image 2: Results from classification analysis are shown for the two datasets: hexanol–octanol at different concentrations (dataset 1; 80 PNs), and isoamyl acetate and benzaldehyde (dataset 2; 81 PNs). We did a leave-one-trial-out validation. The true odor label is shown along the x-axis and the predicted odor label is shown along the y-axis. As can be noted, the class labels for every single trial were correctly predicted in both datasets. The result after class labels were shuffled is also shown for comparison. These results strongly support our conclusion that odor intensity information is preserved and odor concentration can be recognized independent of adaptation.
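
      The leave-one-trial-out procedure described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the code used for the analysis: the function names (pearson, loo_nearest_neighbor, accuracy) are hypothetical, and the data are small synthetic vectors standing in for the 80- or 81-dimensional PN spike-count vectors. Each held-out trial receives the label of its nearest neighbor under correlation distance (1 minus the Pearson correlation), and the procedure is then repeated with shuffled labels as a control.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat vector carries no pattern; treat as uncorrelated
    return cov / (sx * sy)


def loo_nearest_neighbor(trials, labels):
    """Leave-one-trial-out: give each trial the label of its nearest
    neighbor (correlation distance = 1 - r) among all other trials."""
    predicted = []
    for i, held_out in enumerate(trials):
        others = [j for j in range(len(trials)) if j != i]
        best = min(others, key=lambda j: 1.0 - pearson(held_out, trials[j]))
        predicted.append(labels[best])
    return predicted


def accuracy(true_labels, predicted_labels):
    """Fraction of trials whose predicted label matches the given label."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


# Synthetic toy data (an assumption for illustration, NOT recorded PN data):
# three odor conditions, six "neurons", five trials each.
rng = random.Random(0)
templates = {
    "odorA-high": [8, 0, 5, 1, 0, 7],
    "odorA-low": [1, 6, 0, 4, 2, 0],
    "odorB-high": [0, 2, 7, 0, 6, 1],
}
trials, labels = [], []
for name, template in templates.items():
    for k in range(5):
        gain = 1.0 - 0.1 * k  # responses shrink with repetition (adaptation)
        trials.append([gain * v + rng.gauss(0, 0.2) for v in template])
        labels.append(name)

predicted = loo_nearest_neighbor(trials, labels)
print("accuracy with true labels:", accuracy(labels, predicted))

# Control: shuffle the class labels and repeat the same analysis.
shuffled = labels[:]
rng.shuffle(shuffled)
shuffled_pred = loo_nearest_neighbor(trials, shuffled)
print("accuracy with shuffled labels:", accuracy(shuffled, shuffled_pred))
```

      Because correlation distance ignores overall response magnitude, the toy classifier is unaffected by the simulated reduction in gain across trials, which is the property the analysis relies on.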

      Correlation with the first trial:

      We had shown the correlation across odorants and concentrations as a function of the trial (manuscript Figure 5A). To complement these analyses, here we focus on the correlations with the response evoked in the first trial of each odorant at high and low concentrations and plot this information as a function of trial number (Author response image 3, 4). As can be noted, the correlation across different trials of a given odorant at specific concentrations remains much higher than all other conditions.

      Author response image 3.

      Author response image 3: (top-left) Correlation between 80-dimensional neural responses (averaged over the entire 4s odor presentation window) with the first trial of hexanol at high intensity (hex-H; 1% v/v) is plotted as a function of trial number. (top-right) similar plots but correlation computed with neural responses evoked during the first trial of octanol at high intensity (oct-H; 1% v/v). (bottom-left) similar plots but correlation computed with neural responses evoked in the first trial of hexanol at low intensity (hex-L; 1% v/v). (bottom-right) similar plots but correlation computed with neural responses evoked in the first trial of octanol at low intensity (oct-L; 1% v/v).

      Author response image 4.

      Author response image 4: (top-left) Correlation between 81-dimensional neural responses (averaged over the entire 4s odor presentation window) with the first trial of isoamyl acetate at high intensity (iaa-H; 1% v/v) is plotted as a function of trial number. (top-right) similar plots but correlation computed with neural responses evoked in the first trial of benzaldehyde at a high intensity (bza-H; 1% v/v). (bottom-left) similar plots but correlation computed with neural responses evoked in the first trial of isoamyl acetate at low intensity (iaa-L; 1% v/v). (bottom-right) similar plots but correlation computed with neural responses evoked in the first trial of benzaldehyde at low intensity (bza-L; 1% v/v).
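
      The correlation-with-first-trial analysis can be illustrated with a short sketch. It assumes, purely for illustration, that each trial is summarized as one spike-count vector per condition; the function names and the four-neuron toy vectors are hypothetical and are not taken from the recorded datasets. The toy data are constructed so that the response shrinks across trials while the pattern across neurons is preserved.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_with_reference(reference, trials):
    """Correlate every trial's population vector with a reference vector,
    e.g. the first trial of one odorant at one concentration."""
    return [pearson(reference, trial) for trial in trials]


# Hypothetical toy responses of 4 neurons over 5 trials.
template_a = [6.0, 1.0, 4.0, 0.0]  # stands in for one odor condition
template_b = [0.0, 5.0, 1.0, 6.0]  # stands in for a different condition
gains = [1.0 - 0.15 * k for k in range(5)]  # adaptation: shrinking response

trials_a = [[g * v for v in template_a] for g in gains]
trials_b = [[g * v for v in template_b] for g in gains]

reference = trials_a[0]  # first trial of condition A
same_r = correlation_with_reference(reference, trials_a)
other_r = correlation_with_reference(reference, trials_b)

for trial_number, (r_same, r_other) in enumerate(zip(same_r, other_r), start=1):
    print(trial_number, round(r_same, 3), round(r_other, 3))
```

      In this toy case the correlation with the reference stays at 1 for the same condition even though the total spike count falls, whereas the other condition stays anticorrelated. Real data would of course include trial-to-trial noise.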

      Behavioral significance and dynamics: The reviewers had wondered about the relevance of the behavior to the organism. The maxillary palps are sensory organs close to the mouth parts that are used to grab food and help with the feeding process. In our previous studies, we had shown that these palp-opening responses are innately triggered by many ‘appetitive odorants.’ However, the probability of palp opening varied across different odorants (Chandak and Raman, 2023). Some odorants evoked higher palp-opening responses and others reduced the probability of palp-opening response (below the median value across odorants). Since all other parameters (such as the clicking sound of valves, and mechanical cues due to airflow during odor presentation) are the same across these different odorants, these observed differences in palp-opening response probability are attributed to the identity of the odorants presented.

      Author response image 5.

      Author response image 5: Preference indices were calculated for all odors tested and are shown as a bar plot (n = 26 locusts). Blue bars indicate odors classified as appetitive, gray bars indicate neutral odors and red bars indicate unappetitive odors. Locusts with a significant deviation from the median response (one-sided binomial test, P < 0.1) were classified as either being appetitive or unappetitive (P < 0.1, P < 0.05, **P < 0.01). Error bars indicate s.e.m. [Replotted Fig 1.c from Chandak and Raman, 2023].

      We had also shown that we could train locusts to have stereotyped palp-opening responses using the classical conditioning approach (odor – odor-conditioned stimulus and food reward – unconditioned stimulus; Video: https://static-content.springer.com/esm/art%3A10.1038%2Fncomms7953/MediaObjects/41467_2015_BFncomms7953_MOESM483_ESM.mov; Saha et al., 2015). The dynamics of those conditioned palp-opening responses have been well characterized.

      We will use similar tracking procedures to monitor and quantify the dynamics of innate palp-opening responses as well. We will add supplementary videos to fully capture this behavior.

      Early vs. late neural responses:

      Since behavioral responses are more likely to start as soon as the odorant is presented, the reviewers wondered whether there are differences in the observed findings if we focus only on the early neural activity (as it might be more important for triggering behavior). Note that the median response time for conditioned palp-opening responses is less than 750 ms (Saha et al., 2015, Chandak and Raman, 2023). Hence, we divided the neural dataset and analyzed the neural response patterns during these early (0-750 ms after odor onset) and late (2-4 s after odor onset) time windows. In both these epochs, we found that the total spike counts across neurons reduced as a function of trial number or repetition and the combination of neurons activated remained robust (Author response images 6-11). Hence, we conclude that while the neural responses in different time windows would be important for shaping other parameters of behavioral response dynamics, the overall behavioral response probability that we used in our analysis had a similar relationship with early, late, or total neural activity during the entire odor presentation (i.e. the time window of the neural response did not matter for the analyses presented in the manuscript).

      Author response image 6.

      Author response image 6: Total spike counts reduced as a function of trial number. This reduction was observed for the total spike counts during the entire odor presentation window and during both the early (0-750 ms) and late (2-4 s) response time windows. Dataset 1: 80 PNs, hexanol, and octanol odorants.

      Author response image 7.

      Author response image 7: Total spike counts reduced as a function of trial number. This reduction was observed for the total spike counts during the entire odor presentation window and during both the early (0-750 ms) and late (2-4 s) response time windows. Dataset 2: 81 PNs, isoamyl acetate, and benzaldehyde odorants.
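
      The early, late and full-window spike counts used in these analyses can be sketched as below. This is an illustrative sketch only: the window boundaries come from the text (0-750 ms, 2-4 s, and the 4 s odor presentation), while the function names, the WINDOWS dictionary and the spike times are hypothetical.

```python
# Analysis windows in seconds relative to odor onset (values from the text).
WINDOWS = {
    "early": (0.0, 0.75),
    "late": (2.0, 4.0),
    "full": (0.0, 4.0),
}


def count_spikes(spike_times, start, stop):
    """Number of spikes with start <= t < stop."""
    return sum(start <= t < stop for t in spike_times)


def population_vector(spikes_per_neuron, window):
    """One spike count per neuron for the named window, for a single trial."""
    start, stop = WINDOWS[window]
    return [count_spikes(times, start, stop) for times in spikes_per_neuron]


# Hypothetical spike times (s after odor onset) for 3 neurons in one trial.
trial = [
    [0.05, 0.30, 0.70, 1.5, 2.2, 3.9],
    [0.80, 1.10],
    [0.10, 2.5, 2.6, 3.0],
]

early = population_vector(trial, "early")
late = population_vector(trial, "late")
full = population_vector(trial, "full")

print("early:", early, "total:", sum(early))
print("late: ", late, "total:", sum(late))
print("full: ", full, "total:", sum(full))
```

      Summing each vector gives the total spike count for that window in that trial, and stacking the vectors across trials gives the inputs for the correlation analyses restricted to each window.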

      Author response image 8.

      Author response image 8: Similar plots as in Figures 3 and 4 but analyzing 80-dimensional spike count vectors calculated using only the first 750 ms of odor-evoked response. Note that the correlation with the odor evoked response in the first trial remains high across trials. But between different odorants or different intensities of the same odorant, the response correlation drops significantly. Dataset 1: 80 PNs, hexanol, and octanol odorants.

      Author response image 9.

      Author response image 9: Similar plots as in Figures 3 and 4 but analyzing 80-dimensional spike count vectors calculated using only the last 2 seconds of odor-evoked response. Note that the correlation with the odor evoked response in the first trial remains high across trials. But between different odorants or different intensities of the same odorant, the response correlation drops significantly. Dataset 1: 80 PNs, hexanol, and octanol odorants.

      Author response image 10.

      Author response image 10: Similar plots as in Figures 3 and 4 but analyzing 81-dimensional spike count vectors calculated using only the first 750 ms of odor-evoked response. Note that the correlation with the odor-evoked response in the first trial remains high across trials. But between different odorants or different intensities of the same odorant, the response correlation drops significantly. Dataset 2: 81 PNs, isoamyl acetate, and benzaldehyde odorants.

      Author response image 11.

      Author response image 11: Similar plots as in Figures 3 and 4 but analyzing 81-dimensional spike count vectors calculated using only the last 2 seconds of odor-evoked response. Note that the correlation with the odor-evoked response in the first trial remains high across trials. But between different odorants or different intensities of the same odorant, the response correlation drops significantly. Dataset 2: 81 PNs, isoamyl acetate, and benzaldehyde odorants.

      Other Statistical Tests:

      The reviewers felt that many analyses lacked the sample size and error bars indicating SEM or SD. We will fix this by adding the sample size information to each panel, as appropriate. However, we would also like to point out that many of the analyses are done in a trial-by-trial fashion (e.g. Manuscript Figures 3 – 6). For these analyses, it would not be possible to add SEM or SD. One condition (hex-H or iaa-H) was repeated in each dataset, and we have added these repeats to the results shown in this response letter to demonstrate repeatability. We will do our best to add these statistics where appropriate, but this cannot be done for the trial-by-trial analyses.

      References:

      Stopfer M, Jayaraman V, Laurent G. Intensity versus identity coding in an olfactory system. Neuron. 2003 Sep 11;39(6):991-1004. doi: 10.1016/j.neuron.2003.08.011. PMID: 12971898.

      Miura K, Mainen ZF, Uchida N. Odor representations in olfactory cortex: distributed rate coding and decorrelated population activity. Neuron. 2012 Jun 21;74(6):1087-98. doi: 10.1016/j.neuron.2012.04.021. PMID: 22726838; PMCID: PMC3383608.

      Chandak, R., Raman, B. Neural manifolds for odor-driven innate and acquired appetitive preferences. Nat Commun 14, 4719 (2023). https://doi.org/10.1038/s41467-023-40443-2

      Saha, D., Li, C., Peterson, S. et al. Behavioural correlates of combinatorial versus temporal features of odour codes. Nat Commun 6, 6953 (2015). https://doi.org/10.1038/ncomms7953

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The cerebral cortex, or surface of the brain, is where humans do most of their conscious thinking. In humans, the grooves (sulci) and bumps (convolutions) have a particular pattern in a region of the frontal lobe called Broca's area, which is important for language. Specialists study features imprinted on the internal surfaces of braincases in early hominins by casting their interiors, which produces so-called endocasts. A major question about hominin brain evolution concerns when, where, and in which fossils a humanlike Broca's area first emerged, the answer to which may have implications for the emergence of language. The researchers used advanced imaging technology to study the endocast of a hominin (KNM-ER 3732) that lived about 1.9 million years ago (Ma) in Kenya to test a recently published hypothesis that Broca's remained primitive (apelike) prior to around 1.5 Ma. The results are consistent with the hypothesis and raise new questions about whether endocasts can be used to identify the genus and/or species of fossils.

      We would like to thank Rev. 1 for their comments on our paper.

      Reviewer #2 (Public Review):

      The authors tried to support the hypothesis that early Homo still had a primitive condition of Broca's cap (the region in fossil endocasts corresponding to Broca's area in the brain), being more similar to the condition in chimpanzees than in humans. The evidence from the described individual points to this direction but there are some flaws in the argumentation.

      We are grateful to Rev. 2 for their comments, although we only partially agree with some of them.

      First, we would like to correct the statement of Rev. 2 that we “tried to support the hypothesis that early Homo still had a primitive condition of Broca's cap”. Our aim was to test this hypothesis, not to validate it.

      First, only one human and one chimpanzee were used for comparison, although we know that patterns of brain convolutions (and in addition how they leave imprints in the endocranial bones) are very variable.

      We understand the point raised by Rev. 2 about the variation of brain convolutions in humans and chimpanzees. We used atlases published by Connolly (1950), Falk et al. (2018) and de Jager et al. (2019, 2022) to analyse the endocast of KNM-ER 3732 and compare it to the extant human and chimpanzee cerebral conditions. However, in Figure 2, for the sake of clarity, only two specimens (one Homo and one Pan) were used to illustrate the comparison (as has been done in other published papers, e.g., Carlson et al., 2011 Science; Gunz et al., 2020 Sci Adv). In the revised version, we modified the manuscript to explain our approach further (line 156): “We used brain and endocast atlases published in Connolly (1950), Falk et al. (2018) and de Jager et al. (2019, 2022; see also www.endomap.org) for comparing the pattern identified in KNM-ER 3732 to those described in extant humans and chimpanzees. To the best of our knowledge, these atlases are the most extensive atlases of extant human and chimpanzee brains/endocasts available to date and are widely used in the literature to explore variability in sulcal patterns. In Figure 2, the extant human and chimpanzee conditions are illustrated by one extant human (adult female) and one extant chimpanzee (adult female) specimens from the Pretoria Bone Collection at the University of Pretoria (South Africa) and in the Royal Museum for Central Africa in Tervuren (Belgium), respectively (Beaudet et al., 2018).”.

      Second, the evidence from this fossil specimen adds to the evidence of previously described individuals but still does not fully prove the hypothesis.

      We tempered our discussion by concluding that (line 116) “Overall, the present study not only demonstrates that Ponce de León et al.’s (2021) hypothesis of a primitive brain of early Homo cannot be rejected, but also adds information […]”.

      Third, there is a vicious circle in using primitive and derived features to define a fossil species and then using (the same or different) features to argue that one feature is primitive or derived in a given species. In this case, we expect members of early Homo to be derived compared to their predecessors of the genus Australopithecus and that's why it seems intriguing and/or surprising to argue that early Homo has primitive features. However, we should expect that there is some kind of continuum or mosaic in a time in which a genus "evolves into" another genus. This discussion requires far more discussions about the concepts we use, maybe less discussion about what is different between the two groups but more discussion about the evolutionary processes behind them.

      We fully agree with Rev. 2 on this aspect. We believe that identifying these differences/similarities between fossil and extant hominids constitute the first step of a better understanding of the evolutionary mechanisms. Our work suggests indeed a certain continuity between genera and raises questions on the genus concept and how to interpret the specimens currently attributed to early Homo. In the revised version of the manuscript we included a reference to this possible scenario (line 134): “[…] or to the absence of a definite threshold between the two genera based on the morphoarchitecture of their endocasts (Wood and Collard, 1999).”.

      Fourth, the data of convolutional imprints presented are rather subjective when identifying which impressions represent which brain convolutions. Not seeing an impression does not necessarily mean that the corresponding brain feature did not exist. Interestingly, the manuscript does not mention and discuss at all the frontoorbital sulcus. This is a sulcus that usually runs from the orbital surface of the frontal lobe up to divide the inferior frontal gyrus in chimpanzees, a condition totally different than in humans who do not have a frontoorbital sulcus. Could such a sulcus be identified, this would provide a far more convincing argument for a primitive condition in this specimen. In Australopithecus sediba, e.g., the condition in this region seems to be a mosaic in which some aspects of the morphology seem to be more modern while one of the sulcal impressions can well be interpreted as a short frontoorbital sulcus. For this specimen, by the way, I would come back to my third point above: some experts in the field might argue that this specimen could belong to Homo rather than Australopithecus...

      We agree that the presence of a fronto-orbital sulcus would be more conclusive. However, this sulcus has not been identified in KNM-ER 3732 and the region in which we would expect to find it is not preserved. As demonstrated by Ponce de León et al. (2021), because of the topographic relationships between sulci (and cranial structures), it is possible to interpret imprints on endocasts and the evolutionary polarity of some traits even in the absence of landmarks such as the fronto-orbital sulcus. In Australopithecus sediba the main derived feature of the endocast corresponds to the ventrolateral bulge in the left inferior frontal gyrus, and not to the sulcal pattern itself (Carlson et al., 2011 Science). However, the discussion around the taxonomic status of this taxon confirms the urgent need for reconsidering specimens from that time period and clarifying the mosaic-like or concerted evolution of the derived Homo-like traits within our lineage. Regarding the subjective nature of this approach, we invite readers to examine the specimen on MorphoSource (https://www.morphosource.org/concern/media/000497752?locale=en) and to request access from the National Museums of Kenya to the physical or virtual specimen to falsify our hypothesis.

      According to my arguments above, I think that this manuscript might revive interesting discussions about this topic but it is not likely to settle them because the data presented are not strong enough to fully support the hypothesis.

      We would be more than happy to consider new/other specimens with similar chronological and geographical contexts and investigate further this hypothesis in the future.

      Reviewer #3 (Public Review):

      The authors provide a detailed analysis of the sulcal and sutural imprints preserved on the natural endocast and associated cranial vault fragments of the KNM-ER3732 early Homo specimen. The analyses indicate a primitive ape-like organization of this specimen's frontal cortex. Given the geological age of around 1.9 million years, this is the earliest well-documented evidence of a primitive brain organization in African Homo.

      In the discussion, the authors re-assess one of the central questions regarding the evolution of early Homo: was there species diversity, and if yes, how can we ascertain it? The specimen KNM-ER1470 has assumed a central role in this debate because it purportedly shows a more advanced organization of the frontal cortex compared to other largely coeval specimens (Falk, 1983). However, as outlined in Ponce de León et al. 2021 (Supplementary Materials), the imprints on the ER1470 endocranium are unlikely to represent sulcal structures and are more likely to reflect taphonomic fracturing and distortion. Dean Falk, the author of the 1983 study, basically shares this view (personal communication). Overall, I agree with the authors that the hypothesis to be tested is the following: did early Homo populations with primitive versus derived frontal lobe organizations coexist in Africa, and did they represent distinct species?

      I greatly appreciate that the authors make available the 3D surface data of this interesting endocast.

      We are grateful to Rev. 3 for their comments and for contextualizing our finding. We would also like to point out that, although the 3D surface can be viewed on MorphoSource, permission from the National Museums of Kenya has to be requested for studying the specimen and getting access to the physical specimen and/or the 3D model.

      Reviewer #1 (Recommendations For The Authors):

      Holloway, Broadfield & Yuan (2004) estimate ER 3732 as having a cranial capacity of 750 cc, which is larger than chimps and australopiths and similar to ER 1470 (752 cc, same reference). (That for Dmanisi 2282 is somewhat smaller at around 650 cc.) Cranial capacities should be mentioned along with added discussion about possible allometric scaling of (increased) numbers of sulci with increasing brain size as well as possible shifts in locations of sulci relative to cranial sutures in larger-brained individuals/species (including due to ontogenetic maturation). Could these variables (especially brain size) be relevant for your discussion/conclusions?

      We thank Rev. 1 for their suggestion. We included the estimate by Holloway et al. (2004) (line 95): “Holloway et al. (2004) estimated the endocranial volume as about 750-800 cc but insisted on the low reliability of their estimate.”. Additionally, we raised the possibility of an allometric effect for future discussion (line 149): “In parallel, the possibility of allometric scaling and influence of brain size on sulcal patterns in early Homo has to be further explored.”.

      From the two figures, it appears that the authors produced a virtual endocast from the cranial remains of ER 3732 and compared its features with those seen on a virtual reproduction of the corresponding natural endocast. If so, this needs to be clarified in the text, not just the figures.

      We thank Rev. 1 for their suggestions that were integrated.

      Reviewer #3 (Recommendations For The Authors):

      While the sulcal imprints on the left hemisphere can be interpreted unambiguously, the anatomical assignment of those on the right side may need to be reconsidered, as they are more ambiguous. For example, the postcentral sulcus (pt) almost touches the middle frontal sulcus, which is an unlikely natural configuration.

      We agree that the configuration on the right hemisphere is intriguing, especially when compared to the extant human and chimpanzee atlases. As such, we decided to change the label for what we think could be the inferior frontal sulcus and leave a question mark instead.

      I encourage the authors to include:

      • a posterior view in Figure 1, and mark the lambdoid suture, parts of which seem to be preserved especially on the left side. This will help the readership to better understand which parts of the endocranial morphology are preserved.

      • a scale bar would be of great utility to appreciate the small size of this specimen. The distance from bregma to the Broca cap seems to be short, indicating an endocranial volume much smaller than the published estimate of 750 ccm. Perhaps the authors can provide a new estimate, which would provide further support for the arguments proposed in the discussion section, especially the question of any presence of Australopithecus at Koobi Fora.

      We included a posterior view of the specimen and a scale bar in Figure 1 and modified the legend accordingly. Unfortunately, we were not able to identify with certainty the feature that could correspond to the lambdoid suture. We might see the impression where the parietal bone meets the occipital bone, but there is a risk of misidentification (which is an issue frequently raised in the literature, see for example Gunz et al. 2020 Sci Adv). Concerning the endocranial volume, in the revised version of the manuscript we included the estimate by Holloway et al. (2004). Because the specimen only preserves the superior part, we are reluctant to provide an estimate of the total volume. However, we agree that this would be an interesting feature to integrate in the interpretation of this specimen.

      Minor points

      • This sentence needs to be clarified: «The superior temporal sulcus nearly intersects the lateral fissure on the right hemisphere».

      • The terms «Broca's region» and «orbital cap» need some more context. Do the authors mean «Broca's cap» in either instance?

      We clarified/modified when needed, thank you very much.

      We included minor corrections in addition to those recommended by the reviewers:

      -Lines 50, 74, 142, 149: “Broca’s area” instead of “Broca’s cap”

      -Line 73: “in the pre-1.5 Ma Homo specimen” instead of “in pre-1.5 Ma Homo specimen”

      -Line 100: we specified “in human brains and endocasts”

      -Line 120: “sulcal pattern” instead of “sulcal patterns”

      -Line 144: “behaviors” (plural)

    1. Author Response:

      Reviewer #1 (Public Review):

      [...] Strengths:

      The manuscript is well written and the experimental work well executed. It shows that major features of the classical two-component HipAB TA system have somehow been rerouted in the case of the tripartite HipBST. This includes the N-terminal domain of the HipA toxin, which now functions as bona fide antitoxin, and the partly relegated HipB antitoxin, which could only function as a transcription regulator. In addition, this work shows a new mode of inhibition of a kinase toxin and highlights the impact of the phosphorylation state of key toxin residues in controlling the activity of the antitoxin.

      Weaknesses:

      A major weakness of this work is the lack of data concerning the role of HipB, which likely does not act as an antitoxin. Does it act as a transcriptional regulator of the hipBST operon and to what extent both HipS and HipT contribute to such regulation? These are still open questions.

      We thank the reviewer for their feedback and will include a supplementary figure (Figure 1 supplement 2) and accompanying text that shows the transcriptional role of HipB, and how HipS and HipT influence this regulatory effect.

      In addition, there is no in-depth structural comparison between the structure of the HipBST solved in the work and the two recent structures of HipBST from Legionella. This is also a major weakness of this work.

      A structural comparison to the recent structures from Legionella will be included in the discussion, including Figure 6 supplement 1.

    1. Author Response:

      This work presents valuable information about the specificity and promiscuity of toxic effector and immunity protein pairs. The evidence supporting the claims of the authors is currently incomplete, as there is concern about the methodology used to analyze protein interactions, which did not take potential differences in expression levels, protein folding, and/or transient interaction into account. Other methods to measure the strength of interactions and structural predictions would improve the study. The work will be of interest to microbiologists and biochemists working with toxin-antitoxin and effector-immunity proteins.

      We thank the reviewers for considering this manuscript. We agree that this manuscript provides a valuable and cross-discipline introduction to new EI pair protein families where we focus on the EI pair’s flexibility and impacts on community structure. As such, we believe we have provided a solid foundation for future studies to examine non-cognate interactions and their possible effects on microbial communities. This, by definition, leaves some areas “incomplete” and, therefore, open for further investigations. While the methods we show do take into account potential differences in binding assays, we will more explicitly address how “expression, protein folding, and/or transient binding” may play into this expanded EI pair model upon revision and temper the discussion of the proposed model. We have responded to the reviewers’ public comments (italicized below).

      Public Reviews:

      Note: Reviewer 1, who appeared to focus on a subset of the manuscript rather than the whole, based their comments on several inaccuracies, which we discuss below. We found the tone in this reviewer's comments to be, at times, inappropriate, e.g., using "harsh" and "simply too drastic" to imply that common structure-function analyses were outside of the field-standard methods. We also note that the reviewer took a somewhat atypical step in reviewing this manuscript by running and analyzing the potential protein-complex data in AlphaFold2 but did not discuss areas of low confidence within that model that may contradict their conclusions. We are concerned their approach muddled valid scientific criticisms with problematic conclusions.

      Reviewer #1 (Public Review):

      In this manuscript, Knecht, Sirias et al describe toxin-immunity pair from Proteus mirabilis. Their observations suggest that the immunity protein could protect against non-cognate effectors from the same family. They analyze these proteins by dissecting them into domains and constructing chimeras which leads them to the conclusion that the immunity can be promiscuous and that the binding of immunity is insufficient for protective activity.

      Strengths:

      The manuscript is well written and the data are very well presented and could be potentially interesting. The phylogenetic analysis is well done, and provides some general insights.

      Weaknesses:

      1) Conclusions are mostly supported by harsh deletions and double hybrid assays. The latter assays might show binding, but this method is not resolutive enough to report the binding strength. Proteins could still bind, but the binding might be weaker, transient, and out-competed by the target binding.

      The phrasing of structure-function analyses as “harsh” is a bit unusual, as other research groups regularly use deletions and hybrid studies. Given the known caveats to deletion and domain substitutions, we included point-mutation analyses for both the effector and immunity proteins, as found on lines 105 - 113 and 255 - 261 in the current manuscript. These caveats are also why we coupled the in vitro binding analyses with in vivo protection experiments in two distinct experimental systems (E. coli and P. mirabilis). Based on this manuscript’s introductory analysis (where we define and characterize the genes, proteins, interactions, phylogenetics, and incidences in human microbiomes), the next apparent questions are beyond the scope of this study. Future approaches would include analyzing purified proteins from these effector (E) and immunity (I) protein families using biochemical and biophysical methods such as X-ray crystallography and circular dichroism spectroscopy, among others.

      (Interestingly, most papers in the EI field do not measure EI protein affinity (Jana et al., 2019, Yadav et al., 2021). Notable exceptions are earlier colicin research (Wallis et al., 1995) and a new T6SS EI paper (Bosch et al., 2023) published as we submitted this manuscript.)

      2) While the authors have modeled the structure of toxin and immunity, the toxin-immunity complex model is missing. Such a model allows alternative, more realistic interpretation of the presented data. Firstly, the immunity protein is predicted to bind contributing to the surface all over the sequence, except the last two alpha helices (very high confidence model, iPTM>0.8). The N terminus described by the authors contributes one of the toxin-binding surfaces, but this is not the sole binding site. Most importantly, other parts of the immunity protein are predicted to interact closer to the active site (D-E-K residues). Thus, based on the AlphaFold model, the predicted mechanism of immunization remains physically blocking the active site. However, removing the N terminal part, which contributes large interaction surface will directly impact the binding strength. Hence, the toxin-immunity co-folding model suggests that proper binding of immunity, contributed by different parts of the protein, is required to stabilize the toxin-immunity complex and to achieve complete neutralization. Alternative mechanisms of neutralization might not be necessary in this case and are difficult to imagine for a DNAse.

      In response to the reviewer’s comment, we again reviewed the RdnE-RdnI AlphaFold2 complex predictions with the most updated version of ColabFold (1.5.2-patch with PDB100 and MMseq2) and have included them at the end of the responses [1].

      However, the literature reports that computational predictions of E-I complexes often do not match experimental structural results (Hespanhol et al., 2022, Bosch et al., 2023). As such, we chose not to include the predicted cognate and non-cognate RdnE-I complexes from ColabFold (which uses AlphaFold2) and will not include this data in revised manuscripts. (It is notable that reviewer 1 found the proposed expanded model and research so interesting as to directly input and examine the AI-predicted RdnE-RdnI protein interactions in AlphaFold2.)

      Discussion of the prevailing toxin-immunity complex model is in the introduction (lines 45-48) and Figure 5E. Further, there are various known mechanisms for neutralizing nucleases and other T6SS effectors, which we briefly state in the discussion (lines 359 - 361). More in-depth, these molecular mechanisms include active-site blocking (Benz et al., 2012), allosteric-site binding (Kleanthous et al., 1999 and Lu et al., 2014), enzymatic neutralization of the target (Ting et al., 2021), and structural disruption of both the active and binding sites (Bosch et al., 2023). Given this diversity of mechanisms, we did not presume to speculate on the as-of-yet unknown mechanism of RdnI protection.

      3) Dissection of a toxin into two domains is also not justified from a structural point of view, it is probably based on initial sequence analyses. The N terminus (actually previously reported as Pone domain in ref 21) is actually not a separate domain, but an integral part of the protein that is encased from both sides by the C terminal part. These parts might indeed evolve faster since they are located further from the active site and the central core of the protein. I am happy to see that the chimeric toxins are active, but regarding the conservation and neutralization, I am not surprised, that the central core of the protein fold is highly conserved. However, "deletion 2" is quite irrelevant - it deletes the central core of the protein, which is simply too drastic to draw any conclusions from such a construct - it will not fold into anything similar to an original protein, if it will fold properly at all.

      The reviewer’s comment highlights why we turned to the chimera proteins to dissect the regions of RdnE (formerly IdrD-CT), as the deletions could result in misfolded proteins. (We initially examined RdnE in the years before the launch of AlphaFold2.) However, the reviewer is incorrect regarding the N-terminus of RdnE. The PoNe domain, while also a subfamily of the PD-(D/E)XK superfamily, forms a distinct clade of effectors from the PD-(D/E)XK domain in RdnE (formerly IdrD-CT) as seen in Hespanhol et al., 2022; this is true for other DNAse effectors as well. Many studies analyzing effectors within the PD-(D/E)XK superfamily only focus on the PD-(D/E)XK domain, removing just this domain from the context of the whole protein (Hespanhol et al., 2022; Jana et al., 2019). Of note, in RdnE, this region alone (containing the DNA-binding domain) is insufficient for DNAse activity (unlike in PoNe).

      4) Regarding the "promiscuity" there is always a limit to how similar proteins are, hence when cross-neutralization is claimed authors should always provide sequence similarities. This similarity could also be further compared in terms of the predicted interaction surface between toxin and immunity.

      Reviewer 1 points out a fundamental property of protein-protein interactions that has been isolated away from the impacts of such interactions on bacterial community structure. We have provided the whole protein alignments in supplemental figure 3, the summary images in Figure 3D, and the protein phylogenetic trees in Figure 3C. We encourage others to consider the protein alignments as percent amino acid sequence similarity is not necessarily a good gauge for protein function and interactions. RuBisCo is one example of how protein sequence similarity can be small while functions remain highly conserved. These data are publicly available on the OSF website associated with this manuscript https://osf.io/scb7z/, and we hope the community explores the data there.

      In consideration of the enthusiasm to deeply dive into the primary research data, we have included the pairwise sequence identities across the entire proteins here: Proteus RdnI vs. Rothia RdnI: 23.6%; Proteus RdnI vs. Prevotella RdnI: 16.3%, Proteus RdnI vs. Pseudomonas RdnI: 14.6%; Rothia RdnI vs. Prevotella RdnI: 22.4%, Rothia RdnI vs. Pseudomonas RdnI: 17.6%; Prevotella RdnI vs. Pseudomonas RdnI: 19.5%. (As stated in response to reviewer 1 comment 2, we do not find it appropriate to make inferences based on AlphaFold2-predicted protein complexes.)
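      As an illustration of how a pairwise identity value of this kind can be derived from two rows of an alignment, the following is a minimal sketch. It is not the code used for the manuscript: the function name is hypothetical, and the choice to count only columns where both sequences have a residue is an assumption (other conventions divide by the full alignment length or by the shorter sequence, which gives different percentages).

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of a pairwise alignment.

    Assumption: only columns where BOTH sequences carry a residue are counted
    in the denominator; columns containing a gap in either row are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    matches = 0
    compared = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared
```

      For two toy aligned fragments such as "MKT-LV" and "MRTALV", five columns are ungapped and four of them match. Which denominator is used should be reported alongside any identity value, since it changes the result for divergent proteins.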

      Overall, it looks more like a regular toxin-immunity couple, where some cross-reactions with homologues are possible, depending on how far the sequences have deviated. Nevertheless, taking all of the above into account, these results do not challenge toxin-immunity specificity dogma.

      In this manuscript, we did not intend to dismiss the E-I specificity model but rather point out its limitations and propose an important expansion of that model that accounts for cross-protection and survival against attacks from other genera. We agree that it is commonly considered that deviations in amino acid sequence over time could result in cross-binding and protection (see lines 364-368). However, the impacts of such cross-binding on community structure, bacterial survival, and strain evolution have rarely been considered or addressed in prior literature, with exceptions such as in Zhang et al., 2013 and Bosch et al., 2023. One key insight we propose and show in this manuscript is that cross-binding can be a fitness benefit in mixed communities; therefore, it could be selected for evolutionarily (lines 378-380), even potentially in host microbiomes.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Knecht et al entitled "Non-cognate immunity proteins provide broader defenses against interbacterial effectors in microbial communities" aims at characterizing a new type VI secretion system (T6SS) effector immunity pair using genetic and biochemical studies primarily focused on Proteus mirabilis and metagenomic analysis of human-derived data focused on Rothia and Prevotella sequences. The authors provide evidence that RdnE and RdnI of Proteus constitute an E-I pair and that the effector likely degrades nucleic acids. Further, they provide evidence that expression of non-cognate immunity derived from diverse species can provide protection against RdnE intoxication. Overall, this general line of investigation is underdeveloped in the T6SS field and conceptually appropriate for a broad audience journal. The paper is well-written and, aside from a few cases, well-cited. As detailed below however, there are several aspects of this paper where the evidence provided is somewhat insufficient to support the claims. Further, there are now at least two examples in the literature of non-cognate immunity providing protection against intoxication, one of which is not cited here (Bosch et al PMID 37345922 - the other being Ting et al 2018). In general therefore I think that the motivating concept here in this paper of overturning the predominant model of interbacterial effector-immunity cognate interactions is oversold and should be dialed back.

      We agree that analyses focusing on flexible non-cognate interactions and protection are underdeveloped within the T6SS field and are not fully explored within a community structure. These ideas are rapidly growing in the field, as evidenced by the references provided by the reviewer. As stated earlier, we did not intend to overturn the prevailing model but rather propose an expanded model that accounts for protection against attacks from foreign genera.

      Strengths:

      One of the major strengths of this paper is the combination of diverse techniques including competition assays, biochemistry, and metagenomics surveys. The metagenomic analysis in particular has great potential for understanding T6SS biology in natural communities. Finally, it is clear that much new biology remains to be discovered in the realm of T6SS effectors and immunity.

      Weaknesses:

      The authors have not formally shown that RdnE is delivered by the T6SS. Is it the case that there are not available genetics tools for gene deletion for the BB2000 strain? If there are genetic tools available, standard assays to demonstrate T6SS-dependency would be to interrogate function via inactivation of the T6SS (e.g. by deleting tssC).

      Our research group showed that the T6SS secretes RdnE (previously IdrD) in Wenren et al., 2013 (cited in lines 71-73). We later confirmed T6SS-dependent secretion by LC-MS/MS (Saak et al., 2017).

      For swarm cross-phyla competition assays (Figure 4), at what level compared to cognate immunity are the non-cognate immunity proteins being expressed? This is unclear from the methods and Figure 4 legend and should be elaborated upon. Presumably these non-cognate immunity proteins are being overexpressed. Expression level and effector-to-immunity protein stoichiometry likely matters for interpretation of function, both in vitro as well as in relevant settings in nature. It is important to assess if native expression levels of non-cognate cross-phyla immunity (e.g. Rothia and Prevotella) protect similarly as the endogenously produced cognate immunity. This experiment could be performed in several ways, for example by deleting the RdnE-I pair and complementing back the Rothia or Prevotella RdnI at the same chromosomal locus, then performing the swarm assay. Alternatively, if there are inducible expression systems available for Proteus, examination of protection under varying levels of immunity induction could be an alternate way to address this question. Western blot analysis comparing cognate to non-cognate immunity protein levels expressed in Proteus could also be important. If the authors were interested in deriving physical binding constants between E and various cognate and non-cognate I (e.g. through isothermal titration calorimetry) that would be a strong set of data to support the claims made. The co-IP data presented in supplemental Figure 6 are nice but are from E. coli cells overexpressing each protein and do not fully address the question of in vivo (in Proteus) native expression.

      P. mirabilis strain ATCC29906 does not encode the rdnE and rdnI genes on the chromosome (NCBI BioSample: SAMN00001486) (line 151). Production of the RdnI proteins, including the cognate Proteus RdnI, comes from equivalent transgenic expression vectors. Specifically, the rdnI genes were expressed under the flaA promoter in P. mirabilis strain ATCC29906 (Table 1) for the swarm competition assays found in Figure 2C and Figure 4. This promoter results in constitutive expression in swarming cells (Belas et al., 1991; Jansen et al., 2003).

      Lines 321-324, the authors infer differences between E and I in terms of read recruitment (greater abundance of I) to indicate the presence of orphan immunity genes in metagenomic samples (Figure 5A-D). It seems equally or perhaps more likely that there is substantial sequence divergence in E compared to the reference sequence. In fact, metagenomes analyzed were required only to have "half of the bases on reference E-I sequence receiving coverage". Variation in coverage again could reflect divergent sequence dipping below 90% identity cutoff. I recommend performing metagenomic assemblies on these samples to assess and curate the E-I sequences present in each sample and then recalculating coverage based on the exact inferred sequences from each sample.

      This comment raises the challenges with metagenomic analyses. It was difficult to balance specificity to a particular species’ DNA sequence with the prevalence of any homologous sequence in the sample. Given the distinction in binding interactions among the examined four species, we opted to prioritize specificity, accepting that we were losing access to some rdnE and rdnI sequences in that decision. We chose a 90% identity cutoff, which, through several in silico controls, ensured that each sequence we identified was the rdnE or rdnI gene from that specific species. For the Version of Record, we will revisit this decision and consider trying to account for sequence divergence by lowering the identity cutoffs as suggested.
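      To make the trade-off concrete, the sketch below shows how an identity cutoff and a breadth-of-coverage requirement interact when deciding whether a reference gene is detected in a sample. It is illustrative only and is not the pipeline used for the manuscript: the `Hit` record, the function names, and the toy coordinates are hypothetical, while the 90% identity cutoff and the "half of the bases covered" requirement are taken from the discussion above.

```python
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    """One read alignment against the reference (hypothetical record layout)."""
    start: int        # 0-based, inclusive
    end: int          # 0-based, exclusive
    identity: float   # percent identity of the read to the reference


def covered_fraction(ref_len: int, hits: Iterable[Hit],
                     min_identity: float = 90.0) -> float:
    """Fraction of reference bases covered by reads passing the identity cutoff."""
    covered = [False] * ref_len
    for hit in hits:
        if hit.identity < min_identity:
            continue  # divergent reads are discarded, trading sensitivity for specificity
        for pos in range(max(0, hit.start), min(ref_len, hit.end)):
            covered[pos] = True
    return sum(covered) / ref_len


def gene_detected(ref_len: int, hits: Iterable[Hit],
                  min_identity: float = 90.0, min_breadth: float = 0.5) -> bool:
    """True if at least `min_breadth` of the reference is covered."""
    return covered_fraction(ref_len, list(hits), min_identity) >= min_breadth
```

      In this toy setting, lowering `min_identity` recruits more divergent reads and raises the covered fraction, which is the effect the reviewer describes; the cost is a higher chance of recruiting reads from homologs in other species.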

      A description of gene-level read recruitment in the methods section relating to metagenomic analysis is lacking and should be provided.

      Noted. We will also include the raw code and sequences on the OSF website associated with this manuscript https://osf.io/scb7z/.

      Reviewer #3 (Public Review):

      [...] Strengths:

      The authors presented a strong rationale in the manuscript and characterized the molecular mechanism of the RdnE effector both in vitro and in the heterologous expression model. The utilization of the bacterial two-hybrid system, along with the competition assays, to study the protective action of RdnI immunity is informative. Furthermore, the authors conducted bioinformatic analyses throughout the manuscript, examining the primary sequence, predicted structural, and metagenomic levels, which significantly underscore the significance and importance of the EI pair.

      Weaknesses:

      1. The interaction between RdnI and RdnE appears to be complex and requires further investigation. The manuscript's data does not conclusively explain how RdnI provides a "promiscuous" immunity function, particularly concerning the RdnI mutant/chimera derivatives. The lack of protection observed in these cases might be attributed to other factors, such as a decrease in protein expression levels or misfolding of the proteins. Additionally, the transient nature of the binding interaction could be insufficient to offer effective defenses.

      Yes, we agree with the reviewer and hope that grant reviewers share this colleague’s enthusiasm for understanding the detailed molecular mechanisms of RdnE-RdnI binding across genera. We will continue to emphasize such caveats as the next frontier is clearly understanding the molecular mechanisms for RdnI cognate or non-cognate protection. We address the concerns regarding expression levels in the response to reviewer 2, comment 2.

      2. The results from the mixed population competition lack quantitative analysis. The swarm competition assays only yield binary outcomes (Yes or No), limiting the ability to obtain more detailed insights from the data.

      The mixed swarm assay is needed when studying T6SS effectors that are primarily secreted during Proteus’ swarming activity (Saak et al. 2017, Zepeda-Rivera et al. 2018). This limitation is one reason we utilize in vitro, in vivo, and bioinformatic analyses. Though the swarm competition assay yields a binary outcome, we are confident that the observed RdnI protection is due to interaction with a trans-cell RdnE via an active T6SS. By contrast, many manuscripts report co-expression of the EI pair (Yadav et al., 2021, Hespanhol et al., 2022) rather than secreted effectors, as we have achieved in this manuscript.

      3. The discovery of cross-species protection is solely evident in the heterologous expression-competition model. It remains uncertain whether this is an isolated occurrence or a common characteristic of RdnI immunity proteins across various scenarios. Further investigations are necessary to determine the generality of this behavior.

      We agree, which is why we submitted this paper as a launching point for further investigations into the generality of non-cognate interactions and their potential impact on community structure.

      Comments from Reviewing Editor:

      • In addition to the references provided by reviewer#2, the first manuscript to show non-cognate binding of immunity proteins was Russell et al 2012 (PMID: 22607806).
      • IdrD was shown to form a subfamily of effectors in this manuscript by Hespanhol et al 2022 PMID: 36226828 that analyzed several T6SS effectors belonging to PDDExK, and it should be cited.

      We appreciate that the reviewer and eLife staff pointed out missed citations. A revised manuscript will incorporate those studies and cite them appropriately.

      [1] The Proteus RdnE in complex with either the Prevotella or Pseudomonas RdnI showed low confidence at the interface (pLDDT ~50-70%); this AI-predicted complex might support the lack of binding seen in the bacterial two-hybrid assay. On the other hand, the Proteus and Rothia RdnI N-terminal regions show higher confidence at the interface with RdnE. Despite this, the C-terminus of the Proteus RdnI shows especially low confidence (pLDDT ~50%) where it might interact near RdnE’s active site (as suggested by reviewer 1). Given this low confidence and the already stated inaccuracies of AI-generated complexes, we would rather wait for crystallization data to inform potential protection mechanisms of RdnI.

      Author response image 1.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for the constructive and detailed feedback provided. We have adopted the proposed changes to improve the manuscript clarity and accessibility. The following revisions are included in the revised manuscript:

      Reviewer #1 (Public Review):

      The analytical framework is not sufficiently explained in the main text.

      We think the reviewer is referring to the conceptual framework mentioned in introduction. In the previously submitted manuscript, we did not provide details because the framework is published elsewhere. However, we agree with the reviewer that a short explanation may be helpful, which we have included in the resubmitted manuscript.

      The significance of findings in relation to functional changes is not clear. What are the consequences of enrichment of RNA transport or ribosome biogenesis pathways between pesticides and recovery stages, for example?

      We thank the reviewer for this suggestion. In the previously submitted manuscript, we included an explanation of the central functions these pathways can alter (e.g. metabolism and infection response). These functions are self-explanatory. However, we have elaborated on the consequence that the disruption of these pathways can cause in the resubmitted manuscript.

      The impact of individual biocides and climate variables, and their additive effects, are assessed but there is no information offered on non-additive interactions (e.g., synergistic, antagonistic).

      This was a misunderstanding based on our use of the term synergistic in this context. The approach by which we define a synergistic or joint effect of two environmental variables on a taxonomic group is explained in the methods section. This analysis is based on climate variables and biocide types contributing the largest covariances in the correlation analysis explained in Supplementary Fig. 5; Step 4. The combined effect of two environmental variables on a taxon was considered to be significant if the biocide type and the climate variable were each significantly correlated with the taxon over the same time window, and their average Pearson correlation was > 0.5 with padj < 0.05 (SWC analysis with 10,000 permutations). The biocide type and the climate variable were interpreted to have a joint effect on a given taxon if the linear combination of the biocide type and the climate variable had a larger Pearson correlation coefficient than each of the correlations between the family and the biocide type and the family and the climate variable individually, in the same time interval with padj < 0.05 (with 10,000 permutations in the SWC analysis). We realise that the use of synergistic or additive was not correct in this context and have replaced the term synergistic with joint effect throughout the manuscript.
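      The decision rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not the analysis code: the function names are hypothetical, the "linear combination" is taken here to be the equal-weight sum of the two z-scored variables, correlations are compared by absolute value, and the permutation-based padj conditions are left out for brevity.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a series has zero variance")
    return sxy / sqrt(sxx * syy)


def zscore(x: Sequence[float]) -> list:
    """Standardize a series to mean 0 and (population) standard deviation 1."""
    n = len(x)
    mean = sum(x) / n
    sd = sqrt(sum((a - mean) ** 2 for a in x) / n)
    if sd == 0:
        raise ValueError("series has zero variance")
    return [(a - mean) / sd for a in x]


def joint_effect(taxon: Sequence[float], biocide: Sequence[float],
                 climate: Sequence[float], min_avg_r: float = 0.5) -> Dict[str, object]:
    """Apply the correlation part of the combined/joint-effect rule in one time window."""
    r_b = pearson(biocide, taxon)
    r_c = pearson(climate, taxon)
    # Assumed linear combination: equal-weight sum of the standardized variables.
    combo = [b + c for b, c in zip(zscore(biocide), zscore(climate))]
    r_joint = pearson(combo, taxon)
    return {
        "r_biocide": r_b,
        "r_climate": r_c,
        "r_joint": r_joint,
        "combined": (abs(r_b) + abs(r_c)) / 2 > min_avg_r,
        "joint": abs(r_joint) > max(abs(r_b), abs(r_c)),
    }
```

      The equal-weight choice matters: if the combination were instead fitted to the taxon by least squares, its in-sample correlation could never fall below either individual correlation, so the comparison would not discriminate. In the full procedure each flag would additionally require padj < 0.05 from the permutation analysis.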

      The level of confidence associated with results is not made explicit. The reader is given no information on the amount of variability involved in the observations, or the level of uncertainty associated with model estimates.

      As we didn’t use traditional statistical approaches, confidence level estimation in the traditional sense is not possible. Instead, we used permutation tests and adjusted P-values to identify significant correlations in the data. These approaches are more robust than traditional statistics for integrating and discovering complex, group-wise patterns among high-dimensional datasets. While most forms of machine learning require large sample sizes, sCCA uses fewer observations to identify the most correlated components among data matrices and captures the multivariate variability of the most important features.
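      For readers unfamiliar with this style of inference, the sketch below shows the general idea of a permutation p-value for a correlation followed by multiple-testing adjustment. It is a generic illustration rather than the code used in the study: the function names are hypothetical, and the Benjamini-Hochberg procedure is assumed here only as one common way of producing adjusted p-values.

```python
import random
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a series has zero variance")
    return sxy / sqrt(sxx * syy)


def permutation_pvalue(x: Sequence[float], y: Sequence[float],
                       n_perm: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation p-value for the Pearson correlation of x and y."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real association between x and y
        if abs(pearson(x, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

      With 10,000 permutations the smallest attainable p-value is roughly 1 in 10,001, which bounds how small an adjusted p-value can become when many taxa are tested.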

      The major implications of the findings for regulatory ecological assessment are missed. Regulators may not be primarily interested in identifying past "ecosystem shifts". What they need are approaches which give greater confidence in monitoring outcomes by better reflecting the ecological impact of contemporary environmental change and ecosystem management. The real value of the work in this regard is that: (1) it shows that current approaches are inappropriate due to the relatively stable nature of the indicators used by regulators, despite large changes in pollutant inputs; (2) it presents some better alternatives, including both taxonomic and functional indicators; and (3) it provides a new reference (or baseline) for regulators by characterizing "semi-pristine" conditions.

      We thank the reviewer for this suggestion, which we have included in the main text (L451-461).

      Reviewer #2 (Public Review):

      Results - They are brief and should expand some more. Particularly, there are no results regarding metabarcoding data (number of reads, filtering etc.). These details are important to know the quality of the data which represents the bulk of the analyses. Even the supplementary material gives little information on the metabarcoding results (e.g. number of ASVs - whether every ASV of each family were pooled etc.).

      We thank the reviewer for this suggestion. We have included a paragraph in results reporting read numbers and other statistics. The filtering criteria and handling of samples can be found in methods (L658-661; L670-675). As explained in methods the taxonomy was assigned using qiime feature-classifier classify-sklearn and used at family level where possible. When classification was not possible at family level because of incomplete/missing information in the online database or a poor match to reference database, the lowest classification possible was used.

      The drivers of biodiversity change section could be restructured and include main text tables showing the families positively or negatively correlated with the different variables (akin to table S2 but simplified).

      As there are over 180 unique families/taxonomic units correlated with at least one biocide or environmental variable, a simplified version of this table would be too large to include in the main text. Therefore, we prefer to keep this information in supplementary table 2 complete with correlation statistics.

      We thank the reviewers for providing detailed feedback on the manuscript and respond to their suggestions as follows:

      Reviewer #1 (Recommendations For The Authors):

      Thank you for the opportunity to review your manuscript, which I found interesting and enjoyable to read. Here are some suggestions for improving it.

      Remove spaces before citations in text.

      Lines 51-53: "Community-level biodiversity reliably explained freshwater ecosystem shifts whereas traditional quality indices (e.g. Trophic Diatom Index) and physicochemical parameters proved to be poor metrics for these shifts." Seems to be the wrong way around / not clear???

      Rephrased to clarify.

      Line 54: Should be "...advocates the use of..." or "...demonstrates the advantages of..."

      Done, thanks for the suggestion.

      Line 62: Spell out numbers <10, i.e. "sixth mass extinction"

      Done, thank you.

      Lines 66-72: These sentences lack clarity. It's not clear that "experimental manipulation of biodiversity" hasn't involved investigation of "multi-trophic changes". By the third of these four sentences it is not clear what "they" is referring to. And in the fourth sentence, "these holistic studies" are not defined. Perhaps it would suffice to say that experiments have so far focused primarily on a single trophic level and largely neglected freshwater systems.

      We have rephrased to improve clarity.

      Line 81: Delete unnecessary bracket

      Done, thank you.

      Line 82: "a minority of freshwater ecosystems" sounds as if you're saying that few freshwater ecosystems are represented in BioTIME, which seems obvious and would also apply to terrestrial and marine systems. Do you mean that freshwater ecosystems are not well represented in the data?

      We have clarified the sentence, thanks.

      Line 106: Resolve issue with citation in text at the end of the sentence (repeated at line 109 and possibly other lines).

      Done, thank you.

      Line 116: By ">1999s" do you mean 1990s?

      This was a typo. It was supposed to be >1999.

      Line 120: The reader would benefit greatly from a brief explanation of explainable network models and multimodal learning in the introduction. Why are these the right tools to use? How do they work in this context? Figure 1 helps to some extent but needs more commentary in the text.

      We have included an explanation of the explainable network models and multimodal learning and how their use can be beneficial to the study of diverse data types.

      Line 144: Here and throughout the text the language could be much more efficient and readable. "Alpha diversity" does not require a definite article. Furthermore, when referring to significance it is convention to state the p-value, test statistic and test used.

      As there are different p-values for each barcode, we have included them in legend to Supplementary Fig. 1 to avoid crowding the main text. We prefer to leave the text unchanged for this reason.

      Line 155: "The primary producer's composition" is grammatically awkward and less suitable than "the composition of primary producers". This kind of awkwardness occurs again at line 285 ("diatom's") and possibly in other parts of the manuscript.

      Thanks, corrected.

      Line 169: The statement that this family was "relatively more abundant" needs a little more explanation. What is it relative to - other groups or to previous stages?

      More abundant than in the other phases – the sentence has been modified.

      Line 179: Nested brackets are unnecessary and affect readability. This could simply be a new sentence, i.e. "For example, Nitrospiraceae (nitrite oxidizers)..."

      Done, thanks.

      Line 215: "Functional biodiversity", which implies that some biodiversity is functional and some not, does not seem an appropriate term to describe the results you present in this section. Simply "functioning of the prokaryotic community" would suffice.

      Thanks, done.

      Line 214-233: This section may be inaccessible for many readers. For example, what are Kegg Orthologs and what role do they play in the functioning of a lake ecosystem? The explanation comes later in the paragraph but there needs to be a gentler introduction before diving into specific technical concepts.

      We appreciate this comment and have included a short explanation of what KEGG and KO terms mean.

      Supplementary Figure 3: It would be helpful to superimpose the lake stages here, as done in Figure 2.

      The figure has been updated with coloured data points corresponding to each phase, as in supplementary figure 1.

      Line 265: Should be "19 of which were identified..."

      Done, thanks.

      Line 284: "Predominantly" rather than "prominently"?

      Done

      Line 242-316: This section is good in that it identifies and ranks individual biocides and climate variables but there is no information on non-additive interactions (e.g., synergistic, antagonistic). Could the authors at least comment on why this was not done or not necessary, and what uncertainties this omission could introduce into the results?

      This was a misunderstanding based on our use of the term ‘synergistic’ in this context. The approach by which we define a synergistic or joint effect of two environmental variables on a taxonomic group is explained in the methods section. This analysis is based on the climate variables and biocide types contributing the largest covariances in the correlation analysis explained in Supplementary Fig. 5; Step 4. The combined effect of two environmental variables on a taxon was considered significant if the biocide type and the climate variable were each significantly correlated with the taxon over the same time window, and their average Pearson correlation was > 0.5 with padj < 0.05 (SWC analysis with 10,000 permutations); this is shown in Supplementary Fig. 5; Step 6. The biocide type and the climate variable were interpreted to have an additive effect on a given taxon if the linear combination of the biocide type and the climate variable had a larger Pearson correlation coefficient than each of the individual correlations (between the family and the biocide type, and between the family and the climate variable) in the same time interval, with padj < 0.05 (with 10,000 permutations in the SWC analysis). We have replaced ‘synergistic’ with ‘joint effect’ to avoid confusion.
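
      The joint-effect criterion described above can be illustrated with a minimal sketch. This is not the authors' SWC implementation: the function names, the plain (unadjusted) permutation p-value, and the use of a single fixed time window are all assumptions made for illustration, and the multiple-testing adjustment (padj) and sliding-window logic are omitted.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])


def permutation_pvalue(x, y, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation of x and y."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, float)
    observed = abs(pearson(x, y))
    # Count shuffles of y that correlate with x at least as strongly as observed.
    hits = sum(abs(pearson(x, rng.permutation(y))) >= observed for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)


def has_joint_effect(taxon, biocide, climate, r_min=0.5, alpha=0.05, n_perm=10_000):
    """Hypothetical helper: both drivers individually significant in the same
    window, and their average correlation with the taxon above r_min."""
    r_biocide = pearson(taxon, biocide)
    r_climate = pearson(taxon, climate)
    both_significant = (
        permutation_pvalue(taxon, biocide, n_perm) < alpha
        and permutation_pvalue(taxon, climate, n_perm) < alpha
    )
    return both_significant and (r_biocide + r_climate) / 2 > r_min
```

      In this sketch a driver that is uncorrelated with the taxon fails the significance test, so the pair is not reported as a joint effect even when the other driver correlates strongly.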

      Figure 4: These 3-D plots are very hard to read. Without additional features (e.g. shadows on each plane, or lines connecting points to planes) it is impossible for the viewer to tell where the points are located on each axis.

      We have created interactive 3D plots here: https://environmental-omicsgroup.github.io/Biodiversity_Monitoring/.

      Figure 5: Legend entry should be "summer precipitation" not "precipitations". "Additive effect" rather than "joint effect" would be more consistent with the main text.

      “Precipitations” has been updated to “precipitation” where relevant throughout. We left ‘joint effect’ and unified the main text, responding to a previous comment of this reviewer on the meaning of synergistic effects in our study.

      Line 348: Doesn't your approach also require specialist skills? I often feel that the "traditional" versus "molecular" monitoring debate misses this point. Some comment on the training and development needs for those interested in applying the sedaDNA approach would be welcome. Otherwise it is an unfair comparison.

      Whereas the application of high throughput sequencing technologies requires training, these technologies are well established with publicly available standard operating procedures. As compared to direct observations, high throughput sequencing provides replicable results regardless of the operator. Moreover, the application of metabarcoding to sedaDNA or more generally eDNA can be outsourced to established environmental services, removing the need for training if it is a limiting factor. The above has been included in the discussion.

      Line 391: "Significantly did" what? "Did significantly change over time" would be better.

      Done, thanks.

      Line 407: Should be "an indicator of..." and "did not significantly change over time..."

      Done, thanks.

      Line 408-410: Regulators are not necessarily interested in identifying past "ecosystem shifts", so this does not seem to be the best way to contrast the capabilities of the sedaDNA approach with those of LTDI2. The real value of this work, in my opinion, is threefold. First, it shows that the reliance on diatoms as indicators of ecological status is inappropriate due to the relatively stable nature of diatom communities in the face of large environmental changes. Second, it presents some better alternatives, including both taxonomic and functional indicators. And third, it provides a new reference point for regulators by characterising "semi-pristine" conditions.

      Thanks for the insightful suggestion. We agree with the reviewer on the advantages and have spelled them out in the resubmitted manuscript.

      Line 445: What are "housekeeping functions"? I checked the Cuenca-Cambronero paper cited but did not find the term there.

      Housekeeping functions are essential basic cellular functions that are evolutionarily conserved. They are more commonly present in public databases because they have been characterised in a number of model species (e.g. Drosophila, C. elegans and Mus musculus). Our reference is not to the Cuenca-Cambronero paper but to Mi et al., which describes the reference database PANTHER. We have included the definition of housekeeping functions in the main text.

      Line 449: Briefly state the main functional changes found here.

      Examples have been included.

      Lines 451-452: Whilst this statement may be found in the cited source, most readers I suspect would not identify with it. Indeed, one could argue that most of freshwater ecology has been dedicated to this very task (documenting chemical impacts on biodiversity)! A more balanced view is needed here.

      The sentence the reviewer refers to also includes a reference to climate change. Climate change and chemical pollution are the two most common causes of biodiversity loss, and not only in freshwater ecosystems.

      Lines 463-466: These examples both point to non-additive (synergistic) effects, which were not assessed in the current study.

      Please refer to our explanation above about the inappropriate use of synergistic and, here, additive. We have altered the text throughout to use joint effects as we do not investigate synergistic, antagonistic and additive effects as traditionally described in ecology.

      Lines 472-474: This sentence is unclear. Do you mean that this approach surpasses others in terms of reliability? If so, I don't believe this has been demonstrated in the paper.

      We apologise. The word ‘reliability’ should not have been in the text. We have improved the clarity of this sentence.

      Lines 474-482: In these sentences it is unclear whether or not you are talking about your method or contrasting it with another method(s). If the latter, which method or methods are you referring to?

      We have fixed this sentence to better reflect that our algorithm provides a high degree of confidence that surpasses state-of-the-art analyses, which predominantly identify patterns of co-occurrence of taxa within communities (e.g. Correlation-Centric Network).

      Line 631: Should be "Physico-chemical variables". I have not extensively checked the rest of the methods for such errors.

      Thank you, the text has been changed where present.

      Reviewer #2 (Recommendations For The Authors):

      Introduction Line 80 remove extra ')'

      Done, thank you.

      Line 81 rephrase e.g includes few freshwater ecosystems

      We modified this sentence, also following Reviewer #1.

      Line 83 although, instead of whereas?

      Done, thanks.

      Line 106 formatting reference issue

      Line 109 same as above

      Thank you, noted.

      Results

      Line 141 - 144 how was the sampling of the sediment performed over the 100 year core? Every year? Every 5 years? Or were they pooled to represent the (as of yet unlisted) phases?

      The reviewer is correct that details are not provided here. They are in methods. We have added some text to explain the basic concepts of how the core was obtained and sliced and refer the reader to the method section for more details.

      Line 154 the authors have not yet explicitly listed the lake phases, so it is difficult to refer to them now.

      Noted, the addition of a short explanation at the beginning of the results section should take care of this issue.

      Line 216 - may be worth briefly explaining KEGG orthologs and how these relate to functional biodiversity.

      We thank the reviewer. Also responding to a similar comment from Reviewer #1, we included a description of KO terms and their links to functional biodiversity.

      Lines 249 - 260 instead of a supplementary table, it could remain in the main text

      Supplementary table 2 is a multi-tab table including information for each region amplified here. It is not possible to include this table in the main text.

      Materials and Methods

      Due to the formatting of the manuscript (results & discussion before materials and methods), many of the results are not clearly understood without having to visit the M&M section. Particularly, how the biocide types were obtained (Historic records plus persistence of DDT in sediments). This could be resolved by including a few sentences on how the data was gathered in the results section. Overall, materials and methods are sufficient, however, it is not clear how many of the 37 metabarcoding samples correspond to which of the lake phases. Finally, I suggest a better organization of M&Ms by having subheadings for each section. For example, under Biodiversity fingerprinting across 100 years, one subheading could be DNA extraction and sequencing, another subheading could be bioinformatics.

      We thank the reviewer for the suggestion. To alleviate the issues linked to the methods section coming after the results section, we have introduced a short explanation of the sediment core and the lake phases at the beginning of the results section. A description of the climate and chemical data has been included at the beginning of the section ‘Drivers of biodiversity change’ in results. Subheadings were introduced in methods as suggested.

    1. Author Response

      Reviewer #1 (Public Review):

      In the best genetically and biochemically understood model of eukaryotic DNA replication, the budding yeast, Saccharomyces cerevisiae, the genomic locations at which DNA replication initiates are determined by a specific sequence motif. These motifs, or ARS elements, are bound by the origin recognition complex (ORC). ORC is required for loading of the initially inactive MCM helicase during origin licensing in G1. In human cells, ORC does not have a specific sequence binding domain and origin specification is not specified by a defined motif. There have thus been great efforts over many years to try to understand the determinants of DNA replication initiation in human cells using a variety of approaches, which have gradually become more refined over time.

      In this manuscript Tian et al. combine data from multiple previous studies using a range of techniques for identifying sites of replication initiation to identify conserved features of replication origins and to examine the relationship between origins and sites of ORC binding in the human genome. The authors identify a) conserved features of replication origins e.g. association with GC-rich sequences, open chromatin, promoters and CTCF binding sites. These associations have already been described in multiple earlier studies. They also examine the relationship of their determined origins and ORC binding sites and conclude that there is no relationship between sites of ORC binding and DNA replication initiation. While the conclusions concerning genomic features of origins are not novel, if true, a clear lack of colocalization of ORC and origins would be a striking finding.

      Thank you. That is where the novelty of the paper lies.

      However, the majority of the datasets used do not report replication origins, but rather broad zones in which replication origins fire. Rather than refining the localisation of origins, the approach of combining diverse methods that monitor different objects related to DNA replication leads to a base dataset that is highly flawed and cannot support the conclusions that are drawn, as explained in more detail below.

      We are using the narrowly defined SNS-seq peaks as the gold standard origins and making sure to focus in on those that fall within the initiation zones defined by other methods. The objective is to make a list of the most reproducible origins. Unlike what the reviewer states, this actually refines the dataset to focus on the SNS origins that have also been reproduced by the other methods in multiple cell lines. We will change the last box of Fig. 1A to say: Identify reproducible SNS-seq origins that are contained in IZs defined by Repli-seq, OK-seq and Bubble-seq. These are the “shared origins”. This and the Fig. 2B (as it is) will make our strategy clearer.

      Methods to determine sites at which DNA replication is initiated can be divided into two groups based on the genomic resolution at which they operate. Techniques such as bubble-seq, ok-seq can localise zones of replication initiation in the range ~50kb. Such zones may contain many replication origins. Conversely, techniques such as SNS-seq and ini-seq can localise replication origins down to less than 1kb. Indeed, the application of these different approaches has led to a degree of controversy in the field about whether human replication does indeed initiate at discrete sites (origins), or whether it initiates randomly in large zones with no recurrent sites being used. However, more recent work has shown that elements of both models are correct i.e. there are recurrent and efficient sites of replication initiation in the human genome, but these tend to be clustered and correspond to the demonstrated initiation zones (Guilbaud et al., 2022).

      These different scales and methodologies are important when considering the approach of Tian et al. The premise that combining all available data from five techniques will increase accuracy and confidence in identifying the most important origins is flawed for two principal reasons. First, as noted above, of the different techniques combined in this manuscript, only SNS-seq can actually identify origins rather than initiation zones. It is the former that matters when comparing sites of ORC binding with replication origin sites if a conclusion is to be drawn that the two do not co-localise.

      Exactly. So the reviewer should agree that our method of finding SNS-seq peaks that fall within initiation zones actually refines the origins to find the most reproducible origins. We are not losing the spatial precision of the SNS-seq peaks.

      Second, the authors give equal weight to all datasets. Certainly, in the case of SNS-seq, this is not appropriate. The technique has evolved over the years and some earlier versions have significantly different technical designs that may impact the reliability and/or resolution of the results e.g. in Foulk et al. (Foulk et al., 2015), lambda exonuclease was added to single stranded DNA from a total genomic preparation rather than purified nascent strands), which may lead to significantly different digestion patterns (ie underdigestion). Curiously, the authors do not make the best use of the largest SNS-seq dataset (Akerman et al., 2020) by ignoring these authors separation of core and stochastic origins. By blending all data together any separation of signal and noise is lost. Further, I am surprised that the authors have chosen not to use data and analysis from a recent study that provides subsets of the most highly used and efficient origins in the human genome, at high resolution (Guilbaud et al., 2022).

      1) We are using the data from Akerman et al., 2020: Dataset GSE128477 in Supplemental Table 1. We can examine the core origins defined by the authors to check their overlap with ORC binding.

      2) To take into account the refinement of the SNS-seq methods through the years, we actually included in our study only those SNS-seq studies after 2018, well after the lambda exonuclease method was introduced. Indeed, all 66 of SNS-seq datasets we used were obtained after the lambda exonuclease digestion step. To reiterate, we recognize that there may be many false positives in the individual origin mapping datasets. Our focus is on the True positives, the SNS-seq peaks that have some support from multiple SNS-seq studies AND fall within the initiation zones defined by the independent means of origin mapping (described in Fig. 1A and 2B). These True positives are most likely to be real and reproducible origins and should be expected to be near ORC binding sites.

      We will change the last box of Fig. 1A to say: Identify reproducible SNS-seq origins that are contained in IZs defined by Repli-seq, OK-seq and Bubble-seq. These are the “Shared origins”.

      Ini-seq by Torsten Krude and co-workers (Guilbaud et al., 2022) does NOT use Lambda exonuclease digestion. So using Ini-seq defined origins is at odds with the suggestion above that we focus only on SNS-seq datasets that use Lambda exonuclease. However, Ini-seq identifies a much smaller subset of SNS-seq origins, so we will do the analysis with just that smaller set in the revision of the paper.

      References:

      Akerman I, Kasaai B, Bazarova A, Sang PB, Peiffer I, Artufel M, Derelle R, Smith G, Rodriguez-Martinez M, Romano M, Kinet S, Tino P, Theillet C, Taylor N, Ballester B, Méchali M (2020) A predictable conserved DNA base composition signature defines human core DNA replication origins. Nat Commun, 11: 4826

      Foulk MS, Urban JM, Casella C, Gerbi SA (2015) Characterizing and controlling intrinsic biases of lambda exonuclease in nascent strand sequencing reveals phasing between nucleosomes and G-quadruplex motifs around a subset of human replication origins. Genome Res, 25: 725-735

      Guilbaud G, Murat P, Wilkes HS, Lerner LK, Sale JE, Krude T (2022) Determination of human DNA replication origin position and efficiency reveals principles of initiation zone organisation. Nucleic Acids Res, 50: 7436-7450

      Reviewer #2 (Public Review):

      Tian et al. perform a meta-analysis of 113 genome-wide origin profile datasets in humans to assess the reproducibility of experimental techniques and shared genomics features of origins. Techniques to map DNA replication sites have quickly evolved over the last decade, yet little is known about how these methods fare against each other (pros and cons), nor how consistent their maps are. The authors show that high-confidence origins recapitulate several known features of origins (e.g., correspondence with open chromatin, overlap with transcriptional promoters, CTCF binding sites). However, surprisingly, they find little overlap between ORC/MCM binding sites and origin locations.

      Overall, this meta-analysis provides the field with a good assessment of the current state of experimental techniques and their reproducibility, but I am worried about: (a) whether we've learned any new biology from this analysis; (b) how binding sites and origin locations can be so mismatched, in light of numerous studies that suggest otherwise; and (c) some methodological details described below.

      Major comments:

      Line 26: "0.27% were reproducibly detected by four techniques" -- what does this mean? Does the fragment need to be detected by ALL FOUR techniques to be deemed reproducible?

      If the reproducible SNS-seq peaks are included in the reproducible initiation zones found by the other methods, then we consider it reproducible across datasets. The strategy is to focus our analysis on the most reproducible SNS-seq peaks that happen to be in reproducible initiation zones. It is the best way to confidently identify a very small set of true positive origins.

      And what if the technique detected the fragment in only 1 of N experiments conducted; does that count as "detected"?

      A reproducible SNS-seq origin has been reproduced above a statistical threshold of 20 reproductions. A threshold of reproduction in 20 datasets out of 66 SNS-seq datasets gives an FDR of <0.1. This is explained in Fig. 2a and Supplementary Fig. S2. For the initiation zones, we considered a Zone even if it appears in only 1 of N experiments, because N is usually small. This relaxed method for selecting the initiation zones gives the best chance of finding SNS-seq peaks that are reproduced by the other methods.

      Later in Methods, the authors (line 512) say, "shared origins ... occur in sufficient number of samples" but what does sufficient mean?

      Sufficient means that SNS-seq origin was reproducibly detected in ≥ 20 datasets and was included in any initiation zone defined by three other techniques.

      Then on line 522, they use a threshold of "20" samples, which seems arbitrary to me. How are these parameters set, and how robust are the conclusions to these settings? An alternative to setting these (arbitrary) thresholds and discretizing the data is to analyze the data continuously; i.e., associate with each fragment a continuous confidence score.

      We explained Fig. 2a and Supplementary Fig. S2 in the text as follows: The occupancy score of each origin defined by SNS-seq (Supplementary Fig. 2a) counts the frequency at which a given origin is detected in the datasets under consideration. For the random background, we assumed that the number of origins confirmed by increasing occupancy scores decreases exponentially (see Methods and Supplementary Table 2). Plotting the number of origins with various occupancy scores when all SNS-seq datasets published after 2018 are considered together (the union origins) shows that the experimental curve deviates from the random background at a given occupancy score (Fig. 2a). The threshold occupancy score of 20 is the point where the observed number of origins deviates from the expected background number (with an FDR < 0.1) (Fig. 2a). In the Methods: In other words, the number of observed origins with occupancy score greater than 20 is 10 times more than expected in the background model. This approach is statistically sound and described by us in (Fang et al. 2020).
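
      The occupancy-score thresholding described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the background counts are supplied by the caller rather than fitted (the exponential background model is described in the Methods and is not reproduced here), and the FDR is approximated as the ratio of expected to observed origins at or above a given score.

```python
import numpy as np


def occupancy_scores(presence):
    """presence: (n_origins, n_datasets) 0/1 matrix.
    Returns, for each origin, the number of datasets in which it was detected."""
    return np.asarray(presence).sum(axis=1)


def threshold_at_fdr(observed, expected, fdr=0.1):
    """observed[k] / expected[k]: number of origins with occupancy score k + 1,
    in the data and under the background model respectively.
    Returns the smallest score s for which expected(>= s) / observed(>= s) < fdr,
    or None if no score meets the cutoff."""
    observed = np.asarray(observed, float)
    expected = np.asarray(expected, float)
    # Tail sums: number of origins with occupancy score >= k + 1.
    obs_tail = np.cumsum(observed[::-1])[::-1]
    exp_tail = np.cumsum(expected[::-1])[::-1]
    for k in range(len(obs_tail)):
        if obs_tail[k] > 0 and exp_tail[k] / obs_tail[k] < fdr:
            return k + 1
    return None
```

      With an FDR cutoff of 0.1, the returned score is the first one at which observed origins outnumber the background expectation by more than ten to one, which matches the interpretation given in the response.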

      Line 20: "50,000 origins" vs "7.5M 300bp chromosomal fragments" -- how do these two numbers relate? How many 300bp fragments would be expected given that there are ~50,000 origins? (i.e., how many fragments are there per origin, on average)? This is an important number to report because it gives some sense of how many of these fragments are likely nonsense/noise. The authors might consider eliminating those fragments significantly above the expected number, since their inclusion may muddle biological interpretation.

      I think we confused the reviewer by the way we wrote the abstract. The 50,000 origins mentioned in the abstract is the hypothetical expected number of origins that have to fire to replicate the whole 6x10^9 base diploid genome, based on the average inter-origin distance of 10^5 bases (as determined by molecular combing). The 7.5M 300 bp fragments are the genomic regions where the 7.5M union SNS-seq-defined origins are located. Clearly, that is a lot of noise, some of it technical and some due to the fact that origins fire stochastically, which is why our paper focuses on a smaller number of reproducible origins, the 20,250 shared origins. Our analysis is on the 20,250 shared origins, and not on all 7.5M union origins. Thus, we are not including the excess of non-reproducible (stochastic?) origins in our analysis.

      The revised abstract in the revised paper will say: “Based on experimentally determined average inter-origin distances of ~100 kb, DNA replication initiates from ~50,000 origins on human chromosomes in each cell-cycle. The origins are believed to be specified by binding of factors like the Origin Recognition Complex (ORC) or CTCF or other features like G-quadruplexes. We have performed an integrative analysis of 113 genome-wide human origin profiles (from five different techniques) and 5 ORC-binding site datasets to critically evaluate whether the most reproducible origins are specified by these features. Out of ~7.5 million union origins identified by 66 SNS-seq datasets, only 0.27% were reproducibly contained in initiation zones identified by three other techniques (20,250 shared origins), suggesting extensive variability in origin usage and identification in different circumstances.”
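
      As a quick check of the figures quoted above, the arithmetic can be written out. The variable names are illustrative; the input values are the ones stated in the response (a 6x10^9 base diploid genome, a 10^5 base average inter-origin distance, ~7.5 million union origins and 20,250 shared origins).

```python
genome_size_bp = 6e9             # diploid genome size quoted in the response
inter_origin_distance_bp = 1e5   # average inter-origin distance (molecular combing)

# Origins needed per cell cycle if they were evenly spaced at the average distance.
origins_per_cycle = genome_size_bp / inter_origin_distance_bp

union_origins = 7.5e6            # union of SNS-seq origins across 66 datasets
shared_origins = 20_250          # origins also contained in initiation zones
shared_fraction = shared_origins / union_origins

print(f"origins per cycle: {origins_per_cycle:,.0f}")   # 60,000
print(f"shared fraction:   {shared_fraction:.2%}")      # 0.27%
```

      With these inputs the even-spacing estimate comes to roughly 6x10^4, the same order of magnitude as the ~50,000 quoted, and the shared origins make up 0.27% of the union set.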

      Line 143: I'm not terribly convinced by the PCA clustering analysis, since the variance explained by the first 2 PCs is only ~25%. A more robust analysis of whether origins cluster by cell type, year etc is to simply compute the distribution of pairwise correlations of origin profiles within the same group (cell type, year) vs the correlation distribution between groups. Relatedly, the authors should explain what an "origin profile" is (line 141). Is the matrix (to which PCA is applied) of size 7.5M x 113, with a "1" in the (i,j) position if the ith fragment was detected in the jth dataset?

      The reviewer is correct about how we did the PCA, and we have now included the description in the Methods. We will also do the pairwise correlations the way the reviewer suggests (a) by techniques, (b) by cell types (SNS-seq), (c) by year of publication (SNS-seq).
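
      The within-group versus between-group comparison suggested by the reviewer can be sketched as below. This is an illustrative outline under stated assumptions, not the analysis that will appear in the revision: the origin profiles are taken to be a binary fragment-by-dataset matrix as the reviewer describes, the function name is hypothetical, and Pearson correlation between dataset columns is one possible choice of similarity measure.

```python
from itertools import combinations

import numpy as np


def within_between_correlations(profiles, groups):
    """profiles: (n_fragments, n_datasets) 0/1 matrix, 1 if fragment i was
    detected in dataset j. groups: one label per dataset (e.g. technique,
    cell type or publication year).
    Returns two arrays of pairwise correlations: same-group and different-group."""
    # Correlate dataset columns with each other (rowvar=False treats columns as variables).
    corr = np.corrcoef(np.asarray(profiles, float), rowvar=False)
    within, between = [], []
    for i, j in combinations(range(len(groups)), 2):
        (within if groups[i] == groups[j] else between).append(corr[i, j])
    return np.array(within), np.array(between)
```

      Comparing the two returned distributions (for example by their means or with a rank-based test) would show whether datasets from the same group resemble each other more than datasets from different groups. A dataset column with no variation yields an undefined correlation and would need to be filtered out first.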

      It's not clear to me what new biology (genomic features) has been learned from this meta-analysis. All the major genomic features analyzed have already been found to be associated with origin sites. For example, the correspondence with TSS has been reported before:

      https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6320713/

      https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6547456/

      So what new biology has been discovered from this meta-analysis?

      The new biology can be summarized as: (a) We can identify a set of reproducible (in multiple datasets and in multiple cell lines) SNS-seq origins that also fall within initiation zones identified by completely independent methods. These may be the best origins to study in the midst of the noise created by stochastic origin firing. (b) The overlap of these True Positive origins with known ORC binding sites is tenuous. So either all the origin mapping data, or all the ORC binding data has to be discarded, or this is the new biological reality in mammalian cancer cells: on a genome-wide scale the most reproduced origins are not in close proximity to ORC binding sites, in contrast to the situation in yeast. (c) All the features that have been reported to define origins (CTCF binding sites, G quadruplexes etc.) could simply be from the fact that those features also define transcription start sites (TSS), and origins prefer to be near TSS because of the favorable chromatin state.

      Line 250: The most surprising finding is that there is little overlap between ORC/MCM binding sites and origin locations. The authors speculate that the overlap between ORC1 and ORC2 could be low because they come from different cell types. Equally concerning is the lack of overlap with MCM. If true, these are potentially major discoveries that butt heads with numerous other studies that have suggested otherwise. More needs to be done to convince the reader that such a mis-match is true. Some ideas are below:

      Idea 1) One explanation given is that the ORC1 and ORC2 data come from different cell types. But there must be a dataset where both are mapped in the same cell type. Can the authors check the overlap here? In Fig S4A, I would expect the circles to not only strongly overlap but to also be of roughly the same size, since both ORC's are required in the complex. So something seems off here.

      We agree with the reviewer that there is something “off here”. Either the techniques that report these sites are all wrong, or the biology does not fit into the prevailing hypothesis. One secret in the ORC ChIP field that our lab has struggled with for quite some time is that the various ORC subunits do not necessarily ChIP-seq to the same sites. The poor overlap between the binding sites of subunits of the same complex suggests either that the subunits do not always bind to the chromatin as a six-subunit complex or that all the ChIP-seq data in the literature is suspect. We provide in Supplementary Fig. S4A examples of true positive complexes (SMARCA4/ARID1A, SMC1A/SMC3, EZH2/SUZ12), whose subunits ChIP-seq to a large fraction of common sites. As shown in Supplementary Fig. S4C, we do not have ORC1 and ORC2 ChIP-seq data from the same cell type. We have ORC1 ChIP-seq and SNS-seq data from HeLa cells and ORC2 ChIP-seq and origins from K562 cells, and so will add the proximity/overlap of the binding sites to the origins in the same cell type in the revision.

      Idea 2) Another explanation given is that origins fire stochastically. One way to quantify the role of stochasticity is to quantify the overlap of origin locations performed by the same lab, in the same year, in the same experiment, in the same cell type -- i.e., across replicates -- and then compute the overlap of mapped origins. This would quantify how much mis-match is truly due to stochasticity, and how much may be due to other factors.

      A given lab may have superior reproducibility compared to the entire field. But the notion of stochasticity is well accepted in the field because of this observation: the average inter-origin distance measured by single molecule techniques like molecular combing is ~100 kb, but the average inter-origin distance measured on a population of cells (same cell line) is ~30 kb. The only explanation is that in a population of cells many origins can fire, but in a given cell on a given allele, only one-third of those possible origins fire. This is why we did not worry about the lack of reproducibility between cell lines, labs etc., but instead focused on those SNS-seq origins that are reproducible over multiple techniques and cell lines.

      Idea 3) A third explanation is that MCMs are loaded further from origin sites in human than in yeast. Is there any evidence of this? How far away does the evidence suggest, and what if this distance is used to define proximity?

      MCMs, of course, have to be loaded at an origin at the time the origin fires because MCMs provide the core of the helicase that starts unwinding the DNA at the origin. Thus, the lack of proximity between MCM binding sites and origins may arise because the most frequently detected MCM sites (where MCM spends the most time in a cell population) do not correspond to where MCM is first active in initiating origin firing. This has been discussed. MCMs may be loaded far from the origin site, but because they can move along the chromatin, they have to reach the origin site at some point to fire the origin.

      Idea 4) How many individual datasets (i.e., those collected and published together) also demonstrate the feature that ORC/MCM binding locations do not correlate with origins? If there are few, then indeed, the integrative analysis performed here is consistent. But if there are many, then why would individual datasets reveal one thing, but integrative analysis reveal something else?

      We apologize for this oversight. In the revised manuscript we will discuss PMC3530669, PMC7993996, PMC5389698, PMC10366126. None of them have addressed what we are addressing, which is whether the small subset of the most reproducible origins is proximal to ORC or MCM binding sites, but the discussion is essential.

      Idea 5) What if you were much more restrictive when defining "high-confidence" origins / binding sites. Does the overlap between origins and binding sites go up with increasing restriction?

      We will make origins more restrictive by selecting those reproduced by 30-60 datasets. The number of origins will of course fall, but we will measure whether the proximity to ORC or MCM-binding sites increases/decreases in a statistically rigorous way.
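
      The planned comparison could be organised along the following lines. This is a minimal sketch under stated assumptions, not the analysis pipeline: the function names, the toy coordinates, the single-chromosome layout and the 1 kb proximity cutoff are all hypothetical, and no statistical test is included.

```python
from bisect import bisect_left


def fraction_near_sites(origins, sites, max_distance):
    """Fraction of origin midpoints lying within max_distance bp of any binding site."""
    if not origins:
        return 0.0
    sites = sorted(sites)
    near = 0
    for pos in origins:
        i = bisect_left(sites, pos)
        # Only the nearest site on each side can be the closest one.
        candidates = sites[max(i - 1, 0): i + 1]
        if any(abs(pos - s) <= max_distance for s in candidates):
            near += 1
    return near / len(origins)


def proximity_by_threshold(origin_support, sites, thresholds, max_distance=1000):
    """Map each reproducibility threshold to (origins kept, fraction near a site)."""
    result = {}
    for t in thresholds:
        kept = [pos for pos, n_datasets in origin_support.items() if n_datasets >= t]
        result[t] = (len(kept), fraction_near_sites(kept, sites, max_distance))
    return result


# Hypothetical toy data: origin midpoint (bp) -> number of datasets reproducing it.
origin_support = {1_000: 10, 5_000: 35, 9_000: 50, 20_000: 60}
binding_sites = [5_400, 20_300, 40_000]  # hypothetical ORC or MCM peak midpoints

table = proximity_by_threshold(origin_support, binding_sites, thresholds=[1, 30, 60])
for threshold, (n_kept, frac) in table.items():
    print(threshold, n_kept, round(frac, 2))
```

      In a real analysis the per-threshold fractions would need to be compared against a suitable null (for example, shuffled site positions) before concluding that proximity rises or falls with stringency.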

      Overall, I have the sense that these experimental techniques may be producing a lot of junk. If true, this would be useful for the field to know! But if not, and there are indeed "unexplored mechanisms of origin specification" that would be exciting. But I'm not convinced yet.

      It would be nice in the Discussion for the authors to comment about the trade-offs of different techniques; what are their pros and cons, which should be used when, which should be avoided altogether, and why? This would be a valuable prescription for the field.

      Thanks for the suggestion. We will do what the reviewer suggests: use cell type-specific data wherever origins have been defined by at least two methods in the same cell type, specifically reporting the percent of shared origins amongst the datasets to compare whether some methods correlate better with each other. ORC ChIP-seq and MCM ChIP-seq data do not define origins: they define the binding sites of these proteins. Thus we will discuss why the ChIP-seq sites of these protein complexes should not be used to define origins.

      Reviewer #3 (Public Review):

      Summary: The authors present a thought-provoking and comprehensive re-analysis of previously published human cell genomics data that seeks to understand the relationship between the sites where the Origin Recognition Complex (ORC) binds chromatin, where the replicative helicase (Mcm2-7) is situated on chromatin, and where DNA replication actually begins (origins). The view that these should coincide is influenced by studies in yeast where ORC binds site-specifically to dedicated nucleosome-free origins where Mcm2-7 can be loaded and remains stably positioned for subsequent replication initiation. However, this is most certainly not the case in metazoans where it has already been reported that chromatin binding sites of ORC, Mcm2-7, and origins do not necessarily overlap, likely because ORC loads the helicase in transcriptionally active regions of the genome and, since Mcm2-7 retains linear mobility (i.e., it can slide), it is displaced from its original position by other chromatin-contextualized processes (for example, see Gros et al., 2015 Mol Cell, Powell et al., 2015 EMBO J, Miotto et al., 2016 PNAS, and Prioleau et al., 2016 G&D amongst others). This study reaches a very similar conclusion: in short, they find a high degree of discordance between ORC, Mcm2-7, and origin positions in human cells.

      Strengths: The strength of this work is its comprehensive and unbiased analysis of all relevant genomics datasets. To my knowledge, this is the first attempt to integrate these observations and the analyses employed were suited for the questions under consideration.

      Thank you for recognizing the comprehensive and unbiased nature of our analysis. That the comprehensive view fails to move the field forward, which is identified as the major weakness, is actually a strength. It should be viewed in the light that we cannot even find evidence to support the primary hypothesis: that the most reproducible origins must be near ORC and MCM binding sites. This finding will prevent the unwise adoption of ORC or MCM binding sites as surrogate markers of origins and may perhaps stimulate the field to try to improve methods of identifying ORC or MCM binding until the binding sites are found to be proximal to the most reproducible origins. The last possibility is that there are ORC- or MCM-independent modes of defining origins, but we have no evidence of that.

      Weaknesses: The major weakness of this paper is that this comprehensive view failed to move the field forward from what was already known. Further, a substantial body of relevant prior genomics literature on the subject was neither cited nor discussed. This omission is important given that this group reaches very similar conclusions as studies published a number of years ago. Further, their study seems to present a unique opportunity to evaluate and shape our confidence in the different genomics techniques compared in this study. This, however, was also not discussed.

      We will do what the reviewer suggests: use cell type-specific data wherever origins have been defined by at least two methods in the same cell type, specifically reporting the percent of shared origins amongst the datasets to compare whether some methods correlate better with each other. Thanks for the suggestion. ORC ChIP-seq and MCM ChIP-seq data do not define origins: they define the binding sites of these proteins. Thus, we will discuss why the ChIP-seq sites of these protein complexes should not be used to define origins.

      We do not cite the SNS-seq data before 2018 because of the concerns discussed above about the earlier techniques needing improvement. We will discuss other genomics data that we failed to discuss.

      We will cite the papers the reviewer names:

      Gros, Mol Cell 2015 and Powell, EMBO J. 2015 discuss the movement of MCM2-7 away from ORC in yeast and flies and will be cited. MCM2-7 binding to sites away from ORC and being loaded in vast excess of ORC was reported earlier on Xenopus chromatin in PMC193934, and will also be cited.

      Miotto, PNAS, 2016: publishes ORC2 ChIP-seq sites in HeLa (data we have used in our analysis), but does not measure ORC1 ChIP-seq sites. They say: “ORC1 and ORC2 recognize similar chromatin states and hence are likely to have similar binding profiles.” This is a conclusion based on the fact that the ChIP-seq sites in the two studies are in areas with open chromatin; it is not a direct comparison of binding sites of the two proteins.

      Prioleau, G&D, 2016: This is a review that compared different techniques of origin identification but has no primary data to say that ORC and MCM binding sites overlap with the most reproducible origins.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This study investigates the context-specificity of facial expressions in three species of macaques to test predictions for the 'social complexity hypothesis for communicative complexity'. This hypothesis has garnered much attention in recent years. A proper test of this hypothesis requires clear definitions of 'communicative complexity' and 'social complexity'. Importantly, these two facets of a society must not be derived from the same data because otherwise, any link between the two would be trivial. For instance, if social complexity is derived from the types of interactions individuals have, and different types of signals accompany these interactions, we would not learn anything from a correlation between social and communicative complexity, as both stem from the same data.

      The authors of the present paper make a big step forward in operationalising communicative complexity. They used the Facial Action Coding System to code a large number of facial expressions in macaques. This system allows decomposing facial expressions into different action units, such as 'upper lid raiser', 'upper lip raiser' etc.; these units are closely linked to activating specific muscles or muscle groups. Based on these data, the authors calculated three measures derived from information theory: entropy, specificity and prediction error. These parts of the analysis will be useful for future studies.

      The three species of macaque varied in these three dimensions. In terms of entropy, there were differences with regard to context (and if there are these context-specific differences, then why pool the data?). Barbary and Tonkean macaques showed lower specificity than rhesus macaques. Regarding predicting context from the facial signals, a random forest classifier yielded the highest prediction values for rhesus monkeys. These results align with an earlier study by Preuschoft and van Schaik (2000), who found that less despotic species have greater variability in facial expressions and usage.

      Crucially, the three species under study are also known to vary in terms of their social tolerance. According to the highly influential framework proposed by Bernard Thierry, the members of the genus Macaca fall along a graded continuum from despotic (grade 1) to highly tolerant (grade 4). The three species chosen for the present study represent grade 1 (rhesus monkeys), grade 3 (Barbary macaques), and grade 4 (Tonkean macaques).

      The authors of the present paper define social complexity as equivalent to social tolerance - but how is social tolerance defined? Thierry used aggression and conflict resolution patterns to classify the different macaque species, with the steepness of the rank hierarchy and the degree of nepotism (kin bias) being essential. However, aggression and conflict resolution are accompanied by facial gestures. Thus, the authors are looking at two sides of the same coin when investigating the link between social complexity (as defined by the authors) and communicative complexity. Therefore, I am not convinced that this study makes a significant advance in testing the social complexity for communicative complexity hypothesis. A further weakness is that - despite the careful analysis - only three species were considered; thus, the effective sample size is very small.

      Social tolerance in macaques is defined by various covarying traits, among which rates of counter-aggression and conflict resolution are only two of many included (see Thierry 2021 for a recent discussion and review). We do not deviate from Thierry’s definition of social tolerance. We simply highlight that the constellation of behavioral traits in the most tolerant macaque species results in a social environment where the outcome of social interactions is more uncertain (see introduction lines 102-114). As we argue throughout the paper, higher uncertainty can be used as a proxy for higher complexity and thus we conclude that the most tolerant macaque species have the highest social complexity. While most social behavior in macaques is accompanied by some facial behavior, we were careful to define social contexts only from the body language/behavior (e.g., lunge for aggression, grooming for affiliation) of the individuals involved and ignored the facial behavior used (see method lines 371-381). Therefore, the facial behavior of macaques (communication signals) was not used in defining either social tolerance (and by extension complexity) or the social context in which it was used. We feel like this appropriately minimizes any elements of circularity in the analysis of social and communicative complexity.

      Regarding the effective sample size of three species, we agree that it is small, and it is a limitation of this study. However, the methodology we used is applicable to any species for which FACS is available (including other non-human primates, dogs, and horses), and therefore, we hope that other datasets will complement ours in the future. Nevertheless, we now acknowledge this limitation in the discussion (lines 314-317).

      Reviewer #2 (Public Review):

      This is a well-written manuscript about a strong comparative study of diversity of facial movements in three macaque species to test arguments about social complexity influencing communicative complexity. My major criticism has to do with the lack of any reporting of inter-observer reliability statistics - see comment below. Reporting high levels of inter-observer reliability is crucial for making clear the authors have minimized chances of possible observer biases in a study like this, where it is not possible to code the data blind with regard to comparison group. My other comments and questions follow by line number:

      We agree that inter-observer coding reliability is an important piece of information. We now report in more detail the inter-observer reliability tests that we conducted on lines 384-392.

      38-40. Whereas I am an advocate of this hypothesis and have tested it myself, the authors should probably comment here, or later in the discussion, about the reverse argument - greater communicative complexity (driven by other selection pressures) could make more complicated social structures possible. This latter view was the one advocated by McComb & Semple in their foundational 2005 Biology Letters comparative study of relationships between vocal repertoire size and typical group size in non-human primate species.

      It is true that an increase in communicative complexity could allow/drive an increase in social complexity. Unfortunately our data is correlational in nature and we cannot determine the direction of causality. We added such a statement to the discussion (lines 311-314).

      72-84 and 95-96. In the paragraph here, the authors outline an argument about increasing uncertainty / entropy mapping on to increasing complexity in a system (social or communicative). In lines 95-96, though, they fall back on the standard argument about complex systems having intermediate levels of uncertainty (complete uncertainty roughly = random and complete certainty roughly = simple). Various authors have put forward what I think are useful ways of thinking about complexity in groups - from the perspective of an insider (i.e., a group member, where greater randomness is, in fact, greater complexity) vs from the perspective of an outsider (i.e., a researcher trying to quantify the complexity of the system where it is relatively easy to explain a completely predictable or completely random system but harder to do so for an intermediately ordered or random system). This sort of argument (Andrew Whiten had an early paper that made this argument) might be worth raising here or later in the discussion? (I'm also curious where the authors' sentiments lie for this question - they seem to touch on it in lines 285-287, but I think it's worth unpacking a little more here!)

      In this study we used three measures of uncertainty (entropy, context specificity, and prediction error) to approximate complexity. However, maximum entropy or uncertainty would be achieved in a system that is completely random (and thus be considered simple). Therefore, the species with the highest entropy values, or unpredictability, could be interpreted as having a simpler communication system than a species with a moderately high entropy/unpredictability value. Our argument is that animal communication systems cannot possibly be random, otherwise they would not have evolved as signals. In systems where we know the highest entropy (or unpredictability) will not be due to randomness, as is the case with animal social interactions and communication, we can conclude that the system with the highest uncertainty is the most complex. We have now expanded upon this point in the discussion (lines 286-294). See also response to reviewer 1 below.

      115-129. See also:

      Maestripieri, D. (2005). "Gestural communication in three species of macaques (Macaca mulatta, M. nemestrina, M. arctoides): use of signals in relation to dominance and social context." Gesture 5: 57-73.

      Maestripieri, D. and K. Wallen (1997). "Affiliative and submissive communication in rhesus macaques." Primates 38(2): 127-138.

      On that note, it is probably worth discussing in this paragraph and probably later in the discussion exactly how this study differs from these earlier studies of Maestripieri. I think the fact that machine learning approaches had the most difficulty assigning crested data to context is an important methodological advance for addressing these sorts of questions - there are probably other important differences between the authors' study here and these older publications that are worth bringing up.

      Our study differs from these two studies in that the studies above classified facial behavior into discrete categories (e.g., bared-teeth, lip-smack), whereas we adopted a bottom-up approach and made no a priori assumptions about which movements are relevant. We broke facial behavior down into its individual muscle movements (i.e., Action Units). Measuring facial behavior at the level of individual muscle movements allows for a more detailed and objective description of the complexity of facial behavior. This is a general point in advancing the study of facial behavior that is discussed in the introduction (lines 60-71) and discussion (lines 206-208). The reason we don’t draw a direct comparison with the studies above is that they had a slightly different focus. Our study was more focused on complexity of the (facial) communication system in general rather than comparing whether the different species use the same facial behavior in the same/different social contexts.

      220-222. What is known about visual perception in these species? Recent arguments suggest that more socially complex species should have more sensitive perceptual processing abilities for other individuals' signals and cues (see Freeberg et al. 2019 Animal Behaviour). Are there any published empirical data to this effect, ideally from the visual domain but perhaps from any domain?

      This is an interesting point. We are not aware of any studies showing differences in visual perception within the macaque genus. Both crested macaques and rhesus macaques are able to discriminate between individuals and facial expressions in match-to-sample tasks with comparable performances (Micheletta et al., 2015a, 2015b; Parr et al. 2008; Parr & Heintz, 2009). Similarly, several macaque species are sensitive to gaze shifts from conspecifics (Tomasello et al. 1998; Teufel et al. 2010; Micheletta & Waller, 2012).

      274-277. I am not sure I follow this - could not different social and non-social contexts produce variation in different affective states such that "emotion"-based signals could be as flexible / uncertain as seemingly volitional / information-based / referential-like signals? This issue is probably too far away from the main points of this paper, but I suspect the authors' argument in this sentence is too simplified or overstated with regard to more affect-based signals.

      Emotion-based signals could, in theory, also produce flexible signals and it is possible that some facial expressions reflect an emotional state. However, some previous studies have suggested that facial expressions are only used as a display of emotion, rather than such signals having evolved for a different function such as announcing future intentions. In our study we found that macaques used, in some cases, the same facial expressions (i.e. combination of Action Units) in at least two different social contexts that, presumably, differed in their emotional valence. Thus, it is unlikely that particular facial expressions are bound to a single emotion. We think that this is an important point to make even though it is slightly beyond the scope of our paper.

      288 on. Given there are only three species in this study, the chances of one of the species being the 'most complex' in any measure is 0.33. Although I do not believe this argument I am making here, can the authors rule out the possibility that their findings related to crested macaques are all related to chance, statistically speaking?

      We are not aware of a way to rule out this possibility. However, we believe that we are appropriately cautious throughout the paper and acknowledge that having only investigated three species is a limitation of this study in the discussion (lines 314-317, see also our response to reviewer 1 above).

      329-330. The fact that only one male rhesus macaque was assessed here seems problematic, given the balance of sexes in the other two species. Can the authors comment more on this - are the gestures they are studying here identical across the sexes?

      We agree it would have been preferable to collect data on more than one male rhesus macaque, but that was unfortunately not possible. We are not aware of any studies showing differences in the use of facial behavior between male and female rhesus macaques. If differences exist, most likely these would occur in a sexual/mating context. However, in our study we only considered affiliative (non-sexual), submissive, and aggressive contexts, where we have no a priori reason to believe that there are sex differences.

      354-371. Inter-observer reliability statistics are required here - one of the authors who did not code the original data set, or a trained observer who is not an author, could easily code a subset of the video files to obtain inter-observer reliability data. This is important for ruling out potential unconscious observer biases in coding the data.

      We agree this is an important piece of information. We now report in more detail the inter-observer reliability tests that we conducted on lines 384-392:

      “An agreement rating of >0.7 was considered good [Ekman et al 2002] and was necessary for obtaining certification. To obtain a MaqFACS coding certification, AVR, CP, and PRC coded 23 video clips of rhesus macaques and the MaqFACS codes were compared to the data of other certified coders (https://animalfacs.com).

      The mean agreement ratings obtained were 0.85, 0.73, 0.83 for AVR, CP, and PRC, respectively. In addition, AVR and CP coded 7 videos of Barbary macaques with a mean agreement rating of 0.79. AVR and PRC coded 10 videos of crested macaques with a mean agreement rating of 0.74.”
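
      For readers unfamiliar with how such agreement ratings are typically obtained, a minimal sketch follows. It assumes the agreement index commonly used for FACS certification (twice the number of Action Units both coders scored, divided by the total number of Action Units scored by the two coders); the function names and the Action Unit codes in the toy clips are hypothetical and are not taken from our data.

```python
def agreement_index(codes_a, codes_b):
    """Agreement between two coders for one clip.

    Assumed formula: 2 * (AUs scored by both coders) / (AUs scored by coder A
    + AUs scored by coder B). Returns 1.0 when neither coder scored anything.
    """
    a, b = set(codes_a), set(codes_b)
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    return 2 * len(a & b) / total


def mean_agreement(clip_pairs):
    """Mean agreement index over a list of (coder A codes, coder B codes) pairs."""
    scores = [agreement_index(a, b) for a, b in clip_pairs]
    return sum(scores) / len(scores)


# Hypothetical toy clips; the AU labels are placeholders.
clips = [
    ({"AU1+2", "AU26"}, {"AU1+2", "AU26"}),        # full agreement
    ({"AU9", "AU10", "AU25"}, {"AU10", "AU25"}),   # coder B missed one AU
]
mean_score = mean_agreement(clips)
print(round(mean_score, 2))
```

      Under this assumed index, a coder pair would pass the >0.7 criterion quoted above when the mean across clips exceeds that value.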

      Reviewer #1 (Recommendations For The Authors):

      Given the long debate on the concept of information exchange in animal communication, I would also recommend being more careful with the term 'exchanges of information' (line 271). Perhaps it's better to be agnostic in the context of this paper.

      As suggested, we now changed the phrasing to focus on the behavior of the animals, rather than suggesting that information is being exchanged (lines 270-273).

      Line 281: "This result confirms the assumption that facial behaviour in macaques is not used randomly": the authors are knocking down a straw man. Nobody who has ever studied animal communication would consider that signals occur randomly. Otherwise, they would not have evolved as signals.

      Indeed, nobody claims that animal communication signals are used randomly. Although it may be taken for granted, we feel it is worthwhile to reiterate this point, given that we used relative entropy and prediction error as measures of complexity. For instance, maximum entropy or unpredictability would be achieved in a system that is completely random (and thus be considered simple). Therefore, the species with the highest entropy values, or lowest predictability, could be interpreted as having a simpler communication system than a species with a moderately high entropy value. But if we are working under the assumption that animal communication systems cannot possibly be random, then we can conclude that the species whose communication system has the highest entropy is in fact the most complex. We tried to make this justification clearer in the discussion (lines 285-294).

      I did not follow why there is a higher reliance on facial signals when predation pressure is higher. Apart from the fact that the authors cannot address this question, they may want to reconsider this idea altogether.

      We now expand on the logic of why predation pressure might affect the use of facial signals (see lines 308-309): “When predation pressure is higher, reliance on facial signals could be higher than, for example vocal signals, such as to not draw attention of predators to the signaller.”

      Technical comments:

      One methodological issue that requires clarification is what the units of analysis are. The authors write that each row in their analysis denoted an observation time of 500 ms. How many rows did the authors assemble? The authors mention a sample size of > 3000 social interactions in the abstract. How did they define social interactions? And how many 'time windows' of 500 ms were obtained? Did they take one window per interaction or several? If several, then how was this accounted for in the analysis? The reporting needs to be more accurate here. Most likely, the bootstrapping took care of biases in the data, but still, this information needs to be provided.

      We have now added some additional information to the method section. Social interactions for each context had the following definitions: “Social context was labeled from the point of view of the signaler based on their general behavior and body language (but not the facial behavior itself), during or immediately following the facial behavior. An aggressive context was considered when the signaler lunged or leaned forward with the body or head, charged, chased, or physically hit the interaction partner. A submissive context was considered when the signaler leaned back with the body or head, moved away, or fled from the interaction partner. An affiliative context was considered when the signaler approached another individual without aggression (as defined previously) and remained in proximity, in relaxed body contact, or groomed either during or immediately after the facial behavior. In cases where the behavior of the signaler did not match our context definitions, or displayed behaviors belonging to multiple contexts, we labeled the social context as unclear. Social context was determined from the video itself and/or from the matching focal behavioral data, if available.” (lines 371-382). The total duration of all social interactions per social context, and thus the number of 500ms windows/rows, have been added to Table 1 (lines 395-397). There were several 500ms windows per social interaction. All 500ms time blocks per interaction were used in the statistical analyses in order to retain all the variation and complexity of the facial behavior (Action Unit combinations) used by the macaques (lines 403-405). Indeed the bootstrapping procedure was used to account for any biases in the data.
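
      The relationship between interaction duration and the number of 500 ms rows can be illustrated with a short sketch. It is a simplified illustration under stated assumptions, not our processing code: the function name, the tuple layout and the toy timings are hypothetical, and the choice to drop a trailing partial window is an assumption made only for this example.

```python
from collections import Counter

WINDOW_MS = 500  # length of one observation row, in milliseconds


def split_into_windows(interactions, window_ms=WINDOW_MS):
    """Turn (start_ms, end_ms, context) interactions into fixed-length rows.

    Each row is (interaction_id, window_start_ms, window_end_ms, context), so
    one interaction contributes several rows that share the same id.
    """
    rows = []
    for interaction_id, (start_ms, end_ms, context) in enumerate(interactions):
        # Assumption for this sketch: a trailing partial window is dropped.
        n_windows = (end_ms - start_ms) // window_ms
        for k in range(n_windows):
            w_start = start_ms + k * window_ms
            rows.append((interaction_id, w_start, w_start + window_ms, context))
    return rows


# Hypothetical toy interactions (times in ms).
interactions = [
    (0, 2000, "affiliative"),
    (5000, 6200, "aggressive"),
    (9000, 9500, "submissive"),
]
rows = split_into_windows(interactions)
rows_per_context = Counter(row[3] for row in rows)
print(len(rows), dict(rows_per_context))
```

      Keeping the interaction id on every row makes it possible to resample whole interactions, rather than individual rows, if the non-independence of windows within an interaction needs to be respected.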

      Overall, I would recommend providing more information on the actual behaviour of the animals. The paper is strong in handling highly derived indices representing the behaviour, but the reader learns little about the animals' behaviour. Thus, it would be great if statements about the entropy ratio were translated into what these measures represent in real life. For context specificity, this is clear, but for entropy, not so much.

      A high entropy ratio essentially suggests that a species uses a high variety of unique facial behavior/signals and all signals in the repertoire are used roughly equally often (rather than one facial behavior being used 90% of the time and others rarely used). We have tried our best to better explain this point in the introduction (lines 75-81) and discussion (lines 215-222). Discussing exactly what these signals are and what they mean was beyond the scope of this paper.
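
      The contrast between an evenly used repertoire and one dominated by a single signal can be made concrete with a small sketch. It assumes, for illustration only, that the entropy ratio is the observed Shannon entropy divided by the maximum entropy possible for the same number of distinct signals; the function name and the toy signal labels are hypothetical, and the exact definition used in the manuscript may differ.

```python
import math
from collections import Counter


def entropy_ratio(observations):
    """Observed Shannon entropy divided by the maximum for the same repertoire size.

    Returns a value between 0 (one signal only) and 1 (all signals equally frequent).
    """
    counts = Counter(observations)
    n = sum(counts.values())
    k = len(counts)
    if k <= 1:
        return 0.0
    observed = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return observed / math.log2(k)


# Hypothetical repertoires of 100 observed signals each.
even_use = ["A"] * 25 + ["B"] * 25 + ["C"] * 25 + ["D"] * 25
skewed_use = ["A"] * 90 + ["B"] * 5 + ["C"] * 5  # one signal used 90% of the time

even_ratio = entropy_ratio(even_use)
skewed_ratio = entropy_ratio(skewed_use)
print(round(even_ratio, 2), round(skewed_ratio, 2))
```

      The evenly used repertoire gives a ratio at the top of the scale, whereas the repertoire dominated by one signal gives a much lower value even though it contains several distinct signals.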

      Line 106: nepotism, not kinship

      Changed as suggested (line 106).

      Line 113: I would avoid statements about how a monkey society is perceived by its members.

      We think that noting how individuals may perceive their social environment is worthwhile when defining social complexity, so have retained this point but changed the phrasing to be more speculative (lines 112-113).

      Line 329: I was very surprised that only one male was represented in the data for rhesus monkeys. The authors try to wriggle their way out of this issue in the supplementary material ("Therefore, we have no a priori reason to expect an overall difference in the diversity and complexity of facial behaviour between the sexes"), but I think this is a major shortcoming of the analysis. They should ascertain whether there are no sex differences in the other two species regarding their variables of interest. They could then make a very cautious case for there being no sex differences in rhesus either. But of course, they would not know for sure.

      As with our response to reviewer 2 above, we agree that it would have been preferable to collect data on more than one male rhesus macaque, but that was unfortunately not possible. We are not aware of any studies showing differences in the use of facial behavior between male and female rhesus macaques. If differences exist, most likely these would occur in a sexual/mating context. However, in our study we only considered affiliative (non-sexual), submissive, and aggressive contexts, where we have no a priori reason to believe that there are sex differences. Looking at sex differences in the use of facial behavior would be a worthwhile study on its own, but it is outside the scope of this paper.

      This paper would make a stronger contribution if it focussed on the comparative analysis of facial expressions and removed the attempt of testing the social complexity for communicative complexity hypothesis.

      A comparative analysis of the contextual use of specific facial movements is important. But this paper is focused on making a more general comparison of the communication style and complexity across species. The social complexity hypothesis for communicative complexity is one of the key theoretical frameworks for such an investigation and allows us to frame our study in a broader context. We contribute important data on 3 species with methods that can be replicated and extended to other species. Therefore, we believe that it is a worthy contribution to investigations of the evolution of complex communication.

      REFERENCES

      Micheletta, J., J. Whitehouse, L.A. Parr, and B.M. Waller. ‘Facial Expression Recognition in Crested Macaques (Macaca nigra)’. Animal Cognition 18 (2015): 985–90. https://doi.org/10/f7fvnh.

      Micheletta, Jérôme, Jamie Whitehouse, Lisa A. Parr, Paul Marshman, Antje Engelhardt, and Bridget M. Waller. ‘Familiar and Unfamiliar Face Recognition in Crested Macaques (Macaca nigra)’. Royal Society Open Science 2 (2015): 150109. https://doi.org/10/ggx9k9.

      Parr, L. A., and M. Heintz. ‘Facial Expression Recognition in Rhesus Monkeys, Macaca mulatta’. Animal Behaviour 77 (2009): 1507–13. https://doi.org/10/bbsp5n.

      Parr, L.A., M. Heintz, and G. Pradhan. ‘Rhesus Monkeys (Macaca mulatta) Lack Expertise in Face Processing’. Journal of Comparative Psychology 122 (2008): 390–402. https://doi.org/10/d7w6bv.

      Micheletta, J., and B.M. Waller. ‘Friendship Affects Gaze Following in a Tolerant Species of Macaque, Macaca nigra’. Animal Behaviour 83 (2012): 459–67. https://doi.org/10/c4f8n2.

      Thierry, B. ‘Where Do We Stand with the Covariation Framework in Primate Societies?’ Am. J. Biol. Anthropol. 128 (2021): 5–25. https://doi.org/10.1002/ajpa.24441.

      Tomasello, M., J. Call, and B. Hare. ‘Five Primate Species Follow the Visual Gaze of Conspecifics’. Animal Behaviour 55 (1998): 1063–69. https://doi.org/10/bmq7xh.

      Teufel, C., A. Gutmann, R. Pirow, and J. Fischer. ‘Facial Expressions Modulate the Ontogenetic Trajectory of Gaze-Following among Monkeys’. Developmental Science 13 (2010): 913–22. https://doi.org/10/b6j5r7.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are grateful for the helpful comments of both reviewers and have revised our manuscript with them in mind.

      One of the main issues raised was that readers may by default assume that our models are correct. We in fact made it very clear in our discussion that the models are merely hypotheses that will need testing by “wet” experiments and we do not therefore agree that even readers unfamiliar with AF would assume that the models must be correct. It was also suggested that readers could be reassured by including extensive confidence estimates such as PAE plots. As it happens, every single model described in the manuscript had reasonably high PAE scores and more crucially the entire collection of output files, including PAE data, are readily accessible on Figshare at https://doi.org/10.6084/m9.figshare.22567318.v2, a fact that the reviewers appear to have overlooked. The Figshare link is mentioned three times in the manuscript. Embedding these data within the manuscript itself would in our view add even more details and we have therefore not included them in our revised manuscript. Likewise, it is rather simple for any reader to work out which part of a PAE matrix corresponds to an interaction observed in the corresponding PDB prediction. Besides, it is our view that the biological plausibility and explanatory power of models is just as important as AF metrics in judging whether they may be correct, as is indeed also the case for most experimental work.

      Another important point was that the manuscript was too long and not readable. Yes, it is long and it could well be argued that we could have written a different type of manuscript, focusing entirely on what is possibly the simplest and most important finding, namely that our AF models suggest that in animal cells Wapl appears to form a quaternary complex with SA, Pds5, and Scc1 in a manner suggesting that a key function of Wapl’s conserved CTD is to sequester Scc1’s N-terminal domain after it has dissociated from Smc3. Rightly or wrongly, we decided that this story could not be presented on its own but also required 1) an explanation for how Scc1 is induced to dissociate from Smc3 in the first place and 2) an explanation for why the quaternary complex predicted for animal cells was not initially predicted for fungi such as yeast. The yeast situation was an exception that clearly needed explaining if the theory was to have any generality, and it turned out that delving into the intricate details of the genetics of releasing activity in yeast was eventually required and yielded valuable new insights. We also believe that our work on the recruitment of Eco/Esco acetyl transferases to cohesin and the finding that sororin binds to the Smc3/Scc1 interface also provided important insight into how releasing activity is regulated. We acknowledge that the paper is indeed long but do not think that it is badly written. It is above all a long and complex story that in our view reveals numerous novel insights into how cohesin’s association with chromosomes is regulated, and we have endeavoured to eliminate any excessive speculation. We feel it is not our fault that cohesin uses complex mechanisms.

      Notwithstanding these considerations, we have in fact simplified a few sections and removed one or two others but acknowledge that we have not made substantial cuts.

      It was pointed out that a key feature of our modelling, namely the predicted association of Wapl’s C-terminal domain with SA/Scc3’s CES is inconsistent with published biochemical data. The AF predictions for this interface are universally robust in all eukaryotic lineages and crucially fully consistent with published and unimpeachable genetic data. We note that any model that explains all findings is bound to be wrong for the very simple reason that some of these findings will prove to be incorrect. There is therefore an art in Science of judging which data must be explained and accommodated and which should be ignored. In this particular case, we chose to ignore the biochemistry. Time will tell whether our judgement proves correct.

      Last but not least, it was suggested that we might provide some experimental support for our proposed SA/Scc3-Pds5-Scc1-WaplC quaternary complex. We are in fact working on this by introducing cysteine pairs (that can be crosslinked in cells) into the proposed interfaces but decided that such studies should be the topic of a subsequent publication. It would be impossible with the resources available to our labs to follow up all of the potential interactions and we therefore decided to exclude all such experiments.

      We are grateful for the detailed comments provided by both reviewers, many of which were very helpful, and in many but not all cases have amended the manuscript accordingly.

      With regard to the more specific comments:

      Reviewer #1 (Recommendations For The Authors):

      1) One concern is that observed interfaces/complexes arise because AF-multimer will aim to pack exposed, conserved and hydrophobic surfaces or regions that contain charge complementarity. The risk is that pairwise interaction screens can result in false positive & non-physiological interactions. It is therefore important to report the level of model confidence obtained for such AF calculations:

      A) The authors should color the key models according to pLDDT scores obtained as reported by AF. This would allow the reader to judge the estimated accuracy of the backbone and side chain rotamers obtained. At least for the key models and interactions it would be important to know if the pLDDT score is >90 (Correct backbone and most rotamers) or >70 (only backbone is correct).

      B) It would also be important to report the PAE plots to allow estimation of the expected position error for most of the important interactions. pLDDT coloring and PAE plots can be shown side-by-side as shown in other published data (e.g. https://pubmed.ncbi.nlm.nih.gov/35679397/ (Supplementary data)).

      C) The authors should include a Table showing the confidence of template modeling scores for the predicted protein interfaces as ipTM, ipTM+pTM as reported by AlphaFold-multimer. Ideally, they would also include DockQ scores but this may not be essential. Addition of such scores would help classification into Incorrect, Acceptable or of high quality. For example, line 1073 et seq the authors show a model of a SCC1-SA and ESCO1 complex (Fig. 37). Are the modeling scores for these interfaces high? It does not help that the authors show cartoons without side chains. Can the authors provide a close-up view of the two interfaces? Are the amino acids indeed packed in a manner expected for a protein interface? Can we exclude the possibility that the prediction is obtained merely because the sequence segments (e.g. in ESCO1 & ESCO2) are hydrophobic and conserved?

      We do not agree that including this level of detail to the text/figures of the manuscript would be suitable. All the relevant data for those who may be sceptical about the models are readily available at https://doi.org/10.6084/m9.figshare.22567318.v2. In our view, the cartoon versions of the models are easier for a reader to navigate. Anyone interested in the molecular details can look at the models directly.

      Importantly, no amount of statistical analysis can completely validate these models. What is required are further experiments, which will be the topic of further work from our and, I dare say, from other laboratories.

      D) When they predict an interaction between the SA2:SCC1 complex and Sororin's FGF motif, they find that only 1/5 models show an interaction and that the interaction is dissimilar to that seen of CTCF. Again, it would be helpful to know about modeling scores. Can they show a close-up view of the SORORIN FGF binding interface to see if a realistic binding mode is obtained? Can they indicate the relevant region on the PAE plot?

      Given that AF greatly favours other interactions of Sororin’s FGF motif over its interaction with SA2-Scc1, we do not agree that dwelling on the latter would serve any purpose.
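      For readers who wish to apply the confidence measures raised in point 1 to the deposited output files themselves, the following is a minimal sketch and not part of the authors' analysis. The function names are hypothetical. The 90 and 70 pLDDT cut-offs are those quoted by the reviewer; the additional cut-off at 50 and the 0.8/0.2 weighting of ipTM and pTM follow the conventions commonly reported for AlphaFold and AlphaFold-Multimer and should be checked against the version actually used.

```python
def plddt_band(plddt):
    """Classify a per-residue pLDDT value (0-100) into a confidence band."""
    if plddt > 90:
        return "very high"   # backbone and most side-chain rotamers expected to be reliable
    if plddt > 70:
        return "confident"   # backbone expected to be reliable
    if plddt > 50:
        return "low"
    return "very low"


def multimer_confidence(iptm, ptm):
    """Weighted ipTM + pTM score, as commonly used to rank multimer predictions."""
    return 0.8 * iptm + 0.2 * ptm


# Made-up values, for illustration only.
bands = [plddt_band(v) for v in (95.2, 81.0, 63.5, 41.7)]
example_score = multimer_confidence(iptm=0.5, ptm=1.0)
```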

      2) Line 996: AF predicts with high confidence an interaction between Eco1 & SMC3hd. What are the ipTM (& DockQ if available) scores? Would the interface score High, Medium or Acceptable?

      As mentioned, see https://doi.org/10.6084/m9.figshare.22567318.v2.

      3) Line 1034 et seq: Eco1/ESCO1/ESCO2 interaction with PDS5. Interface scores need to be shown to determine that the models shown are indeed likely to occur. If these interactions have low model confidence, Fig. 36 and discussion around potential relevance to PDS5-Eco1 orientation relative to the SMC3 head remains highly speculative and could be expunged.

      See https://doi.org/10.6084/m9.figshare.22567318.v2. It should be clear that the predictions are very similar in fungi and animals. Crucially, we know that Pds5 is essential for acetylation in vivo, so the models appear plausible from a biological point of view.

      4) Considering the relatively large interface between ECO1 and SMC3, would the author consider the possibility that in addition to acetylating SMC3's ATPase domain, ECO1 remains bound to cohesin-DNA complex, as proposed for ESCO1 by Rahman et al (10.1073/pnas.1505323112)?

      This is certainly possible but we would not want to indulge in such speculation.

      5) E.g. Line 875 but also throughout the text: As there is no labeling of the N- and C-termini in the Figures, it is frequently unclear what the authors are referring to when they mention that AF models orient chains in a certain manner.

      Good point. This has been amended. However, the positions of the N- and C-termini are all available at https://doi.org/10.6084/m9.figshare.22567318.v2.

      6) Fig19B: PAE plots: authors should indicate which chains correspond to A, B, C. Which segment corresponds to the TYxxxR[T/S]L motif? Can they highlight this section on the PAE plot?

      Good point and amended in the revised manuscript.

      Minor comments:

      1) Line 440: the WAPL YSR motif is not shown in Fig. 14A

      2) Line 691: Scc3 spelling error.

      3) Line 931: Sentence ending '... SCC3 (SCC3N).' requires citation.

      4) Line 1008: Figure reference seems wrong. It should read: Fig. 34A left and right. Fig. 34B does not contain SCC1.

      Many thanks for spotting these. Hopefully, all corrected.

      5) Fig. 41 can be removed as it shows the absence of the interaction of Sororin with SMC1:SCC1. Sufficient to mention in the text that Sororin does not appear to interact with SMC1:SCC1.

      This is possible but we decided to leave this as is.

      Reviewer #2 (Recommendations For The Authors):

      Minor points

      (1) Are there any predicted models in which one of the two dimer interfaces of the hinge is open when the coiled coils are folded back, as seen in the cryo-EM structure of human cohesin-NIPBL complex in the clamped state?

      No AF runs ever predicted half opened hinges. It is possible that the introduction of mutations in one of the two interfaces might reveal a half-opened state and we ought to try this. However, it would not be appropriate for this manuscript, we believe.

      (2) Structures of the SA-Scc1 CES bound to [Y/F]xF motifs from Sgo1 and CTCF have been reported, suggesting that a similar motif could interact with SA/Scc3. Surprisingly, AF did not predict an interaction between Scc3/SA and Wapl FGF motifs, which only bind to the Pds5 WEST region. On the other hand, AF predicted interactions of the Sororin FGF motif with both Pds5 WEST and SA CES. Can the authors comment on this Wapl FGF binding specificity? What will happen if a Wapl fragment lacking the CTD is used in the prediction?

      This seems to be an academic point as the CTD is always present.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      1) The authors need to validate that RAP1-HA still retains its essential function. As indicated above, if RAP1-HA still retains its essential functions, cells carrying one RAP1-HA allele and one deleted allele are expected to grow the same as WT cells. These cells should also have the WT VSG expression pattern, and RAP1-HA should still interact with TRF.

      We demonstrated that C-terminally HA-tagged RAP1 co-localizes with telomeres by a combination of immunofluorescence and fluorescence in situ hybridization (Cestari and Stuart, 2015, PNAS), and that it co-immunoprecipitates telomeric and 70 bp repeats (Cestari et al. 2019, Mol Cell Biol). We also showed by immunoprecipitation and mass spectrometry that HA-tagged RAP1 interacts with nuclear and telomeric proteins, including PIP5Pase (Cestari et al. 2019). Others have also tagged T. brucei RAP1 with HA without disrupting its nuclear localization (Yang et al. 2009, Cell). Together, these results indicate that the HA-tag does not affect protein function. As for the suggested experiment, there is no guarantee that cells lacking one allele of RAP1 will behave as wildtype, i.e., show normal growth and repression of VSG genes. Also, less than 90% of T. brucei TRF was reported to interact with RAP1 (Yang et al. 2009, Cell), and this interaction might be indirect, via their binding to telomeric repeats, rather than a direct protein-protein interaction.

      2) The authors need to remove the His6 tag from the recombinant RAP1 fragments before the EMSA analysis. This is essential to avoid any artifacts generated by the His6-tagged proteins.

      Our controls show that the His-tag is not interfering with RAP1-DNA binding. We show in Fig 3C-G by EMSA and in Fig S5 by EMSA and microscale thermophoresis that His-tagged full-length rRAP1 does not bind to scrambled telomeric dsDNA sequences, which demonstrates that His-tagged rRAP1 does not bind nonspecifically to DNA. Moreover, in Fig 3G and Fig S5, we show that His-tagged rRAP1(1-300) also does not bind to 70 bp or telomeric repeats. In contrast, the full-length His-tagged rRAP1, rRAP1(301-560), or rRAP1(561-855) bind to 70 bp or telomeric repeats (Fig 3C-G). Since all proteins were His-tagged, the His tag cannot be responsible for the DNA binding. We have worked with many different His-tagged proteins for nucleic acid binding and enzymatic assays without any interference from the tag (Cestari and Stuart, 2013, JBC; Cestari et al. 2013, Mol Cell Biol; Cestari and Stuart, 2015, PNAS; Cestari et al. 2016, Cell Chem Biol; Cestari et al. 2019, Mol Cell Biol).

      3) More details need to be provided for ChIPseq and RNAseq analysis regarding the read numbers per sample, mapping quality, etc.

      Table S3 includes information on sequencing throughput and read length. Mapping quality was included in the Methods section “Computational analysis of RNA-seq and ChIP-seq”, starting at line 499. In summary, we filtered reads to keep primary alignments (eliminating supplementary and secondary alignments). We also analyzed ChIP-seq with MAPQ ≥20 (99% probability of correct alignment) to distinguish RAP1 binding to specific ESs, including silent vs active ES (ChIP-seq). We included Fig S4 to show the effect of filtering alignments on the active vs silent ESs. We used MAPQ ≥30 to analyze RNA-seq mapping to VSG genes, including those in subtelomeric regions. Our scripts are available at https://github.com/cestari-lab/lab_scripts. We also included in the Methods, lines 522-524: “Scripts used for ChIP-seq, RNA-seq, and VSG-seq analysis are available at https://github.com/cestari-lab/lab_scripts. A specific pipeline was developed for clonal VSG-seq analysis, available at https://github.com/cestari-lab/VSG-Bar-seq.”
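      The relationship between the MAPQ thresholds quoted above and the probability of correct alignment, together with the primary-alignment filter, can be sketched as follows. This is an illustrative sketch only and is not taken from the authors' scripts; the function names and the toy records are hypothetical. The flag bits are those defined in the SAM format specification.

```python
# SAM flag bits (from the SAM format specification).
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def mapq_to_p_correct(mapq):
    """MAPQ is Phred-scaled: P(alignment is wrong) = 10 ** (-MAPQ / 10)."""
    return 1.0 - 10 ** (-mapq / 10)


def keep_alignment(flag, mapq, min_mapq):
    """Keep mapped primary alignments whose MAPQ meets the threshold."""
    if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
        return False
    return mapq >= min_mapq


# Toy (flag, MAPQ) pairs, made up for illustration.
records = [(0, 42), (256, 42), (2048, 30), (16, 12), (0, 3)]
kept_q20 = [r for r in records if keep_alignment(r[0], r[1], 20)]
kept_q10 = [r for r in records if keep_alignment(r[0], r[1], 10)]
```

      With this conversion, MAPQ 10, 20, and 30 correspond to 90%, 99%, and 99.9% probability of correct alignment, respectively.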

      4) The authors should revise the Discussion section to clearly state the authors' speculations and their working models (the latter of which need solid supporting evidence). Specifically, statements in lines 218 - 219 and lines 224-226 need to be revised.

      The statement “likely due to RAP1 conformational changes” in line 228 discusses how binding of PI(3,4,5)P3 could affect RAP1 Myb and MybL domains binding to DNA. We did not make a strong statement but discussed a possibility. We believe that it is beneficial to the reader to have the data discussed, and we do not feel this point is overly speculative. For lines 224-226 (now 234-235), the statement refers to the finding of RAP1 binding to centromeric regions by ChIP-seq, which is a new finding but not the focus of this work. To make it clear that it does not refer to telomeric ESs, we edited: “The finding of RAP1 binding to subtelomeric regions other than ESs, including centromeres, requires further validation.” Since RAP1 binding to centromeres is not the focus of the work, future studies are necessary to follow up, and we believe it is appropriate in the Discussion to be upfront and highlight this point to the readers.

      Our model is based on the data presented here but also on scientific literature. We have reviewed the Discussion to prevent broad speculations. When discussing a model, we stated (line 245): “The scenario suggests a model in which …”, to state that this is a working model. Similarly, in Results (line 201) we included: “Our data suggest a model in which…”.

      5) The authors should revise the title to reflect a more reasonable conclusion of the study.

      We agree that the title should be changed to imply a direct role of PI(3,4,5)P3 regulation of RAP1, which is not captured in the original title. This will provide more specific information to the readers, especially those broadly interested in telomeric gene regulation and RAP1. The new title is: PI(3,4,5)P3 allosteric regulation of repressor activator protein 1 controls antigenic variation in trypanosomes

      6) The authors are recommended to provide an estimation of the expression level of the V5-tagged PIP5pase from the tubulin array in reference to the endogenous protein level.

      The relative mRNA levels of the exclusively expressed PIP5Pase mutant compared to the wildtype are available in Data S1, RNA-seq. The Mut PIP5Pase allele’s relative expression level is 0.85-fold that of the WT allele (both from tubulin loci). We also showed by Western blot the WT and Mut PIP5Pase protein expression (Cestari et al. 2019, Mol Cell Biol). Concerning PIP5Pase endogenous alleles, we compared normalized RNA-seq counts per million from the conditional null PIP5Pase cells exclusively expressing WT or the Mut PIP5Pase alleles (Data S1, this work) to our previous RNA-seq of single-marker 427 strain (Cestari et al. 2019, Mol Cell Biol). We used the single-marker 427 because the conditional null cells were generated in this strain background. The PIP5Pase WT and Mut mRNAs expressed from tubulin loci are 1.6 and 1.3-fold the endogenous PIP5Pase levels in single-marker 427, respectively. We included a statement in the Methods, lines 275-278: “The WT or Mut PIP5Pase mRNAs exclusively expressed from tubulin loci are 1.6 and 1.3-fold the WT PIP5Pase mRNA levels expressed from endogenous alleles in the single marker 427 strain. The fold-changes were calculated from RNA-seq counts per million from this work (WT and Mut PIP5Pase, Data S1) and our previous RNA-seq from single marker 427 strain (24).”
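      The fold-change calculation described above, a ratio of counts per million (CPM) between two RNA-seq datasets, can be sketched as follows. This is a generic illustration, not the authors' script; the function names and all read counts are hypothetical and are not values from Data S1.

```python
def counts_per_million(gene_count, library_size):
    """Normalize a raw gene count by the total number of mapped reads."""
    return gene_count * 1_000_000 / library_size


def fold_change(cpm_test, cpm_reference):
    """Expression in the test sample relative to the reference sample."""
    return cpm_test / cpm_reference


# Hypothetical read counts, chosen only so the arithmetic is easy to follow.
reference_cpm = counts_per_million(gene_count=500, library_size=20_000_000)
test_cpm = counts_per_million(gene_count=1_500, library_size=30_000_000)
relative_expression = fold_change(test_cpm, reference_cpm)
```

      Because the two datasets compared in the response come from separate experiments, a ratio of this kind is best read as an approximate estimate.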

      7) The authors are recommended to provide more detailed EMSA conditions such as protein and substrate concentrations. Better quality EMSA gels are preferred.

      All concentrations were already provided in the Methods section. See line 356, in topic Electrophoretic mobility shift assays: “100 nM of annealed DNA were mixed with 1 μg of recombinant protein…”. For microscale thermophoresis, also see lines 375-376 in topic Microscale thermophoresis binding kinetics: “1 μM rRAP1 was diluted in 16 two-fold serial dilutions in 250 mM HEPES pH 7.4, 25 mM MgCl2, 500 mM NaCl, and 0.25% (v/v) NP-40 and incubated with 20 nM telomeric or 70 bp repeats…”. Note that two different biochemical approaches, EMSA and microscale thermophoresis, were used to assess rRAP1-His binding to DNA. Both give consistent results (Fig 3 and 5, and Fig S5); microscale thermophoresis shows the binding kinetics, with data available in Table 1. The EMSA images clearly show the binding of RAP1 to 70 bp or telomeric repeats but not to scrambled telomeric repeat DNA.
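      The protein concentrations spanned by such a titration follow directly from the dilution scheme. The sketch below is illustrative only; it assumes that the first of the 16 points is the undiluted 1 μM protein, which the quoted Methods text does not specify, and the function name is hypothetical.

```python
def twofold_series(start_nm, n_points):
    """Concentrations (nM) of a two-fold serial dilution, highest first."""
    return [start_nm / (2 ** i) for i in range(n_points)]


# Assumption: point 1 is the undiluted 1 uM (1000 nM) protein.
rrap1_nm = twofold_series(start_nm=1000.0, n_points=16)
lowest_nm = rrap1_nm[-1]  # 1000 / 2**15, roughly 0.03 nM
```

      Under this assumption the titration covers roughly 0.03 nM to 1 μM protein against a fixed 20 nM of DNA.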

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      Figures

      All figures should have their axes properly labeled and units should be indicated. For many of the ChIPseq datasets it is not clear whether the authors show a fold enrichment or RPM and whether they used all reads or only uniquely mapping reads. Especially the latter is a very important piece of information when analyzing expression sites and should always be reported. The authors write, that all RNA-seq and ChIP-seq experiments were performed in triplicate. What is shown in the figures, one of the replicates? Or the average?

      ChIP-seq is shown as fold enrichment; we clarified this in the figures by labeling the y-axis RAP1-HA ChIP/Input (log2). We included in figure legends, see line 710: “Data show fold-change comparing ChIP vs Input.”. For quantitative graphs (Fig 2B, D, and E, and Fig 5F and G), data are shown as the mean of biological replicates. Graphs generated in the Integrative Genomics Viewer (IGV, qualitative graphs) show representative data (Fig 2A, C, and F, and Fig 5D-E). All statistical analyses were calculated from the three biological replicates. Uniquely mapped reads were used. We also included ChIP-seq analysis with MAPQ ≥10 and 20 (90% and 99% probability of correct alignment, respectively) to distinguish RAP1 binding to ESs. Fig S4 shows the various mapping stringencies and demonstrates the enrichment of RAP1-HA at silent vs active ES.
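      A log2 ChIP/Input enrichment averaged over biological replicates can be computed along the following lines. This is a minimal sketch under stated assumptions and not the authors' pipeline: the pseudocount, the choice to average the per-replicate log2 ratios, the function names, and the coverage values are all hypothetical.

```python
import math


def log2_enrichment(chip, input_signal, pseudocount=1.0):
    """log2(ChIP / Input) for one region; the pseudocount (an assumption) avoids division by zero."""
    return math.log2((chip + pseudocount) / (input_signal + pseudocount))


def mean(values):
    return sum(values) / len(values)


# Hypothetical normalized coverage for one region in three replicates: (ChIP, Input).
replicates = [(31.0, 7.0), (63.0, 15.0), (15.0, 3.0)]
per_replicate = [log2_enrichment(chip, inp) for chip, inp in replicates]
mean_log2_enrichment = mean(per_replicate)
```

      A value of 0 on this scale means no enrichment over Input, and each unit corresponds to a two-fold change.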

      Figure 1 is very important for the main argument of the manuscript, but very difficult (impossible for me) to fully understand. It would be great if the author could make an effort to clarify the figure and improve the labels. Panel Fig 1E. Here it is impossible to read the names of the genes that are activated and therefore it is impossible to verify the statements made about the activation of VSGs and the switching.

      We have edited Fig 1E to include the most abundant VSGs, which decreased the amount of information in the graph and increased the label font. We also re-labeled each VSG with chromosome or ES name and common VSG name when known (e.g., VSG2). We included Table S1 in the supplementary information with the data used to generate Fig 1E. In Table S1, the reader will be able to check the VSG gene IDs and evaluate the data in detail. We included in the legend, line 700: “See Table S1 for data and gene IDs of VSGs.”

      Figure 1F: This panel is important and should be shown in more detail as it distinguishes VSG switching from a general VSG de-repression phenotype. VSG-seq is performed in a clonal manner here after PIP5Pase KD and re-expression. To show that proper switching has taken place in the different clones, instead of a persistent VSG de-repression, the expression level of more VSGs should be shown (e.g. as in panel E) to show that there is really only one VSG detected per clone. For example, it is not clear what the authors 'called' the dominant VSG gene.

      We showed in supplementary information Fig S1 B-C examples of reads mapping to the VSGs. We have now included a graph (Fig S1 D) that quantifies reads mapped to the VSG selected as expressed compared to other VSG genes considered not expressed. The data show an average of several clones analyzed. Other VSGs (not selected) are at the noise level (about 4 normalized counts) compared to >250 normalized counts for the VSGs selected as expressed.
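      One way to make the call of a single expressed VSG per clone explicit is sketched below. This is a hypothetical illustration and not the authors' VSG-Bar-seq pipeline: the function name, the ratio threshold, and the example counts are assumptions, and the default of 250 normalized counts simply reuses the figure quoted above as a cut-off.

```python
def call_dominant_vsg(counts, min_counts=250.0, min_ratio=10.0):
    """Return the single dominant VSG for a clone, or None if the call is ambiguous.

    counts     -- dict mapping VSG name to normalized read counts for one clone
    min_counts -- minimum counts for the top VSG (assumed cut-off)
    min_ratio  -- minimum fold excess over the runner-up (hypothetical cut-off)
    """
    if not counts:
        return None
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    top_name, top_count = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    if top_count >= min_counts and top_count >= min_ratio * runner_up:
        return top_name
    return None


# Made-up counts for two clones.
switched_clone = {"BES12_VSG": 310.0, "BES1_VSG2": 4.0}
derepressed_clone = {"BES12_VSG": 300.0, "BES1_VSG2": 200.0}
call_switched = call_dominant_vsg(switched_clone)
call_derepressed = call_dominant_vsg(derepressed_clone)
```

      A clone with several VSGs at comparable levels returns None, which is the pattern expected from persistent de-repression rather than a switch.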

      As mentioned in the public comments, I don't see how the data from Fig 1E and 1F fit together. Based on Fig 1E VSG2 is the dominant VSG, based on Fig 1F VSG2 is almost never the dominant VSG, but the VSG from BES 12.

      In Fig 1E, VSG2 predominates in cells expressing WT PIP5Pase; however, in cells expressing Mut PIP5Pase, this is no longer the case. Many other VSGs are detected, and other VSG mRNAs are more abundant than VSG2 (see color intensity in the heat map). The Mut cells may also have remaining VSG2 mRNAs (from before switching) rather than continuous VSG2 expression. This is the reason we performed the clonal analysis shown in Fig 1F, to be certain about the switching. While Fig 1E shows potential switchers in the population, Fig 1F confirms VSG switching in clones.

      Many potential switchers were detected in the VSG-seq (Fig 1E; the whole cell population is over 10^7 parasites), but not all potential switchers were detected in the clonal analysis (Fig 1F) because we analyzed 212 clones in total, a fraction of the over 10^7 cells analyzed by VSG-seq. Also, it is possible that not all potential switchers are viable. A preference for switching to specific ESs has been observed in T. brucei (Morrison et al. 2005, Int J Parasitol; Cestari and Stuart, 2015, PNAS), which may explain several clones switching to BES12.

      Note that in Fig 1F, tet + cells did not switch VSGs at all; all 118 clones expressed VSG2. We relabeled Fig 1F for clarity and included the VSG names. We added gene IDs in the Figure legends, see line 702 “ BES1_VSG2 (Tb427_000016000), BES12_VSG (Tb427_000008000)…”

      Statements in Introduction / Discussion

      The statement in lines 82/83 is very strong and gives the impression that the PIP5Pase-Rap1 circuit has been proven to regulate antigenic variation in the host. However, I don't think this is the case. The paper shows that the pathway can indeed turn expression sites on and off, but there is no evidence (yet) that this is what happens in the host and regulates antigenic variation during infection. The same goes for lines 214/215 in the discussion.

      We agree with the reviewer, and we edited these statements. The statement in lines 82-83: “The data provide a molecular mechanism…” to “The data indicates a molecular mechanism…” For lines 224-225: “and provides a mechanism to control…” to “and indicates a mechanism to control…”. We also included in lines 261-262: “It is unknown if a signaling system regulates antigenic variation in vivo.” Also edited lines 262-263: “…the data indicate that trypanosomes may have evolved a sophisticated mechanism to regulate antigenic variation...”.

      New vs old data

      In general, for Figures 1 - 4, it was a bit difficult to understand which panels showed new findings, and which panels confirmed previous findings (see below for specific examples). In the text and in the figure design, the new results should be clearly highlighted.

      Authors: All data presented are new, as detailed below.

      Figure 1: A similar RNA-seq after PIP5Pase deletion was performed in citation 24. Perhaps the focus of this figure should be more on the (clone-specific) VSG-seq experiment after PIP5Pase re-introduction.

      This is the first time we show RNA-seq of T. brucei expressing catalytically inactive PIP5Pase, which establishes that the regulation of VSG expression and switching, and repression of subtelomeric regions, is dependent on PIP5Pase enzyme catalysis, i.e., PI(3,4,5)P3 dephosphorylation. Hence the relevance of the RNA-seq here and its difference from the previous RNA-seq of PIP5Pase knockdown.

      Figure 2: A similar ChIP-seq of RAP1 was performed in citation 24, with and without PIP5Pase deletion. Could new findings be highlighted more clearly?

      Our and others’ previous work showed ChIP-qPCR, which analyzes specific loci. Here we performed ChIP-seq, which shows genome-wide binding sites of RAP1 and reveals new findings, including binding sites in the BESs, MESs, and other genome loci such as centromeres. We also identified DNA sequence bias defining RAP1 binding sites (Fig 2A). We also show by ChIP-seq how RAP1 binding to these loci changes upon expression of catalytically inactive PIP5Pase. To improve clarity in the manuscript, we edited lines 129-130: “We showed that RAP1 binds telomeric or 70 bp repeats (24), but it is unknown if it binds to other ES sequences or genomic loci.”

      Figure 4: Binding of Rap1 to PI(3,4,5)P3, but not to other similar molecules, was previously shown in citation 24. Could new findings be highlighted more clearly?

      We published in reference 24 (Cestari et al. Mol Cell Biol) that RAP1-HA can bind agarose bead-conjugated synthetic PI(3,4,5)P3. Here, we were able to measure T. brucei endogenous PI(3,4,5)P3 associated with RAP1-HA (Fig 4F). Moreover, we showed that binding between endogenous RAP1-HA and PI(3,4,5)P3 is about 100-fold higher in cells expressing catalytically inactive PIP5Pase than in cells expressing WT PIP5Pase. The data establish that in vivo endogenous PI(3,4,5)P3 binds to RAP1-HA and how the binding changes in cells expressing mutant PIP5Pase; these data are new and relevant to our conclusions. To clarify, we edited the manuscript in lines 180-182: “To determine if RAP1 binds to PI(3,4,5)P3 in vivo, we in-situ HA-tagged RAP1 in cells that express the WT or Mut PIP5Pase and analyzed endogenous PI(3,4,5)P3 levels associated with immunoprecipitated RAP1-HA”.

      Sequencing

      I really appreciate the amount of detail the authors provide in the methods section. The authors do an excellent job of describing how different experiments were performed. However, it would be important that the authors also provide the basic statistics on the sequencing data. How many sequencing reads were generated per run (each replicate of the ChIP-seq and RNA-seq assays)? How long were the reads? How many reads could be aligned?

      The sequencing metrics for RNA-seq and ChIP-seq for all biological replicates were included in Table S3 (supplementary information). The details of the analysis and sequencing quality were described in the Methods section “Computational analysis of RNA-seq and ChIP-seq”. To be clearer about the analysis, we also included in Methods, lines 522-524: “Scripts used for ChIP-seq, RNA-seq, and VSG-seq analysis are available at https://github.com/cestari-lab/lab_scripts. A specific pipeline was developed for clonal VSG-seq analysis, available at https://github.com/cestari-lab/VSG-Bar-seq.”.

      Minor comments:

      Figure 1B: I would recommend highlighting the non-ES VSGs and housekeeping genes with two more colors in the volcano plot, to show that it is mostly the antigen repertoire that is deregulated, and not the Pol II transcribed housekeeping genes. This is not entirely clear from the panel as it is right now.

      The suggestion was incorporated in Fig 1B. We color-coded the figure to include BES VSGs, MES VSGs, ESAGs, subtelomeric genes, core genes (typically Pol II and Pol III transcribed genes), and Unitig genes, those genes not assembled in the 427-2018 reference genome.

      Were the reads in Figure 2a filtered in the same way as those in Figure 2C? To support the statements, only unique reads should be used.

      Yes. We also added Fig S4 to make the comparison between reads mapping to silent vs active ES clearer.

      It would be good if the authors could add a supplementary figure showing the RAP1 ChIP-seq (WT and cells lacking a functional PIP5Pase) for all silent expression sites.

      We already had RAP1 ChIP-seq from cells expressing WT PIP5Pase. We have modified it to include data from the Mutant PIP5Pase. See Fig S3 and S5.

      In Figure 5D, after depletion of PIP5Pase, RAP1 binding appears to decrease across ESAGs, but ESAG expression appears to increase. How can this be explained with the model of RAP1 repressing transcription?

      We included in the Results, lines 208-212: “The increased level of VSG and ESAG mRNAs detected in cells expressing Mut PIP5Pase (Fig 5D) may reflect increased Pol I transcription. It is possible that the low levels of RAP1-HA at the 50 bp repeats affect Pol I accessibility to the BES promoter; alternatively, RAP1 association to telomeric or 70 bp repeats may affect chromatin compaction or folding impairing VSG and ESAG genes transcription.”.

      Reviewer #3 (Recommendations For The Authors):

      Line 114 - typo? Procyclic instead of procyclics:

      Fixed, thanks.

      Line 233 - the phrasing here is confusing, may want to replace "whose" with "which" (if I am interpreting correctly):

      Thanks, no changes were needed. I have had the sentence reviewed by a Ph.D.-level scientific writer.

      Methods - there is no description of VSG-seq analysis in the methods. Is it done the same way as the RNA-seq analysis? Is the code for analysis/generating figures available online?

      The procedure is similar. We included an explanation in Methods, lines 503-504: “RNA-seq and VSG-seq (including clonal VSG-seq) mapped reads were quantified…”. Also, in lines 522-524: “Scripts used for ChIP-seq, RNA-seq, and VSG-seq analysis are available at https://github.com/cestari-lab/lab_scripts. A specific pipeline was developed for clonal VSG-seq analysis, available at https://github.com/cestari-lab/VSG-Bar-seq.”.

      Fig 1H - Is this from RNA-seq or VSG-seq analysis of procyclics?

      The VSG expression analysis in procyclic forms was done by real-time PCR. To clarify this, we included in the legend “Expression analysis of ES VSG genes after knockdown of PIP5Pase in procyclic forms by real-time PCR”. We also amended the Methods, under the topic RNA-seq and real-time PCR, lines 402-407: “For procyclic forms, total RNAs were extracted from 5.0x10^8 T. brucei CN PIP5Pase growing in Tet + (0.5 µg/mL, no knockdown) or Tet – (knockdown) at 5h, 11h, 24h, 48h, and 72h using TRIzol (Thermo Fisher Scientific) according to manufacturer's instructions. The isolated mRNA samples were used to synthesize cDNA using ProtoScript II Reverse Transcriptase (New England Biolabs) according to the manufacturer's instructions. Real-time PCRs were performed using VSG primers as previously described (23).”

      Fig 2 A - Where it says "downstream VSG genes" I assume "downstream of VSG genes" is meant? the regions described in this figure might be more clearly laid out in the text or the legend

      Fixed, thanks. We included in the text in Results, line 140: “… and Ts and G/Ts rich sequences downstream of VSG genes”.

      Fig 2E - what does "Flanking VSGs" mean in this context?

      We added to line 705, figure legends: “Flanking VSGs, DNA sequences upstream or downstream of VSG genes in MESs.”

      Fig 2H - Why is the PIP5Pase Mutant excluded from the Chr_1 core visualization?

      We did not notice it. We included it now; thanks.

    1. Author Response

      We thank the reviewers for their rigorous and insightful comments, as well as their positive feedback on the manuscript. We agree with reviewer #1 that substantial additional work is needed for a complete mechanistic understanding of how NI circuitry works and we expect that the transgenic tools we generated will be valuable for such experiments. It is noteworthy that specific driver lines do not currently exist for IPN neurons, which limited our ability to perform optogenetic experiments activating the IPN to NI pathway. Reviewer #2 asks for additional clarification and analysis on various experiments, which we intend to address in a revised manuscript. We concur with reviewer #3 that, with the existing data, it is not possible to conclude with certainty that the IPN projections from gsc2 and rln3 NI neurons are solely axonal in nature. Additional experiments with axon- and dendrite-specific markers will be used to resolve this point in future work.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study was designed to examine the bypass of Ras/Erk signaling defects that enable limited regeneration in a mouse model of hepatic regeneration. This hepatocyte proliferation is associated with the expression by groups of cells of mRNA-loaded CD133+ intracellular vesicles that mediate an intercellular signaling pathway that supports proliferation. These are new observations, supported by convincing data, that have broad significance to the fields of regeneration and cancer.

      First of all, we greatly appreciate the very positive take of this work by eLife editors and also thank the two reviewers for their constructive comments. We have provided point-by-point responses as follows.

      Reviewer #1 (Public Review):

      This study was designed to examine the bypass of Ras/Erk signaling defects that enable limited regeneration in a mouse model of hepatic regeneration. The authors show that this hepatocyte proliferation is marked by expression of CD133 by groups of cells. The CD133 appears to be located on intracellular vesicles associated with microtubules. These vesicles are loaded with mRNA. The authors conclude that the CD133 vesicles mediate an intercellular signaling pathway that supports cell proliferation. These are new observations that have broad significance to the fields of regeneration and cancer.

      The primary observation is that the limited regeneration observed in livers with Ras/Erk signaling defects is associated with CD133 expression by groups of cells. The functional significance of CD133 was tested using Prom1 KO mice - the data presented are convincing.

      The major weakness of the study is that some molecular mechanistic details are unclear - this is, in part, due to the extensive new biology that is described. Nevertheless, the data used to support some key points in this study are unclear:

      We fully agree that some details of the molecular mechanisms are yet to be elucidated for the CD133+ vesicles (intercellsomes, as we named them). This is the first report of a new direct cell-cell communication mechanism provoked as a stress response to a deficit in proliferative signaling.

      Remarkably, many questions remain open for the molecular mechanisms for formation and functions of relatively well-characterized structures such as exosomes/EVs, despite a huge body of literature since their discoveries.

      a) What is the evidence that the observed CD133 groups of cells are not due to clonal growth? Is this conclusion based on the time course (the groups appear more rapidly than proliferation) or is this based on the GFP clonal analysis?

      This is indeed a very critical point for this study. Our initial thought and efforts were indeed on finding evidence that supports clonal expansion of progenitor cells. However, the experiments showed that the CD133+ cells were negative for all other stem/progenitor cell markers and that they are mature hepatocytes. CD133 expression was upregulated dramatically in regenerating livers and disappeared upon completion of liver regeneration. Furthermore, suppression of Ras-Erk signaling by Shp2 and Mek inhibitors robustly induced CD133 expression in a variety of cancer cell lines in culture in vitro.

      At 2 days after PHx, we already observed big colonies, which were unlikely to have derived from a single initiating cell (Figure 1). The GFP clonal analysis unambiguously demonstrated the heterogeneous origin of the clustered cells (Figure 3). We detected mixed GFP-positive and -negative cells within each colony, without a single colony consisting entirely of GFP-positive cells. The original colony sizes were estimated to be 10 cells or more (Figure 3G and Figure 3–figure supplement 1B). Thus, both the sizes and compositions in the GFP clonal analyses support the assertion that CD133+ cell clusters originated from multiple mature hepatocytes.

      b) What is the evidence that the CD133 vesicles mediate intercellular communication? This is an exciting hypothesis, but what is the evidence that this happens? Is this inferred from IEG mRNA diversity, or some other data? Is there direct evidence of transfer - for example, does the GFP clonal analysis show transfer of GFP that is not mediated by clonal proliferation? Moreover, since the hepatocytes are isogenic, what distinguishes the donor and recipient cells? Increased clarity concerning what is hypothesis and what is directly supported by data would improve the presentation of this study.

      Per the reviewer’s advice, we have clarified these points in the revised version. Our proposal that CD133 vesicles mediate intercellular communication was supported by these experimental results.

      A). Data in Fig. 5 suggest direct trafficking of the vesicles, as CD133 existed on the filaments that bridge the tightly contacting cells. This was confirmed by two different CD133 antibodies in mouse and human. Of note, CD133+ vesicles are negative for CD9, CD63 or CD81, markers for exosomes/EVs. We could only isolate CD133+ vesicles from cell lysates in vitro and mouse tissue lysates, but not from cell supernatants from which exosomes/EVs are isolated.

      B). More direct evidence of the transfer was presented in Fig. 6H, showing Myc-tagged CD133 molecules transferred from one cell to another. In response to reviewers’ comments, we have now conducted correlative light and electron microscopy to characterize the exchange event around the cell-cell border at EM level (new Figure 6-figure supplement 2).

      C). Further experimental evidence was provided in the single and double gene KO experiments in Fig. 8E-G, suggesting the functional significance of CD133 in intercellular communication.

      D). In addition to the data above, the IEG mRNA diversity analyses based on scRNA-seq support the mRNA exchange model. The isogenic CD133+ SKO hepatocytes were found to lack different IEG transcripts randomly. This is why we propose a mutually sharing model, rather than a donor and recipient model. Importantly, the mRNA diversity (entropy) model also illustrates the association of CD133 and “stemness", as described in the discussion.

      In sum, we believe that the most reasonable interpretation of the current data set is a model of direct cell-cell communication via CD133+ vesicles. We take the reviewer’s point and have made changes to the text to better distinguish conclusion from hypothesis; the latter will be validated in future studies.
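      To make the "mRNA diversity (entropy)" idea in point D concrete, the following is a minimal sketch that computes the Shannon entropy of a cell's transcript counts over a small gene panel. It is an illustration only: the response does not state which entropy definition the authors used, so the choice of Shannon entropy in bits, the function name `shannon_entropy`, and the toy counts are all assumptions.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a list of non-negative transcript counts.

    Assumption: diversity is measured on the normalized count distribution
    across a gene panel; zero counts contribute nothing to the sum.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical counts for four transcripts in two cells (not real data).
even_cell = [1, 1, 1, 1]    # all four transcripts equally represented
skewed_cell = [4, 0, 0, 0]  # only one transcript present

print(shannon_entropy(even_cell))
print(shannon_entropy(skewed_cell))
```

      Under this definition, a cell carrying all transcripts in equal amounts has maximal entropy for the panel, whereas a cell that lacks most of them has low entropy.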

      Reviewer #2 (Public Review):

      The manuscript by Kaneko set out to understand the mechanisms underlying cell proliferation in hepatocytes lacking Shp2 signals. To do this, the authors focused on CD133 as the proliferating clusters of cells in the Shp2 knockout (SKO) livers are CD133 expressing. After excluding the contribution of progenitors that are CD133 to this cell population, the authors focused on the intrinsic regulation of CD133 by Met/Shp2 regulated Ras/Erk pathway and showed upregulation of CD133 to be a compensatory signal to overcome loss of Ras/Erk signal and suggested Wnt10a in the regulation of CD133 signal. The study then focused on the observed filament localization of CD133 in the CD133+ cluster of cells. The study went on to identify the CD133+ vesicles that contain primarily mRNA vs. microRNA like other EVs. Specifically, the authors identified several mRNA species that encode IEGs, indicating a potential role for these CD133+ vesicles in cell proliferation signal transmission to neighboring cells via delivery of the IEG mRNAs as cargos. Finally, they showed that the induction of CD133 (and by derivative, the CD133+ vesicles) are necessary for maintaining cell proliferation in the cell cluster with high proliferation capacities in the SKO livers; and in intestinal crypt organoids treated with Met inhibitors to block Ras/Erk signal.

      1) The identification of CD133+ vesicles is largely based on staining and costainings. Though the experiments are very well done with many controls and approaches, the authors may want to perform one or two key experiments with EM to definitively demonstrate the colocalization. For example, the mCherry experiment in Fig6H and the colocalization experiments for CD133 and HuR in Fig 7.

      Many thanks for the suggestion. We have now completed the two suggested key experiments, with new results added to the revised manuscript. For the mCherry experiment, we conducted correlative light and electron microscopy to characterize the exchange event between cells that stably express CD133-GFP fusion protein and mCherry+ cells (new Figure 6-figure supplement 2). The CD133-GFP was clearly found in the mCherry+ cells around the border, demonstrating the intercellular traffic. For the colocalization of CD133 and HuR, we performed double immunogold staining on the isolated vesicles, with the new results presented in the revised Figure 7-figure supplement 1D.

      2) Since CD133+ marks the 50nM intracellsome defined by the authors, it is unclear what the CD133- vesicles used as controls are. Are they regular EVs that are larger in size? This needs better clarification as they are used as a control for many experiments such as Fig 7A.

      Per the advice, we added more explanation to the revised text. We used regular EVs as the control, since they are the well-studied intercellular communication vesicles. Because EVs are highly heterogeneous, we chose not to select a specific subpopulation of EVs. We used the well-established polymer-based precipitation method to isolate the EV fraction from cell culture supernatant for RNA-seq analysis. We did detect the enrichment of micro-RNAs in the isolated EVs, consistent with reports in the literature. Strikingly, the CD133 vesicles isolated from cell lysates showed a completely distinct RNA profile, relative to the EVs.

    1. Author Response:

      We thank the reviewers for their constructive comments. Below we include a point by point response.

      Reviewer #1 (Public Review):

      [...] Elaborate on the Methodology: Provide an in-depth explanation of the two active learning batch selection methods, including algorithmic details, implementation considerations, and any specific assumptions made. This will enable readers to better comprehend and evaluate the proposed techniques.

      We thank the reviewer for this suggestion. Following this comment, we will extend the text in Methods (in Section: Batch selection via determinant maximization and Section: Approximation of the posterior distribution) and in Supporting Methods (Section: Toy example). We will also include the pseudocode for the Batch optimization method.
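      Until the promised pseudocode is available, the general idea behind batch selection via determinant maximization can be sketched as follows. This is a hedged illustration, not the authors' method and not the ALIEN library API: it assumes that a covariance matrix over candidate predictions is already available, and it uses a simple greedy search (an assumption) to pick the batch whose covariance submatrix has the largest log-determinant. The function name `greedy_logdet_batch` and the toy matrix are hypothetical.

```python
import numpy as np


def greedy_logdet_batch(cov, batch_size):
    """Greedily select candidate indices whose covariance submatrix
    has a large log-determinant.

    cov        : (n, n) symmetric positive semi-definite covariance matrix
                 of model predictions over n candidates (assumed given).
    batch_size : number of candidates to select.
    """
    n = cov.shape[0]
    selected = []
    for _ in range(batch_size):
        best_idx, best_val = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            # slogdet is numerically safer than log(det(...)).
            sign, logdet = np.linalg.slogdet(cov[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_val:
                best_idx, best_val = i, logdet
        if best_idx is None:  # no candidate keeps the submatrix non-singular
            break
        selected.append(best_idx)
    return selected


# Hypothetical toy covariance: candidates 0 and 1 are almost redundant,
# candidate 2 is independent but has lower variance.
toy_cov = np.array([
    [1.00, 0.99, 0.00],
    [0.99, 1.00, 0.00],
    [0.00, 0.00, 0.50],
])
print(greedy_logdet_batch(toy_cov, 2))
```

      In the toy matrix, the second pick favors the independent candidate over the nearly redundant one, which is the behavior a determinant-based criterion is meant to reward: it trades off individual uncertainty against correlation within the batch.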

      Clarify Evaluation Metrics: Clearly specify the evaluation metrics employed in the study to measure the performance of the active learning methods. Additionally, conduct statistical tests to establish the significance of the improvements observed over existing batch selection methods.

      Following this comment we will add to Table 1 details about the way we computed the cutoff times for the different methods. We will also provide more details on the statistics we performed to determine the significance of these differences.

      Enhance Reproducibility: To facilitate the reproducibility of the study, consider sharing the code, data, and resources necessary for readers to replicate the experiments. This will allow researchers in the field to validate and build upon your work more effectively.

      This is something we already included with the original submission. The code is publicly available. In fact, we provide a Python library, ALIEN (Active Learning in data Exploration), which is published on the Sanofi GitHub (https://github.com/Sanofi-Public/Alien). We also provide details on the public data used and expect to provide the internal data as well. We included a small paragraph on code and data availability.

      Reviewer #2 (Public Review):

      [...] I would expect to see a comparison regarding other regression metrics and considering the applicability domain of models which are two essential topics for the drug design modelers community.

      We want to thank the reviewer for these comments. We will provide a detailed response to their specific comments when we resubmit.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      We don't see the case for 1,5-IP8 as settled in plants, and none of the papers mentioned above draws this strong conclusion. This may be due to several limitations in the available data. The mentioned studies do not allow the effects of 1-IP7 and 1,5-IP8 to be differentiated and, where binding or competition experiments have been performed, e.g. on the transcription factors, the differences in the Kd values for IP7 and IP8 were minor. Furthermore, 1,5-IP8 levels and Pi starvation response do not always correlate. ITPK1 mutants, for example, show Pi overaccumulation and low 5-IP7, but normal 1,5-IP8 (Riemer et al., 2021). Finally, plants are complex organisms with multiple tissue types that serve for accumulating, exporting, transporting or finally consuming Pi. Therefore, correlating inositol pyrophosphate levels from whole-plant extracts with a Pi starvation response is problematic, unless both sets of data could be obtained from the same cell types or at least tissues.

      The comment of the reviewer made us recognize that the complex situation in plants deserves a more detailed coverage and we have therefore adjusted the introduction accordingly.

      Results: "We determined the corresponding lysines in Pho81 (Fig. S3), created a point mutation in the genomic PHO81 locus that substitutes one of them, K154, by alanine, and investigated the impact on the PHO pathway."

      In my opinion, it would be important to test here in a quantitative in vitro binding assay if (i) the SPX domain of Pho81 can bind PP-InsPs including 1,5-InsP8, (ii) if the dissociation constant is in agreement with the cellular levels of 1,5-InsP8 in yeast (compare Fig. 2) and (iii) if the K154A mutation blocks or reduces the binding of 1,5-InsP8. Without such experimentation, I find the statement "this result underlines the efficiency of the K154A substitution in preventing PP-IP binding to the Pho81 SPX domain." to be overly speculative, as no binding experiment has been conducted.

      We agree with the comment of the reviewer concerning the overstatement in the phrase. It has been deleted.

      As mentioned already in our previous work (Wild et al., 2016), Pho81SPX counts among the SPX domains that we could not express recombinantly. Likewise, full-length Pho81, which would be the relevant object for correlating in vitro binding studies with the cellular concentrations, has not been accessible. Expression in yeast did not provide sufficient material for ITC or other quantitative techniques. Therefore, we refrained from pursuing binding studies. Nevertheless, given the high conservation of the positively charged patch on SPX domains and the fact that, in every case where it has been tested so far, SPX domains showed inositol polyphosphate binding activity, we find it a conservative assumption that the Pho81SPX binds them as well. This is supported by the effects of the binding site mutant, which mimics the effect of ablating IP8 synthesis.

      Results: "Inositol pyrophosphate binding to the SPX domain labilizes the Pho81-Pho80 interaction." Again, in the absence of any protein-protein interaction assay I find this statement not to be supported by the experiments outlined in the manuscript. The best way to address this point would be to perform either co-IP or in vitro pull-down experiments between Pho81-SPX and Pho81-85, in the presence and absence of 1,5-InsP8 and/or using the Pho81 point-mutants described in the text.

      Since Pho81 could not be produced recombinantly, neither by us nor by others who worked on this protein previously, quantitative in vitro binding assays are not accessible for now. A simple IP suffers from the problem that Pho81 interacts with Pho85-Pho80 not only through the SPX domain but also through the minimum domain. The latter interaction may be constitutive. Since the main point of the manuscript is not to dissect the exact mechanisms of Pho85-Pho80 regulation, but only to address why the postulated inactivation of this kinase by a 1-IP7/minimum domain complex makes no sense, we prefer not to show a profound (and more complex) analysis of how the different Pho81 domains contribute to binding.

      To test the potential of the SPX domain for binding Pho85/Pho80 in vivo, we have created a GFP-fusion of the SPX domain of Pho81. This fusion protein localizes mainly to the cytosol when cells are on high-Pi. Upon Pi starvation, it concentrates in the nucleus. This concentration is not observed in pho80 mutant background (New Fig. S7).

      In line with this, I would suggest moving the molecular modelling/docking studies from the discussion into the results section and using these models to design some interface mutations that could be tested in coIP and/or pull-down assays. Alternatively, the authors may choose to omit the discussion section starting with "Even though the minimum domain is unlikely to function as a receptor for PP-IPs this does not ..." and ending with "In sum, multiple lines of evidence support the view that the SPX domain exerts dominant, 1,5-IP8 mediated control over Pho81 activity in response to Pi availability."

      We have now moved the modelling data to the Results section. The structure prediction of the interface is experimentally validated. Data on the effect of interface substitutions are already published, although these substitutions had not been recognized as affecting a common interface at the time. Substituting the interface residues either on the side of Pho80 or of Pho81 constitutively activates Pho85-Pho80 kinase and destabilizes its interaction with Pho81. This was shown by Co-IP experiments from cell extracts by Huang et al. We mention the respective substitutions in the manuscript and cite the paper in which their effect on PHO pathway activation had been described.

      Reviewer #2 (Recommendations For The Authors):

      Some points need additional attention by the authors:

      • In general, it would be helpful to introduce abbreviations more thoroughly (certain enzyme names, PA, MD, ...)

      We paid more attention to this.

      • Also in general, the authors may want to think about the nomenclature of inositol pyrophosphates. Given the expansion of PP-IPs that are being detected in different organisms these days it may be a good time to convert to a more precise nomenclature, i.e. 5PP-IP5 instead of 5-IP7; and 1,5(PP)2-IP4, instead of 1,5-IP8. The latter could just be stated once, and then be abbreviated as IP8.

      To our understanding the field has not yet come up with a unified nomenclature. Therefore, we prefer to stick with the more practical nomenclature that we have chosen, which also corresponds to what is commonly used in presentations and discussions among colleagues. We have now introduced a sentence making the link to the nomenclature that the reviewer has proposed.

      • p. 1, Abstract: "negative bioenergetic impacts" - the phrasing seems really vague

      Agreed, but we find it difficult to be more explicit and precise in the abstract while remaining concise and not distracting from the main message. This aspect is better explained in the introduction.

      • p. 3, Significance statement: "... unified model across all eukaryotic kingdoms" While the intended meaning of this wording is better explained in the text later, the phrasing here suggests a more all-encompassing study at hand, instead of a conclusion that fits more closely with established reports from other organisms. Please rephrase.

      We have adapted the phrase to avoid this impression.

      • p. 4: "IPTKs" - are the ITPKs meant here?

      Yes, that was a typo.

      • p. 7, the introduction ends abruptly and could use a concluding sentence.

      Done

      • p.7, "enzymes diphosphorylation either the..."; I understand what the authors are trying to say with diphosphorylating, but the enzymes are phosphorylating a phosphorylated substrate.

      Yes. We changed the phrase to "....adding phosphate groups at the 1- or 5-positions....".

      • p. 7, subtitle "...concentrations and kinetics of..."; kinetics of what? Synthesis/turnover?

      We corrected this subtitle

      • p. 8, with regards to the recovery experiment: Was this recovery determined elsewhere (please cite)? Otherwise it would be beneficial to include an extra figure to illustrate these recoveries in the supplementary information. And do the authors suspect some hydrolysis of IP8 given the lower recovery?

      We have now added the experiment testing recovery of IPPs as the new Fig. S1.

      • p. 9: It is appreciated that the authors point out the concentration of IP6 in S. cerevisiae. I found that concentration rather low, and the authors could highlight this a bit more, given their ability to carry out absolute quantification.

      This was a leftover from a previous version of the paper. Since the paper does not treat IP6 or lower inositol polyphosphates, we have deleted this phrase.

      • p. 9, Fig 2: The exponential decay of 5-IP7 is very nicely shown in Figure 2c. But one of the most important discussion points is IP8 being the key controller of the PHO pathway - it would therefore be beneficial for the argument to also show the same kind of graph for IP8 and if possible, fit a function to the data points to better quantify and compare the decay processes (e.g. via "half-life time" of PP-IPs during starvation, in addition to the suggested "critical concentration" which was only discussed for 5-IP7 thus far).

      Kinetic resolution is an issue here. The approach shown in Figs. 2 and 5 is not apt to determine a critical concentration of IP8 because the decline upon transfer to starvation conditions is too fast and difficult to relate to the equally rapid induction of the PHO pathway. We shall address this point in a more appropriate setup in a future study.
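      For reference, the kind of quantification the reviewer suggests (fitting a decay function and reporting a half-life) can be sketched as below. This is an illustration under stated assumptions, not an analysis from the manuscript: it assumes simple first-order decay, c(t) = c0 · exp(−k·t), fits it by linear regression on log-transformed concentrations, and derives the half-life as ln(2)/k. The function name `fit_exponential_decay` and the time points and concentrations are hypothetical, not measured values.

```python
import numpy as np


def fit_exponential_decay(times, concentrations):
    """Fit c(t) = c0 * exp(-k * t) by least squares on log(c).

    Assumes first-order decay and strictly positive concentrations.
    Returns (c0, k, half_life), with half_life = ln(2) / k.
    """
    t = np.asarray(times, dtype=float)
    c = np.asarray(concentrations, dtype=float)
    # log(c) = log(c0) - k * t, so a straight-line fit gives slope = -k.
    slope, intercept = np.polyfit(t, np.log(c), 1)
    k = -slope
    c0 = np.exp(intercept)
    half_life = np.log(2) / k
    return c0, k, half_life


# Hypothetical, noise-free example data (arbitrary units, not measurements).
t_example = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
c_example = 2.0 * np.exp(-0.5 * t_example)

c0, k, half_life = fit_exponential_decay(t_example, c_example)
print(c0, k, half_life)
```

      Such a fit is only meaningful when several time points fall within the decay phase. As the response notes, the decline of IP8 after transfer to starvation medium is too fast for the sampling scheme used, so a half-life estimated this way would be poorly constrained.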

      • p.9, Fig 2a: Where does the 5-IP7 come from in the kcs1Δ strain? In the text the authors state that 5-IP7 in kcs1Δ was not detected, but the figure suggests otherwise. Please explain.

      Currently, we do not know where these residual signals stem from. One possibility is that they represent other isomers that exist in minor concentrations and that are not resolved from 5-IP7 in CE. We added a sentence to the figure legend to indicate this.

      • p. 10: "IP8 was undetectable in kcs1Δ and decreased by 75% in vip1Δ. kcs1Δ mutants also showed a 2 to 3-fold decrease in 1-IP7, suggesting that the synthesis of 1-IP7 depends on 5-IP7. This might be explained by assuming that a significant source of 1-IP7 is synthesis of 1,5-IP8 through successive action of Kcs1 and Vip1, followed by dephosphorylation to 1-IP7." - Please specify this statement. Do the authors mean that 1,5-IP8 is only produced transiently below the detection capabilities of the method but that there still is a (reduced) flux from 5-IP7 to 1,5-IP8 to 1-IP7? Otherwise it would seem paradoxical to have a dependency on a non-existing metabolite in that cell line.

      This was not clearly expressed. The revised version now says: " ... a 2 to 3-fold decrease in 1-IP7, suggesting that the synthesis of 1-IP7 depends on 5-IP7. This might be explained by assuming that, in the wildtype, most 1-IP7 stems from the conversion of 5-IP7 to 1,5-IP8, followed by dephosphorylation of 1,5-IP8 to 1-IP7.". We hope that this clarifies the matter.

      • p. 10: "pulse-labeling approaches are not available for PP-IPs." While this statement is correct, a recent paper co-authored by Qui and Jessen showed nice pulse-labeling data for the lower IPs and could be cited here (PMID: 36589890).

      Yes, indeed, we should have been more precise here. What we wanted to express was that rapid pulse-labeling methods for following phosphate group turnover were lacking, with a temporal resolution of minutes rather than hours. Existing pulse labeling approaches, including the study mentioned by the reviewer, do not provide that. We have changed the phrase accordingly.

      • p. 10: continuation of caption of Fig 2: "were extracted [and] analyzed"

      Corrected. Thank you.

      • p. 12: How is 1-IP7 made in the vip1 kcs1 double mutant?

      As explained above, we suspect that these may be side products of IPMKs, which accumulate in the absence of vip1 phosphatase.

      • p. 13, caption to Figure 3: "XXX cells were analyzed" please replace the place holder XXX.

      Done. Thank you.

      • p. 13, Fig 3B, C, D and p. 50, Fig. S4: On screen the contrast between the different shades of grey of the bars are just visible enough, but not on paper, I suggest using a higher contrast/ different colouring scheme.

      We enhanced the contrast.

      • p. 24, 25, Fig 7.: I could not really appreciate the AlphaFold part, and found it unnecessary. No docking or molecular dynamics simulations were carried out here, and it was not clear to me what information should be gleaned from this part.

      Following this comment, we have modified the respective part of the text. This part refers to a publication from the O'Shea lab (Nat. Chem Biol. 4,25) proposing the model that 1-IP7 and the Pho81 minimum domain bind competitively to the active site of Pho85 to inhibit its kinase activity. Modeling of complexes between Pho81, Pho80 and Pho85, which we present in the manuscript, rather suggests binding of the minimum domain to a groove in Pho80. This is important because it provides a viable alternative model for the action of the minimum domain. It suggests the minimum domain as a constitutive linker that attaches Pho80 to Pho85. Importantly, this model accounts perfectly for the results of previous random mutagenesis studies on Pho80 and on the minimum domain, which had independently identified both the Pho80 groove and the minimum domain residues that bind it in the prediction as critical residues for inhibition of Pho85, and for integrity of the Pho85/Pho80/Pho81 complex. We find this alternative explanation for Pho85-Pho80 regulation by Pho81, which we can derive by combining the predictions with already published experimental data, an important element to re-evaluate the relevance of 1-IP7 in PHO pathway regulation and resolve one of the existing discrepancies.

      • p. 28: No experiments were carried out with plants or mammals. The relevance for plants or mammalian systems therefore seems to be overstated at this point in time.

      We are not quite sure how to interpret this remark. We do not claim that our data support a role for IP8 in mammals and plants. But we refer to and cite studies providing the strongest evidence in favor of it in these systems. The relevance of our current study lies in refuting seemingly strong evidence from yeast, which had been diametrically opposed to the data obtained in plants and mammals. The revision of the situation in yeast now paves the way to drawing a coherent concept for fungi, plants and mammals. We feel that this is important and should be underlined.

      • p. 31: "300 mL of 3% ammonium" - 300 µL?

      Yes. Thank you.

      • p. 45, CE-ESI-MS parameters: "1IP8"

      Corrected.

      • p. 47: Figure S1: Please include more experimental details in the caption and/or methods section. Was a similar analysis software used as e.g. Figure S2 (NIS Elements Software)? Please also include all the analysis software in the Methods section under "fluorescence microscopy". Unless these additional experimental details already clarify the following point: Can the authors briefly comment on why the morphological determination in S1 requires trypan blue staining while in later experiments the yeast cells are readily recognized by the software in "simple" brightfield images?

      Trypan blue staining is not strictly required for this. It is just a simple method to fluorescently stain the cell wall. There are many other ways of delineating the cells. It could also have been done in a brightfield image.

      We updated the figure legend to better describe how these measurements were done and deposited the script and training file on figshare.

      • p. 48: "can be downloaded from **" please insert the link once the script is available online.

      It has been deposited at Figshare under DOI 10.6084/m9.figshare.c.6700281

      Reviewer #3 (Recommendations For The Authors):

      1) Italicize the scientific names of the organisms; this was inconsistent throughout the manuscript. Also, gene names should be italicized; this was also inconsistent (e.g., p.12 "... did not induce the PHO84 and PHO5 [sic] promoters...).

      Done

      2) Summary of the Figure 2A data in the text (p.9) probably has swapped the determined concentrations for 1-IP7 and IP8 (0.3 µM or 0.5 µM) as compared with the data figure.

      Yes, indeed. We have corrected this.

      3) Figure 2A: which of the mutant PP-IP levels are significantly different from the WT control?

      We have now added asterisks to indicate the significance for every mutant.

      4) In the discussion on the data (Fig. 2A), I was tripped up by the verb tense in this phrase "5-IP7 has not been detected in the kcs1Δ mutant and 1-IP7 has been strongly reduced..."; I think you want to use the past tense "was" in both cases [as is used in the next sentence]. It made me wonder if there was a difference in the detection of 5-IP7 and IP8 in the kcs1Δ mutant, you could detect 5-IP7 but not IP8; if so, where did the 5-IP7 come from?

      We have corrected the tense. Thank you for highlighting this. As for the residual inositol pyrophosphate signal in kcs1Δ, we do not know its origin. One possibility, which we now mention in the text, is that it stems from IPMK side activity. It should be underlined that all signals disappear upon Pi starvation.

      5) Figure 2C, include the data points that the lines are built from (suggestion).

      We refrained from that for the line graphs. For reasons of consistency, we should do this for every line graph. If we did that, Fig. 4B would become quite hard to read.

      6) Figure 3B-D, please check that the stipples or hatches are in the figure - the printed copy lacked them although I could see them in the electronic version; this was also true for Figures 5 and 6 (I do not know if it is a printer issue, but other hatches were visible: e.g., not seen in S4 but seen in S5).

      They are visible in our copies, also after printing. They may have been lost during file conversion at the journal.

      7) The text description of the Pho4-yEGFP, Pho5-yEGFP and Pho84-yEGFP says that the kcs1Δ mutant "showed Pho4-yEGFP constitutively in the nucleus already ... and PHO5 and PHO84 were activated". However, the data is more complex than that: whereas the localization of Pho4-yEGFP is constitutively nuclear, there is a higher basal (repressed) expression of both Pho5 and Pho84 as well as increased expression of both proteins under -Pi conditions. What accounts for the increased expression when Pho4 is already nuclear? This is also seen in the vip1Δ kcs1Δ mutant.

      We agree with the reviewer, but we cannot explain this effect with certainty. One possibility could be a wider dysregulation of Pi metabolism in kcs1 mutants. To name a few possibilities: Wildtype cells have polyphosphate reserves that are gradually mobilized during the first hours of P-starvation. kcs1 mutants don't have those and might fall into a "deeper" state of starvation faster. It should be kept in mind that the starvation response is also regulated at the level of chromatin structure, and by antisense transcripts. The influence of kcs1 on these processes is unclear.

      8) Figure 9 legend: please add a definition of the MP region (in red) and include it more explicitly in the described model.

      We now mention the relevant region also in the legend and have labeled the relevant regions in the images (Huang et al., 2001).

      9) Figure S2 legend: information is missing (downloading link).

      It has been deposited at Figshare under DOI 10.6084/m9.figshare.c.6700281

      10) Figure S4 and S5, missing statistics.

      They have been added to the new Fig. S6, which interprets differences between strains and conditions. Fig. S4 (now S3) shows timecourses of IPPs down to zero. Adding statistics for all pairwise differences between the timepoints would be excessive.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      It is very important to find practical and efficient means in order to increase agricultural productivity. Drawing on data from variable field environments, this study provides a useful theoretical framework to identify new factors that could increase agricultural production. There is solid evidence to support the authors' claims, though following the fate of candidate species after introduction into rice fields would have strengthened the study. Plant biologists and ecologists working in nature and fields will find the work interesting.

      Thank you so much for your careful evaluation of our manuscript. We are very pleased to hear that you found our framework useful. We have revised our manuscript according to the "Recommendations for the Authors" to improve our manuscript.

      Public Review

      Reviewer #1 (Public Review):

      This manuscript describes the identification of influential organisms on rice growth and an attempt of validation. The analysis of eDNA on rice pot and mimic field provides rice growth promoting organisms. This approach is novel for plant ecology field. However current results did not fully support whether eDNA analysis-based detection of influencing organism.

      Thank you so much for evaluating our manuscript. We have carefully read and responded to your comments. We hope our responses resolve your concerns on our study.

      The strength of this manuscript is to attempt application of eDNA analysis-based plant growth differentiation. The weakness is too preliminary data and experimental set-up to make any conclusion. The trials of authors experiments are ideal. However, the process of data analysis did not meet certain levels. For example, eDNA analysis of different time points on rice growth stages resulted in two influential organisms for rice growth. Then they cultivate two species and applied rice seedlings. Without understanding of fitness and robustness, how we can know the effect of the two species on rice growth.

      We agree with your comments that we did not have the fitness data of the two species and/or rice seedlings. Thus, it is still difficult to obtain deep understanding of the mechanisms of our findings that the species introduced in the system would influence rice growth. Nonetheless, our study demonstrated the effectiveness of our research framework as we found evidence that the species that were discovered by the eDNA monitoring and time series analysis indeed cause changes in the system. We believe that the first step is to show that the framework is workable and that detailed understanding of the mechanisms or genetic pathway was not a focus of our study. To avoid misunderstanding, we have added several explanations regarding this point in L426–431 and L447. For example, in L426, we have added the following statement: "... the detailed dynamics of the two introduced species was unclear (i.e., the fate of the introduced species). This is particularly important for understanding how the introduced organisms affected rice performance...".

      The authors did not check the fate of two species after introducing into rice. If this is true, it is difficult to link between the rice gene expression after treatments and the effectiveness of two species. I think the validation experiment in 2019 needs to be re-conducted.

      We did not check the fate of the two species (except measuring the eDNA concentrations of the species), and it is true that we cannot show evidence of "how" these two species influence the rice gene expression. Understanding molecular mechanisms of the phenomenon that we found is important (especially from the viewpoint of molecular biology), but our primary objective was to demonstrate that our "eDNA x time series analysis" framework is feasible for detecting previously overlooked but influential organisms. To this end, we believe that we achieved our objective and repeating the validation experiment should be for a different purpose (i.e., for understanding molecular mechanisms). We have clarified these points in L426–431 and L447 as explained above.

      Reviewer #2 (Public Review):

      The manuscript "Detecting and validating influential organisms for rice growth: An ecological network approach" explores the influence of biotic and abiotic entities that are often neglected on rice growth. The study has a straightforward experimental design, and well thought hypothesis for explorations. Monitoring data is collected to infer relationships between species and the environment empirically. It is analyzed with an up-to-date statistical method. This allowed the manuscript to hypothesize and test the effects most influential entities in a controlled experiment.

      Thank you so much for your careful evaluations. We are pleased to see that you evaluated our manuscript positively. We have further revised our manuscript according to your comments and hope the revision has resolved your concerns.

      The manuscript is interesting and sets up a nice framework for future studies. In general, the manuscript can be improved significantly, when this workflow is smoothly connected and communicated how they follow each other more than the sequence and dates provided. It is valuable philosophical thinking, and the research community can benefit from this framework.

      Thank you for your suggestions. In order to improve the logic flow and readability of our manuscript, we have revised the descriptions of workflow and clarified how the experimental and statistical steps were connected to each other. To do so, we have added brief explanations about what/how we did at the first sentence of Results subsections (some of these explanations were only in Materials and Methods in the original manuscript). Also, we have moved all of the Supplementary Materials and Methods to the main text. We have thoroughly revised the manuscript, and we hope that all the parts of our manuscript have been connected more smoothly than in the original manuscript.

      I understand the length and format of the manuscript make it difficult to add more details, but I am sure it can refer to/clear some concepts/methods that might be new for the audience. How/why variables are selected as important parts of the system, a tiny bit of information about the nonlinear time series analysis in the early manuscript, and the biological reasoning behind these statistically driven decisions are some examples.

      We have explained how/why variables are selected (in L125), added more information about the nonlinear time series analysis (in L129 and L175), and added the biological reasoning behind the statistical decisions (L195).

      Reviewer #3 (Public Review):

      Most farming is done by subtracting or adding what people want based in nature. However, in nature, crops interact with various objects, and mostly we are unaware of their effects. In order to increase agricultural productivity, finding useful objects is very important. However, in an uncontrolled environment, it coexists with so many biological objects that it is very inefficient to verify them all experimentally. It is therefore necessary to develop an effective screening method to identify external environmental factors that can increase crop productivity. This study identified factors presumed to be important to crop growth based on metabarcoding analysis, field sampling, and non-linear analysis/information theory, and conducted a mesocosm experiment to verify them experimentally. In conclusion, the object proposed by the author did not increase rice yield, but rather rice growth rate.

      Thank you so much for your evaluation of our manuscript. We have revised our manuscript based on your comments, and hope it has been improved compared with the original version.

      Strength

      In actual field data, since many variables are involved in a specific phenomenon, it is necessary to effectively eliminate false positives. Based on the metabarcoding technique, various variables that may affect rice growth were quantitatively measured, although not perfectly, and the causal relationship between these variables and rice growth was analyzed by using information transfer analysis. Using this method, two new players capable of manipulating rice growth were verified, despite their unknown functions until now. I found this process to be very logical, and I think it will be valuable in subsequent ecological studies.

      We are very pleased to see that you found our framework is very logical and potentially beneficial for future ecological studies.

      Weaknesses

      CK treatment's effectiveness remains questionable. Rice's growth was clearly altered by CK treatment. The validation of the CK treatment itself is not clear compared to the GN treatment, and the transcriptome data analysis results do not show that DEG is not present. The possibility of a side effect caused by a variable that the author cannot control remains a possibility in this case. Even though this part is mentioned in Discussion, it is necessary to discuss various possibilities in more detail.

      We agree that the effectiveness of the CK treatment was questionable. We have added some more discussion about this point in L376: "The unclear effects of the CK treatment relative to those of the GN treatment could be due to the relatively unstable removal method (i.e., C. kiiensis larvae were manually removed by a hand net) or incomplete removal of the larvae (some larvae might have remained after the removal treatment)."

      Reviewer #1 (Recommendations For The Authors):

      Comment #1-1 This manuscript describes identification of influential organisms on rice growth and an attempt of validation. The analysis of eDNA on rice pot and mimic field provides rice growth promoting organisms. This approach is novel for plant ecology field. However current results did not fully support whether eDNA analysis-based detection of influencing organism.

      Thank you for your careful evaluations of our manuscript. We are pleased to see you found that our approach is novel. We have revised our manuscript in accordance with your comments, and we hope that the revision and responses resolved your concerns.

      Comment #1-2 1. Experimental setting: Authors made up small scale pot system in 2017 and then expanded manipulative experiment. I do not understand how two influencing organism sequences were identified from the single treatment depending on different time points. How they can be convince the two organisms affect the rice growth rather than other biological and environmental factors.

      In 2017, we performed intensive monitoring of the experimental rice plots and obtained large time series data (122-day consecutive monitoring x 5 plots = 610 data points). The time series data were analyzed using the information-theoretic causal analysis. The analysis is critically different from correlational analyses and is designed to identify causal relationships among variables. Although we understand that field manipulation experiments are a common and straightforward approach to identify causal relationships among organisms, we chose the "field-monitoring + time-series-based causal analysis" approach. This is because, as explained in the main text, there are numerous factors that could influence rice performance, and it is practically impossible to perform manipulative experiments for all the potential factors that could influence rice growth. On the other hand, our "field-monitoring + time-series-based causal analysis" approach has the potential to identify multiple factors under field conditions, even with a single experimental treatment.

      Nonetheless, we must admit that our time-series-based approach still has a chance to misidentify causal factors. Our framework relies on statistics, so the chance of false-positive detection of causality cannot be zero. This was exactly the reason why we performed the "validation" experiment in 2019. To complement the statistical results of the 2017 experiments, we performed another experiment in 2019.

      Comment #1-3 2. eDNA technology: The eDNA analysis based on four universal primers 16s rRNA, 18s rRNA, ITS, and COI regions must not be enough to identify specific species. The resolution of species classification may not meet to confirm exact species. Thus, the accuracy of two species that they selected for further experiment is difficult to be confirmed. Authors also referred to "putative Globisporangium".

      Your point is correct. The DNA barcoding regions we selected are short and it is often difficult to identify species. However, this limitation could not have been overcome even if we had chosen a different genetic marker. The long-read sequencing technology could partially solve the issue, but the number of sequence reads generated by the long-read technique is less than that by the short-read sequencing technology, and comprehensive detection of all species in an ecological community was still challenging. Our approach struck a balance among the identification resolution, comprehensiveness of the analysis, and sequencing costs. In addition, even though we could not identify most ASVs at the species level, some ASVs could be identified at the species level (52 ASVs among the 718 ASVs which had causal influences on rice growth), and we selected the two species (G. nunn and C. kiiensis) from the 52 species.

      Further, the taxonomic assignment algorithm we used here (i.e., Claident; Tanabe & Toju 2012 PLoS ONE 10.1371/journal.pone.0076910) adopted conservative criteria for species identification and has a low false-positive probability.

      More importantly, this is also the reason why we performed the "validation" experiment in 2019. The species identified in the 2017 experiment are still "potential" organisms that influence rice growth (i.e., the hypothesis-generating phase), and we tested the hypothesis in 2019.

      Nonetheless, we must admit that clear description of potential limitations is important. Thus, we have discussed this in L418: "As for the second issue, short-read sequencing has dominated current eDNA studies, but it is often not sufficient for lower-level taxonomic identification. Using long-read sequencing techniques (e.g., Oxford Nanopore MinION) for eDNA studies is a promising approach to overcome the second issue".

      Comment #1-4 3. Biological relevance 1: Authors identify two organisms as influencing organism for rice growth. As conducting the first experiment in 2017, the 2019 experiment was different from natural condition. The two experiments in 2017 and 2019 were conducted under different conditions. How do they compare the experiments? At least, the eDNA analyses in 2017 and 2019 should be very similar. I cannot find such data.

      The experimental conditions were different between 2017 and 2019 because they were conducted in different years. Theoretically, it is ideal if the experimental conditions in 2019 are covered by the range of experimental conditions in 2017 (e.g., rice variety, air temperature, rainfall, and solar radiation). If this condition were satisfied, the attractor (i.e., rice growth trajectory delineated in the state space) in 2019 would be within that in 2017, and our model prediction in 2017 would be used to predict dynamics in 2019 accurately. To fulfill the conditions, we made as much effort as possible: we used the same rice variety and soils in 2019 as those used in 2017, and started our experiment at the same time of year in 2019 as in 2017.

      Although natural ecological dynamics cannot be precisely controlled, our monitoring revealed that the ecological dynamics in 2019 were qualitatively similar to those in 2017. To demonstrate that the experimental conditions and eDNA community data were similar between the two experiments, we have presented the climate and eDNA data in an inset figure in Figure 3a, Figure 1–figure supplement 2, Figure 3–figure supplement 2. We must admit that these dynamics are not identical, but we hope that this resolves your concern.

      Comment #1-5 4. Lack of detail description: In the Materials and Methods, there are many parts which lack on detail description. For instance, authors must described the two species cultivation, application concentrations, and application methods.

      We have moved Supplementary Materials and Methods to the main text and added more detailed descriptions in Materials and Methods. Also, to improve the logical flow and readability of our manuscript, we have added brief explanations about what/how we did at the first sentence of Results subsections (some of these explanations were only in Materials and Methods in the original manuscript). We have added the reference for how to cultivate G. nunn in L608 (Kobayashi et al., 2010; Tojo et al., 1993) (C. kiiensis was not cultivated but removed from the system as in Materials and Methods), and application concentrations. Application methods were described in Materials and Methods, the section Field manipulation experiments in 2019 in L596.

      Comment #1-6 5. Validation: Application of one species clearly resulted to promote rice growth. They must include appropriate control treatment. If they pick same genus but different species that identified no specific effect on rice growth through eDNA analysis, no effect on growth can be provided. Generally application of large population of certain non-harmful organism confer plant growth promotion. It is not surprising result. Authors need to prove effectiveness of eDNA analysis. In addition, the field experiments required at least two years of consistent data for publication because environmental factors are so dynamic.

      Thank you for pointing this out. We agree with your comment that species that were predicted to have no effect should not promote rice growth in a validation experiment. It was also one of our initial experimental plans to include such species in our manipulation experiment in 2019, but we could not include them because of the limitation of time, labor, and money. More extensive validation of the statistical results of the 2017 data, including multi-year experiments, would further validate the effectiveness of our approach, which should be done in future studies. To clarify this point, we have added statements in the paragraph starting at L396.

      Comment #1-7 In conclusion, I suggest that authors need more large data analysis and validate with more accurate and meaningful protocol.

      As we explained in the revised manuscript and the Response to Comments #1-2 to #1-7, our study demonstrated a novel research framework to detect previously overlooked influential organisms under field conditions. We agree that larger data analysis would be ideal to further validate our approach, but whether and how to collect larger data is constrained by time, money, and labor. We believe that our study was designed carefully and could provide meaningful avenues for developing novel, environment-friendly agricultural solutions based on ecological networks.

      Reviewer #2 (Recommendations For The Authors):

      Comment #2-1 Lines 97-110: This is so cool. Modeling with empirical data is very powerful. But a rice field is an open system consisting of metacommunity dynamics. Maybe a tiny bit of biological and biogeochemical background here would be good.

      Thank you for your comments. We have added a few examples of how and in which systems these methods were used to evaluate community dynamics and detect biological interactions in L109-L118.

      Comment #2-2 Lines 111-126: I like the summary of the study here. I think the influential species concept can be a little more elevated. Paine's famous keystone species work has been cited but a couple more pieces of literature can help to enhance the ecological importance of this work.

      We have explained the work by Paine (1966) a bit more and added one more paper that showed the effect of multiple predator species on the system dynamics at L88. We have also added a relevant sentence at L137 to emphasize the ecological/agricultural significance of our work.

      Comment #2-3 Experimental design/Figure 1:

      Is there any rationale behind choosing red individuals to measure the growth?

      Is there any competition between the individuals in the pots?

      Figure 1e: It is nice to show the ASVs in time. I wonder how the plot would look like when normalized by biomass/DNA content/coverage/rarefaction because of the seasonality.

      As for the first question, we chose the four individuals to minimize the edge effects (i.e., effects of microclimates and neighboring rice would be different between the four rice individuals and those planted in the edge regions). We have mentioned this in the legend of Figure 1.

      As for the second question, there might be competition among the individuals in the pot. However, we did not measure the effect of competition (e.g., by comparing the growth with/without other rice individuals).

      As for the third question, we published detailed dynamics of ecological community in the Supplementary Figures in Ushio (2022) Proceedings B https://doi.org/10.6084/m9.figshare.c.5842766.v1. In addition, we have uploaded a video showing the temporal dynamics of some top (= most abundant) ASVs in https://doi.org/10.6084/m9.figshare.23514150.v2.

      We have mentioned the supporting information in L153.

      Comment #2-4 Line 146-147: Is this damage influence the inferences? Maybe it is better to justify.

      While we occasionally observed physical damages, it is unlikely that they affected our causal inference because the changes in the rice heights due to the damages were smaller and less frequent than those due to growth. We have noted this at L151.

      Comment #2-5 Line 161-162: Maybe refer readers to the methods section where you explain UIC analysis. It'd be easier to interpret the figures.

      Mentioned.

      Comment #2-6 Line 175-176: I believe very brief information in the intro about the organisms might help explain the hypothesis and interpret the results better.

      We have included brief information of the two species at L197.

      Comment #2-7 Figure 2: Species interaction strength: Are these proxies to the Jacobians? Is there a threshold for the influence we can consider strong/weak? For example, influential species compared to diagonal elements of the Jacobians (intraspecies interactions) could be shown as a mean vertical line in Figure 2b.

      "Influences to rice growth" in Figure 2b is transfer entropy (TE) from a target ASV to rice growth. They are not proxies of the Jacobians, but they might positively correlate with the absolute value of the Jacobians. We have clarified this point in the legend (L953). More direct estimations of the Jacobian can be done using the MDR S-map method (Chang et al. 2021 DOI:10.1111/ele.13897), but we did not perform the MDR S-map in the present manuscript (see Ushio et al. 2023 https://doi.org/10.7554/eLife.85795 for the application of the MDR S-map). As for TE, there is no clear threshold to distinguish strong/weak interactions.
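
      For readers unfamiliar with transfer entropy, the quantity can be illustrated with a textbook plug-in estimator for discrete series at a lag of one step. This is only a minimal sketch: it is not the UIC analysis used in the manuscript, and the function name and the toy series are hypothetical.

```python
from collections import Counter
from math import log2

def transfer_entropy(source, target):
    """Plug-in estimate of TE(source -> target) in bits (lag 1, discrete symbols).

    Illustrative only; this is not the UIC implementation used in the study.
    """
    # Each triple is (target at t+1, target at t, source at t).
    triples = list(zip(target[1:], target[:-1], source[:-1]))
    n = len(triples)
    c_full = Counter(triples)
    c_next_now = Counter((y1, y0) for y1, y0, _ in triples)
    c_now = Counter(y0 for _, y0, _ in triples)
    c_now_src = Counter((y0, x0) for _, y0, x0 in triples)
    te = 0.0
    for (y1, y0, x0), count in c_full.items():
        p_joint = count / n
        p_given_both = count / c_now_src[(y0, x0)]      # p(y1 | y0, x0)
        p_given_self = c_next_now[(y1, y0)] / c_now[y0]  # p(y1 | y0)
        te += p_joint * log2(p_given_both / p_given_self)
    return te

# Hypothetical binary series: the target copies the source with a one-step delay.
source = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1]
driven_target = [0] + source[:-1]
constant_target = [0] * len(source)

te_driven = transfer_entropy(source, driven_target)
te_constant = transfer_entropy(source, constant_target)
```

      In this toy case the estimate is positive when the target follows the source and zero when the target never changes, which matches the reading of TE as information that the past of one variable adds about the future of another.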

      Comment #2-8 Figure 2: Looking at panels c and d, it looks like there is a negative frequency selection between two influential species. Is it a reasonable observation?

      This is an interesting point. In this manuscript, we have not carefully examined the interspecific relationship between these two particular species. However, the interspecific interactions were examined in detail and reported in Ushio (2022) Proceedings of the Royal Society B (DOI:10.1098/rspb.2021.2690). We re-checked the result in Ushio (2022); although there is a negative correlation between them, we did not find any (statistical) causal relationship between them.

      Comment #2-9 Line 209: What is t-SNE analysis? Because of the manuscript's format, maybe methods should be shortly referred to in the relevant section or explained in brackets.

      We have spelled out t-SNE.

      Comment #2-10 Line 212-214: Maybe briefly explain what the hypotheses are for the alternative analysis, and what is the contribution of the results to the study.

      We have added a brief explanation at L241: "Alternative statistical modeling that included the treatments (the control versus GN or CK treatments) and manipulation timing (i.e., before or after the manipulation), which simultaneously took the temporal changes of all the treatments into account, also showed qualitatively similar results (Supplementary file 4), further supporting the results."

      Comment #2-11 Figure 3b/c: Maybe species names as panel titles could be helpful. d: Treatment names with initials in the legend could be also helpful to read the plots.

      We have added species name as panel titles of Figure 3b,c. Treatment names were included in the legend of Figure 3.

      Comment #2-12 Line 233: Maybe mention why the manuscript uses the word "clear".

      We have mentioned this in L185.

      Comment #2-13 Line 234-236: I think that these alternative tests should be explained somewhere.

      We have revised the sentence so that it includes some explanations (L241). Also, we have referred to Materials and Methods.

      Comment #2-14 Figure 4: The title says ecological community compositions, and panels show the growth rates and cumulative growth.

      Thank you for pointing this out. This was a typo and we have corrected it.

      Comment #2-15 Lines 246-269: Can these expression patterns be transient and relevant to the time point that the sample is taken?

      Yes, these expression patterns were transient. We collected rice leaf samples for RNA-seq 1 day before the first manipulation and 1, 14, and 38 days after the third manipulation (see Supplementary file 3 for the sampling design). When we merged the pot locations, we observed no difference in the gene expression for samples 1 day before the first manipulation and 14 and 38 days after the third manipulation (except for two genes in samples 38 days after the manipulation), and thus, we consider that the DEGs appeared only in the short period after the manipulation. We have mentioned this in L278 and L383: "We found almost no DEGs for leaf samples taken one day before and 14 and 38 days after the third manipulation (the leaf sampling event 1, 3, and 4), suggesting that the influences of the treatments on the gene expression patterns were transient." (L278) and "These changes were observed relatively quickly and transient." (L383)

      Comment #2-16 I wonder if a conceptual framework figure would help to generalize the workflow that can be used for other studies.

      Thank you for your suggestion. Although we agree with your comment that such a figure would be helpful to generalize the workflow, we believe that our framework is clear and decided not to include it in the present manuscript. We might consider including such a figure (like Figure 1a in Ushio 2022) if we have an opportunity to write a review paper regarding this topic.

      Comment #2-17 Lines 329-335: I feel this information is unclear in the early manuscript. Maybe it's necessary to clearly communicate in the beginning.

      We have explained that we could not find any relevant information at least at the time we detected the ASVs in L189.

      Comment #2-18 Lines 336-337: Can these species be identified in the previous data set from the ASV sequences?

      Yes, these species were identified in the DNA data set obtained in 2017.

      Comment #2-19 Lines 387-397: Are there any measurements such as total biomass, and statistical methods to help with the eDNA bias and data compositionality?

      We have confirmed that our quantitative eDNA metabarcoding generates comparable results with the fluorescence-based method and quantitative PCR (e.g., see Supplementary Figures in Ushio 2022) (mentioned in L310 in the revised manuscript). However, at least in this study, we could not perform a direct comparison of the eDNA data with species abundance and/or biomass. This is partly because the number of our target species was too large (> 1,000 species). The accurate estimation of species abundance and/or biomass is one of our next goals.

      Comment #2-20 Line 472: Maybe mention transfer entropy somewhere in the early manuscript.

      We have mentioned this in L175.

      Comment #2-21 Lines 494-503: Maybe a summary of this reasoning should be mentioned somewhere in the early manuscript too.

      We have described a brief summary of the reasoning in L195.

      Comment #2-22 Lines 29-33 If this sentence is simplified it might be easier to follow.

      The sentence has been divided into two sentences in L28. Also, each sentence has been simplified.

      Comment #2-23 Line 38 Maybe "macrobes" can be explicitly mentioned. Fungi, protozoa, etc.

      Mentioned.

      Comment #2-24 Line 139: I am not sure if the date should be in the title.

      Similar monitoring was done in 2017 and 2019. Thus, we think the date is necessary in the section title.

      Comment #2-25 Figure 1: There are 4 red individuals in the design but 5 measurements in the plots.

      Heights and SPAD of the four individuals were measured for each plot and the averaged values were used as representative values for each plot. Therefore, 20 measurements (= 4 rice individuals × 5 plots) were done every day, but each plot has one rice height for each day. We have clarified this in the legend of Figure 1: "the average values of the four individuals were regarded as representative values for each plot."

      Comment #2-26 Figure 1b: Maybe use the same axis length for the temperature as the other plots?

      Corrected.

      Comment #2-27 Lines 259-261: Are there the names of the genes in databases?

      Yes, these are gene names used in the rice databases (e.g., The Rice Annotation Project Database; https://rapdb.dna.affrc.go.jp/index.html).

      Reviewer #3 (Recommendations For The Authors):

      Comment #3-1 Additionally, RGR is not statistically significant, but statistical significance is observed only in cumulative growth because data presentation does not reflect plant characteristics. RGR changes according to the developmental stage of the plant. Therefore, if RGR data are shown separately according to the rice growing season, the cumulative growth pattern and the pattern will appear similar.

      RGRs were calculated daily (i.e., cm/day) and they changed depending on the developmental stage of the rice (Figure 1 and Figure 4–figure supplement 1). Therefore, we might find similar RGR patterns if we focus on a specific period of the growing season. However, unfortunately, we performed the intensive (i.e., daily) monitoring in 2019 only during the field manipulation period (mid-June to mid-July 2019), and we cannot investigate the changes in cumulative growth throughout the growing season (this depends on how many days we add up RGR to calculate the cumulative growth, though). We agree that, if we had investigated the detailed pattern of RGR throughout the growing season in 2019, we could have found similar patterns between RGR and cumulative growth rate at a certain period in the growing season. In Figure 4, the cumulative growths were calculated based on the RGRs before the third manipulation or during 10 days after the third manipulation. We clarified this in the legend of Figure 4.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This paper describes the development and initial validation of an approach-avoidance task and its relationship to anxiety. The task is a two-armed bandit where one choice is 'safer' - has no probability of punishment, delivered as an aversive sound, but also lower probability of reward - and the other choice involves a reward-punishment conflict. The authors fit a computational model of reinforcement learning to this task and found that self-reported state anxiety during the task was related to a greater likelihood of choosing the safe stimulus when the other (conflict) stimulus had a higher likelihood of punishment. Computationally, this was represented by a smaller value for the ratio of reward to punishment sensitivity in people with higher task-induced anxiety. They replicated this finding, but not another finding that this behavior was related to a measure of psychopathology (experiential avoidance), in a second sample. They also tested test-retest reliability in a sub-sample tested twice, one week apart and found that some aspects of task behavior had acceptable levels of reliability. The introduction makes a strong appeal to back-translation and computational validity, but many aspects of the rationale for this task need to be strengthened or better explained. The task design is clever and most methods are solid - it is encouraging to see attempts to validate tasks as they are developed. There are a few methodological questions and interpretation issues, but they do not affect the overall findings. The lack of replicated effects with psychopathology may mean that this task is better suited to assess state anxiety, or to serve as a foundation for additional task development.

      We thank the reviewer for their kind comments and constructive feedback. We agree that the approach taken in this paper appears better suited to state anxiety, and further work is needed to assess/improve its clinical relevance.

      Reviewer #1 (Recommendations For The Authors):

      1) For the introduction, the authors communicate well the appeal of tasks with translational potential, and setting up this translation through computational validity is a strong approach. However, I had some concerns about how the task was motivated in the introduction:

      a) The authors state that current approach-avoidance tasks used in humans do not resemble those used in the non-human literature, but do not provide details on what exactly is missing from these tasks that makes translation difficult.

      Our intention for the section that the reviewer refers to was to briefly convey that historically, approach-avoidance conflict would have been measured either using questionnaires or joystick-based tasks which have no direct non-human counterpart. However, we note that the phrasing was perhaps unfair to recent tasks that were explicitly designed to be translatable across species. Therefore, we have amended the text to the following:

      In humans, on the other hand, approach-avoidance conflict has historically been measured using questionnaires such as the Behavioural Inhibition/Activation Scale (Carver & White, 1994), or cognitive tasks that rely on motor biases, for example by using joysticks to approach/move towards positive stimuli and avoid/move away from negative stimuli, which have no direct non-human counterparts (Guitart-Masip et al., 2012; Kirlic et al., 2017; Mkrtchian et al., 2017; Phaf et al., 2014).

      b) Although back-translation to 'match' human paradigms to non-animal paradigms is useful for research, this isn't the end goal of task development. What really matters is how well these tasks, whether in humans or not, capture psychopathology-relevant behavior. Many animal paradigms were developed and brought into extensive use because they showed sensitivity to pharmacological compounds (e.g., benzodiazepines). The introduction accepts the validity of these paradigms at face value, and doesn't address whether developing human tests of psychopathology based on sensitivity to existing medication classes is the best way to generate new insights about psychopathology.

      We agree that whilst paradigms with translational and computational validity have merits of their own for neuroscientific theory, clinical validity (i.e. how well the paradigm reflects a phenomenon relevant to psychopathology) is key in the context of clinical applications. While our findings of associations between task performance and self-reported (state) anxiety suggest that our approach is a step in the right direction, the lack of associations with clinical measures was disappointing. Although future work is needed to more directly test the sensitivity of the current approach to psychopathology, this may mean that it, and its non-human counterparts, do not measure behaviours relevant to pathological anxiety. Since our primary focus in this paper was on translational and computational validity, we have opted to discuss the reviewer’s suggestion in the ‘Discussion’ section, as follows:

      Further, it is worth noting that many animal paradigms were developed and widely adopted due to their sensitivity to anxiolytic medication (Cryan & Holmes, 2005). Given the lack of associations with clinical measures in our results, it is possible that current translational models of anxiety may not fully capture behaviours that are directly relevant to pathological anxiety. To develop translational paradigms of clinical utility, future research should place a stronger emphasis on assessing their clinical validity in humans.

      c) The authors may want to bring in the literature on the description-experience gap (e.g., PMID: 19836292) when discussing existing decision tasks and their computational dissimilarity to non-human operant conditioning tasks.

      We thank the reviewer for this useful addition to the introduction. We have now added the following to the 'Introduction’ section:

      Moreover, evidence from economic decision-making suggests that explicit offers of probabilistic outcomes can impact decision-making differently compared to when probabilistic contingencies need to be learned from experience (referred to as the ‘description-experience gap’; Hertwig & Erev, 2009); this finding raises potential concerns regarding the use of offer-based tasks in humans as approximations of non-human tasks that do not involve explicit offers.

      d) How does one evaluate how computationally similar human vs. non-human tasks are? What are the criteria for making this judgement? Specific to the current tasks, many animal learning tasks are not learning tasks in the same sense that human learning tasks are, in terms of the number of trials used and if the animals are choosing from a learned set of contingencies versus learning the contingencies during the testing.

      The computational similarity of human and non-human strategies in a given translational task can be tested empirically. This can be done by fitting models to the data and assessing whether similar models explain choices, even if parameter distributions might vary across species due to, for example, physiological differences. Indeed, non-human animals require much more training to perform even uni-dimensional reinforcement learning, but once they are trained, it should be possible to model their responses. In fact, it should even be possible to take training data into account in some cases. For example, the training phase of the Vogel/Geller-Seifter preclinical tests require an animal to learn to emit a certain action (e.g. lever press) simply to obtain some reward. In the next phase, an aversive outcome is introduced as an additional outcome, but one could model both the training and test phase together – the winning model in our studies would be a suitable candidate to model behaviour here. As we also discuss predictive validity in the ‘Discussion’ section, we opted to add the following text there too:

      … computational validity would also need to be assessed directly in non-human animals by fitting models to their behavioural data. This should be possible even in the face of different procedures across species such as number of trials or outcomes used (shock or aversive sound). We are encouraged by our finding that the winning computational model in our study relies on a relatively simple classical reinforcement learning strategy. There exist many studies showing that non-human animals rely on similar strategies during reward and punishment learning (Mobbs et al., 2020; Schultz, 2013); albeit to our knowledge this has never been modelled in non-human animals where rewards and punishment can occur simultaneously.
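
      As an illustration only, the kind of "relatively simple classical reinforcement learning strategy" referred to above can be sketched as follows. This is a minimal sketch, assuming a delta-rule (Rescorla-Wagner) update with separate learning rates for reward and punishment, and a logistic choice rule in which learned values are weighted by separate reward and punishment sensitivities. The function names, argument names and the exact form of the choice rule are hypothetical and are not taken from the manuscript.

```python
import math


def rw_update(estimate, outcome, learning_rate):
    """Delta-rule update: shift the estimate toward the observed outcome (0 or 1)."""
    return estimate + learning_rate * (outcome - estimate)


def p_choose_conflict(q_reward_conflict, q_punish_conflict, q_reward_safe,
                      reward_sensitivity, punish_sensitivity):
    """Probability of choosing the conflict option (assumed logistic rule).

    The conflict option carries both a learned reward value and a learned
    punishment value; the safe option carries a reward value only.
    """
    net_conflict = (reward_sensitivity * q_reward_conflict
                    - punish_sensitivity * q_punish_conflict)
    net_safe = reward_sensitivity * q_reward_safe
    return 1.0 / (1.0 + math.exp(-(net_conflict - net_safe)))


# Hypothetical usage: learn a punishment estimate from four conflict-option trials.
q_punish = 0.0
for punished in (1, 1, 0, 1):
    q_punish = rw_update(q_punish, punished, learning_rate=0.3)
```

      In a sketch of this kind, lowering the ratio of reward to punishment sensitivity reduces the probability of choosing the conflict option for the same learned values, which is the direction of effect the response associates with higher task-induced anxiety.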

      2) What do the authors make of the non-linear relationship between probability of punishment and probability of choosing the conflict stimulus (Fig 2d), especially in the high task-induced anxiety participants? Did this effect show up in the replication sample as well?

      Figures 2c-e were created by binning the continuous predictors of outcome probabilities into discrete bins of equal interval. Since punishment probability varied according to Gaussian random walks, it was also distributed with more of its mass in the central region (~ 0.4), and so values at the extreme bins were estimated on fewer data and with greater variance. The non-linear relationships are likely thus an artefact of our task design and plotting procedure. The pattern was also evident in the replication sample, see Author response image 1:

      Author response image 1.

      However, since these effects were estimated as linear effects in the logistic regression models, and to avoid overfitting/interpretations of noise arising from our task design, we now plot logistic curves fitted to the raw data instead.
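
      The point about sparsely populated extreme bins can be illustrated with a short sketch. It assumes equal-width bins over [0, 1] and uses the binomial standard error of a proportion as a rough measure of how noisy a binned estimate is; the number of bins and the trial counts are hypothetical and are not taken from the study.

```python
import math


def bin_index(probability, n_bins=5):
    """Assign a probability in [0, 1] to one of n_bins equal-width bins."""
    return min(int(probability * n_bins), n_bins - 1)


def binomial_se(proportion, n_trials):
    """Standard error of a choice proportion estimated from n_trials trials."""
    return math.sqrt(proportion * (1.0 - proportion) / n_trials)


# Hypothetical counts: a central bin with many trials, an extreme bin with few.
se_central = binomial_se(0.5, 200)
se_extreme = binomial_se(0.5, 20)
```

      With the same underlying proportion, the bin holding ten times fewer trials has a standard error roughly three times larger, so apparent bumps at the extremes of a binned plot can arise from noise alone.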

      3) How correlated were learning rate and sensitivity parameters? The EM algorithm used here can sometimes result in high correlations among these sets of parameters.

      As the reviewer suspects, the parameters were strongly correlated, especially across the punishment-specific parameters. The Pearson’s r estimates for the untransformed parameter values were as follows:

      Reward parameters: discovery sample r = -0.39; replication sample r = -0.78

      Punishment parameters: discovery sample r = -0.91; replication sample r = -0.85

      We have included the correlation matrices of the estimated parameters as Supplementary Figure 2 in the ‘Computational modelling’ section of the Supplement.

      We have now also re-fitted the winning model using variational Bayesian inference (VBI) via Stan, and found that the cross-parameter correlations were much lower than when the data were fitted using EM. We also ran a sensitivity analysis assessing whether using VBI changed the main findings of our studies. This showed that the correlation between task-induced anxiety and the reward-punishment sensitivity index was robust to fitting method, as was the mediating effect of reward-punishment sensitivity index on anxiety’s effect on choice. This indicates that overall our key findings are robust to different methods of parameter-fitting.

      We now direct readers to these analyses from the new ‘Sensitivity analyses’ section in the manuscript, as follows:

      As our procedure for estimating model parameters (the expectation-maximisation algorithm, see ‘Methods’) produced high inter-parameter correlations in our data (Supplementary Figure 2), we also re-estimated the parameters using Stan’s variational Bayesian inference algorithm (Stan Development Team, 2023) – this resulted in lower inter-parameter correlations, but our primary computational finding, that the effect of anxiety on choice is mediated by relative sensitivity to reward/punishment was consistent across algorithms (see Supplement section 9.8 for details).

      We have included the relevant analyses comparing EM and VBI in the Supplement, as follows:

      [9.8 Sensitivity analysis: estimating parameters via expectation maximisation and variational Bayesian inference algorithms]

      Given that the expectation maximisation (EM) algorithm produced high inter-parameter correlations, we ran a sensitivity analysis by assessing the robustness of our computational findings to an alternative method of parameter estimation – (mean-field) variational Bayesian inference (VBI) via Stan (Stan Development Team, 2023). Since, unlike EM, the results of VBI are very sensitive to initial values, we fitted the data 10 times with different initial values.

      Inter-parameter correlations

      The VBI produced lower inter-parameter correlations than the EM algorithm (Supplementary Figure 8).

      Sensitivity analysis

      Since multicollinearity in the VBI-estimated parameters was lower than for EM, indicating less trade-off in the estimation, we re-tested our computational findings from the manuscript as part of a sensitivity analysis. We first assessed whether we observed the same correlations between task-induced anxiety and punishment learning, and reward-punishment sensitivity index (Supplementary Figure 9a). Punishment learning rate was not significantly associated with task-induced anxiety in any of the 10 VBI iterations in the discovery sample, although it was in 9/10 in the replication sample. On the other hand, the reward-punishment sensitivity index was significantly associated with task-induced anxiety in 9/10 VBI iterations in the discovery sample and all iterations in the replication sample. This suggests that the correlation of anxiety and sensitivity index is robust to these two fitting approaches.

      We also re-estimated the mediation models, where in the EM-estimated parameters, we found that the reward-punishment sensitivity index mediated the relationship between task-induced anxiety and task choice proportions (Supplementary Figure 9b). Again, we found that the reward-punishment sensitivity index was a significant mediator in 9/10 VBI iterations in the discovery sample and all iterations in the replication sample. Punishment learning rate was also a significant mediator in 9/10 iterations in the replication sample, although it was not in the discovery sample for all iterations, and this was not observed for the EM-estimated parameters.

      Overall, we found that our key results, that anxiety is associated with greater sensitivity to punishment over reward, and this mediates the relationship between anxiety and approach-avoidance behaviour, were robust across both fitting methods.

      As an aside, we were unable to run the model fitting using Markov chain Monte Carlo sampling approaches due to the computational power and time required for a sample of this size (Pike & Robinson, 2022, JAMA Psychiatry).

      4) What is the split-half reliability of the task parameters?

      We thank the reviewer for this query. We have now included a brief section on the (good-to-excellent) split-half reliability of the task in the manuscript:

      We assessed the split-half reliability of the task by correlating the overall proportion of conflict option choices and model parameters from the winning model across the first and second half of trials. For overall choice proportion, reliability was simply calculated via Pearson’s correlations. For the model parameters, we calculated model-derived estimates of Pearson’s r values from the parameter covariance matrix when first- and second-half parameters were estimated within a single model, following a previous approach recently shown to accurately estimate parameter reliability (Waltmann et al., 2022). We interpreted indices of reliability based on conventional values of < 0.40 as poor, 0.4 - 0.6 as fair, 0.6 - 0.75 as good, and > 0.75 as excellent reliability (Fleiss, 1986). Overall choice proportion showed good reliability (discovery sample r = 0.63; replication sample r = 0.63; Supplementary Figure 5). The model parameters showed good-to-excellent reliability (model-derived r values ranging from 0.61 to 0.85 [0.76 to 0.92 after Spearman-Brown correction]; Supplementary Figure 5).
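
      The Spearman-Brown correction quoted above can be checked with a few lines of code. This is a minimal sketch: the correction formula for doubling test length is standard, whereas the handling of values that fall exactly on a threshold in the labelling function is an assumption, since the quoted ranges do not specify it.

```python
def spearman_brown(r_half):
    """Predicted full-length reliability from a split-half correlation."""
    return 2.0 * r_half / (1.0 + r_half)


def reliability_label(r):
    """Label a reliability value using the quoted conventional thresholds.

    Assumption: a value exactly on a boundary is assigned to the higher
    category for 0.40 and 0.60, and to 'good' for 0.75.
    """
    if r < 0.40:
        return "poor"
    if r < 0.60:
        return "fair"
    if r <= 0.75:
        return "good"
    return "excellent"


# Reproduce the corrected range reported for the model parameters.
corrected = [round(spearman_brown(r), 2) for r in (0.61, 0.85)]
```

      Applying the correction to the reported endpoints 0.61 and 0.85 gives 0.76 and 0.92, matching the bracketed range in the quoted text.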

      5) The authors do a good job of avoiding causal language when setting up the cross-sectional mediation analysis, but depart from this in the discussion (line 335). Without longitudinal data, they cannot claim that "mediation analyses revealed a mechanism of how anxiety induces avoidance".

      Thank you for spotting this, we have now amended the text to:

      … mediation analyses suggested a potential mechanism of how anxiety may induce avoidance.

      Reviewer #2 (Public Review):

      Summary:

      The authors develop a computational approach-avoidance-conflict (AAC) task, designed to overcome limitations of existing offer based AAC tasks. The task incorporated likelihoods of receiving rewards/ punishments that would be learned by the participants to ensure computational validity and estimated model parameters related to reward/punishment and task induced anxiety. Two independent samples of online participants were tested. In both samples participants who experienced greater task induced anxiety avoided choices associated with greater probability of punishment. Computational modelling revealed that this effect was explained by greater individual sensitivities to punishment relative to rewards.

      Strengths:

      Large internet-based samples, with discovery sample (n = 369), pre-registered replication sample (n = 629) and test-retest sub group (n = 57). Extensive compliance measures (e.g. audio checks) seek to improve adherence.

      There is a great need for RL tasks that model threatening outcomes rather than simply loss of reward. The main model parameters show strong effects and the additional indices with task based anxiety are a useful extension. Associations were broadly replicated across samples. Fair to excellent reliability of model parameters is encouraging and badly needed for behavioral tasks of threat sensitivity.

      We thank the reviewer for their comments and constructive feedback.

      The task seems to have lower approach bias than some other AAC tasks in the literature. Although this was inferred by looking at Fig 2 (it doesn't seem to drop below 46%) and Fig 3d seems to show quite a strong approach bias when using a reward/punishment sensitivity index. It would be good to confirm some overall stats on % of trials approached/avoided overall.

      The range of choice proportions is indeed an interesting statistic that we have now included in the manuscript:

      Across individuals, there was considerable variability in overall choice proportions (discovery sample: mean = 0.52, SD = 0.14, min/max = [0.03, 0.96]; replication sample: mean = 0.52, SD = 0.14, min/max = [0.01, 0.99]).

      Weaknesses:

      The negative reliability of punishment learning rate is concerning as this is an important outcome.

      We agree that this is a concerning finding. As reviewer 3 notes, this may have been due to participants having control over the volume used to play the aversive sounds in the task (see below for our response to this point). Future work with better controlled experimental settings will be needed to determine the reliability of this parameter more accurately.

      This may also have been due to the asymmetric nature of the task, as only one option could produce the punishment. This means that there were fewer trials on which to estimate learning about the occurrence of a punishment. Future work using continuous outcomes, as the reviewer suggests below, whilst keeping the asymmetric relationship between the options, could help in this regard.

      We have included the following comment on this issue in the manuscript:

      Alternatively, as participants self-determined the loudness of the punishments, differences in volume settings across sessions may have impacted the reliability of this parameter (and indeed punishment sensitivity). Further, the asymmetric nature of the task may have impacted our ability to estimate the punishment learning rate, as there were fewer occurrences of the punishment compared to the reward.

      The Kendall's tau values underlying task induced anxiety and safety reference/ various indices are very weak (all < 0.1), as are the mediation effects (all beta < 0.01). This should be highlighted as a limitation, although the interaction with P(punishment|conflict) does explain some of this.

      We now include references to the effect sizes to emphasise this limitation. We also note, as the reviewer suggests, that this may be due to crudeness of overall choice proportion as a measure of approach/avoidance, as it is contaminated with variables such as P(punishment|conflict).

      One potentially important limitation of our findings is the small effect size observed in the correlation between task-induced anxiety and avoidance (Kendall's tau values < 0.1, mediation betas < 0.01). This may be attributed to the simplicity of using overall choice proportion as a measure of approach/avoidance, as the effect of anxiety on choice was also influenced by punishment probability.

      The inclusion of only one level of reward (and punishment) limits the ecological validity of the sensitivity indices.

      We agree that using multi-level outcomes will be an important question for future work and now explicitly note this in the manuscript, as below:

      Using multi-level or continuous outcomes would also improve the ecological validity of the present approach and interpretation of the sensitivity parameters.

      Appraisal and impact:

      Overall this is a very strong paper, describing a novel task that could help move the field of RL forward to take account of threat processing more fully. The large sample size with discovery, replication and test-retest gives confidence in the findings. The task has good ecological validity and associations with task-based anxiety and clinical self-report demonstrate clinical relevance. The authors could give further context but test-retest of the punishment learning parameter is the only real concern. Overall this task provides an exciting new probe of reward/threat that could be used in mechanistic disease models.

      We thank the reviewer again for helping us to improve our analyses and manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Additional context:

      In the introduction "cognitive tasks that bear little semblance to those used in the non-human literature" seems a little unfair. One study that is already cited (Ironside et al, 2020) used a task that was adapted from non-human primates for use in humans. It has almost identical visual stimuli (different levels of simultaneous reward and aversive outcome/punishment) and response selection processes (joystick) between species and some overlapping brain regions were activated across species for conflict and aversiveness. The later point that non-human animals must be trained on the association between action and outcome is well taken from the point of view of computational validity but perhaps not sufficient to justify the previous statement.

      Our intention for this section was to briefly convey that historically, approach-avoidance conflict would have been measured either using questionnaires or joystick-based tasks which have no direct non-human counterpart. However, we agree that this phrasing is unfair to recent studies such as those by Ironside and colleagues. Therefore, we have amended the text to the following:

      In humans, on the other hand, approach-avoidance conflict has historically been measured using questionnaires such as the Behavioural Inhibition/Activation Scale (Carver & White, 1994), or cognitive tasks that rely on motor biases to approach/move towards positive stimuli and avoid/move away from negative stimuli which have no direct non-human counterparts (Guitart-Masip et al., 2012; Kirlic et al., 2017; Mkrtchian et al., 2017; Phaf et al., 2014).

      It would be good to speculate on why task induced anxiety made participants slower to update their estimates of punishment probability.

      Although a meta-analysis of reinforcement learning studies using reward and punishment outcomes suggests a positive association between punishment learning rate and anxiety symptoms (and depressed mood), we paradoxically found the opposite effect. However, previous work has suggested that distinct forms of anxiety associate differently with punishment learning rate (Wise & Dolan, 2020, Nat. Commun.), where somatic anxiety was negatively correlated with punishment learning rate whereas cognitive anxiety showed the opposite effect. We have now added the following to the manuscript, and noted that future work is needed to understand the potentially complex relationship between anxiety and learning from punishments:

      Notably, although a recent computational meta-analysis of reinforcement learning studies showed that symptoms of anxiety and depression are associated with elevated punishment learning rates (Pike & Robinson, 2022), we did not observe this pattern in our data. Indeed, we even found the contrary effect in relation to task-induced anxiety, specifically that anxiety was associated with lower rates of learning from punishment. However, other work has suggested that the direction of this effect can depend on the form of anxiety, where cognitive anxiety may be associated with elevated learning rates, but somatic anxiety may show the opposite pattern (Wise & Dolan, 2020) and this may explain the discrepancy in findings. Additionally, parameter values are highly dependent on task design (Eckstein et al., 2022), and study designs to date may be more optimised in detecting differences in learning rate (Pike & Robinson, 2022) – future work is needed to better understand the potentially complex association between anxiety and punishment learning rate. Lastly, as punishment learning rate was severely unreliable in the test-retest analyses, and the associations between punishment learning rate and state anxiety were not robust to an alternative method of parameter estimation (variational Bayesian inference), the negative correlation observed in our study should be treated with caution.

      Were those with more task-based anxiety more inflexible in general?

      The lack of associations across reward learning rate and task-induced anxiety suggest that this was not a general inflexibility effect. To test the reviewer’s hypothesis more directly, we conducted a sensitivity analysis by examining the model with a general learning rate – this did not support a general inflexibility effect. Please see the new section in the Supplement below:

      [9.10 Sensitivity analysis: anxiety and inflexibility]

      As anxious participants were slower to update their estimates of punishment probability, we determined whether this was due to greater general inflexibility by examining the model including two sensitivity parameters, but one general learning rate (i.e. not split by outcome). The correlation between this general learning rate and task-induced anxiety was not significant in either sample (discovery: tau = -0.02, p = 0.504; replication: tau = -0.01, p = 0.625), suggesting that the effect is specific to punishment.

      Was the 16% versus 20% of the two samples with clinically relevant anxiety symptoms significantly different? What about other demographics in the two samples?

      The difference in proportions was not significant (χ² = 2.33, p = 0.127). The discovery sample included more females and was older on average compared to the replication sample – information which we now report in the manuscript:

      The discovery sample consisted of a significantly greater proportion of female participants than the replication sample (59% vs 52%, χ² = 4.64, p = 0.031). The average age was significantly different across samples (discovery sample mean = 37.7, SD = 10.3, replication sample mean = 34.3, SD = 10.4; t(785.5) = 5.06, p < 0.001). The differences in self-reported psychiatric symptoms across samples did not reach significance (p > 0.086).

      It would be interesting to know how many participants failed the audio attention checks.

      We have now included information about what proportion of participants failed each of the task exclusion criteria in the manuscript:

      Firstly, we excluded participants who missed a response to more than one auditory attention check (see above; 8% in both discovery and replication samples) – as these occurred infrequently and the stimuli used for the checks were played at relatively low volume, we allowed for incorrect responses so long as a response was made. Secondly, we excluded those who responded with the same response key on 20 or more consecutive trials (> 10% of all trials; 4% and 6% in discovery and replication samples, respectively). Lastly, we excluded those who did not respond on 20 or more trials (1% and 2% in discovery and replication samples, respectively). Overall, we excluded 51 out of 423 (12%) in the discovery sample, and 98 out of 725 (14%) in the replication sample.
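
      The three exclusion rules quoted above can be expressed as a short check. This is a minimal sketch under stated assumptions: the data layout (a list of per-trial key presses with None for a missed trial), the function names and the choice to let a missed trial break a run of identical key presses are all hypothetical; only the thresholds come from the quoted text.

```python
from itertools import groupby


def longest_same_key_run(responses):
    """Longest run of identical, non-missing key presses.

    Assumption: a missed trial (None) interrupts a run.
    """
    runs = (len(list(group)) for key, group in groupby(responses)
            if key is not None)
    return max(runs, default=0)


def should_exclude(responses, missed_attention_checks,
                   max_missed_checks=1, run_limit=20, missing_limit=20):
    """Apply the three quoted exclusion criteria to one participant."""
    n_missing = sum(response is None for response in responses)
    return (missed_attention_checks > max_missed_checks
            or longest_same_key_run(responses) >= run_limit
            or n_missing >= missing_limit)


# The overall exclusion percentages follow from the reported counts.
overall_percent = (round(100 * 51 / 423), round(100 * 98 / 725))
```

      The reported counts of 51 out of 423 and 98 out of 725 round to the stated 12% and 14%.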

      There doesn't appear to be a model with only learning from punishment (i.e. no reward learning) included in the model comparison. It would be interesting to see how it compared.

      We have fitted the suggested model and found that it is the least parsimonious of the models. Since participants were monetarily incentivised based on the rewards only, this was to be expected. We have now added this ‘punishment learning only’ model and its variant including a lapse term into the model comparison. The two lowest bars on the y-axis in Author response image 2 represent these models.

      Author response image 2.

      Were sex effects examined, as these have been commonly found in AAC tasks? How about other covariates such as age?

      We have now tested the effects of sex and age on behaviour and on parameter values. There were indeed some significant effects, albeit with some inconsistencies across the two samples, which for completeness we have included in the manuscript, as follows:

      While sex was significantly associated with choice in the discovery sample (β = 0.16 ± 0.07, p = 0.028) with males being more likely to choose the conflict option, this pattern was not evident in the replication sample (β = 0.08 ± 0.06, p = 0.173), and age was not associated with choice in either sample (p > 0.2).

      Comparing parameters across sexes via Welch’s t-tests revealed significant differences in reward sensitivity (t(289) = -2.87, p = 0.004, d = 0.34; lower in females) and consequently reward-punishment sensitivity index (t(336) = -2.03, p = 0.043, d = 0.22; lower in females, i.e. more avoidance-driven). In the replication sample, we observed the same effect on reward-punishment sensitivity index (t(626) = -2.79, p = 0.005, d = 0.22; lower in females). However, the sex difference in reward sensitivity did not replicate (p = 0.441), although we did observe a significant sex difference in punishment sensitivity in the replication sample (t(626) = 2.26, p = 0.024, d = 0.18).
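
      For readers less familiar with Welch's t-test, the statistic and its degrees of freedom can be computed as in the sketch below. It is a generic textbook implementation using only the Python standard library, not the authors' analysis code, and the function name is hypothetical. The degrees of freedom follow the Welch–Satterthwaite approximation, which is why they need not be whole numbers.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's unequal-variance t statistic and approximate degrees of freedom."""
    # Squared standard error contributed by each group (sample variance / n).
    va_n = variance(a) / len(a)
    vb_n = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va_n + vb_n)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va_n + vb_n) ** 2 / (
        va_n ** 2 / (len(a) - 1) + vb_n ** 2 / (len(b) - 1)
    )
    return t, df
```

      A negative t here simply means the first group's mean is lower than the second's, so the sign depends on the order in which the groups are passed.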

      Minor: Still a few placeholders (Supplementary Table X/ Table X) in the methods

      We thank the reviewer for spotting these errors. We have now corrected these references.

      Reviewer #3 (Public Review):

      This study investigated cognitive mechanisms underlying approach-avoidance behavior using a novel reinforcement learning task and computational modelling. Participants could select a risky "conflict" option (latent, fluctuating probabilities of monetary reward and/or unpleasant sound [punishment]) or a safe option (separate, generally lower probability of reward). Overall, participant choices were skewed towards more rewarded options, but were also repelled by increasing probability of punishment. Individual patterns of behavior were well-captured by a reinforcement learning model that included parameters for reward and punishment sensitivity, and learning rates for reward and punishment. This is a nice replication of existing findings suggesting reward and punishment have opposing effects on behavior through dissociated sensitivity to reward versus punishment.

      Interestingly, avoidance of the conflict option was predicted by self-reported task-induced anxiety. This effect of anxiety was mediated by the difference in modelled sensitivity to reward versus punishment (relative sensitivity). Importantly, when a subset of participants were retested over 1 week later, most behavioral tendencies and model parameters were recapitulated, suggesting the task may capture stable traits relevant to approach-avoidance decision-making.
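
      To make the kind of model described in this summary concrete, the sketch below shows one simple way separate learning rates and sensitivities for reward and punishment could enter a two-option choice rule. It is an illustration under assumptions, not the authors' fitted model: the function names, the delta-rule update, and the logistic choice rule are all hypothetical choices made here.

```python
from math import exp


def update(estimate: float, outcome: float, learning_rate: float) -> float:
    """Delta-rule update of an outcome-probability estimate (assumed form)."""
    return estimate + learning_rate * (outcome - estimate)


def p_conflict(
    q_reward_conflict: float,
    q_punish_conflict: float,
    q_reward_safe: float,
    reward_sens: float,
    punish_sens: float,
) -> float:
    """Probability of choosing the conflict option (assumed logistic rule).

    Reward estimates are scaled by reward sensitivity and add to an option's
    value; the punishment estimate is scaled by punishment sensitivity and
    subtracts from the conflict option's value.
    """
    v_conflict = reward_sens * q_reward_conflict - punish_sens * q_punish_conflict
    v_safe = reward_sens * q_reward_safe
    return 1.0 / (1.0 + exp(-(v_conflict - v_safe)))
```

      In this sketch, raising punishment sensitivity while holding everything else fixed lowers the probability of choosing the conflict option, which is the sense in which relative sensitivity to punishment versus reward could drive avoidance.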

      We thank the reviewer for their useful analysis of our study. Indeed, it was reassuring to see that performance indices were reliable across time.

      However, interpretation of these findings is severely undermined by the fact that the aversiveness of the auditory punisher was largely determined by participants, with the far-reaching impacts of this not being accounted for in any of the analyses. The manipulation check to confirm participants did not mute their sound is highly commendable, but the thresholding of punisher volume to "loud but comfortable" at the outset of the task leaves substantial scope for variability in the punisher delivered to participants. Indeed, participants' ratings of the unpleasantness of the punishment were moderate and highly variable (M = 31.7 out of 50, SD = 12.8 [distribution unreported]). Although this rating was collected, it is not incorporated into analyses. It is possible that the key findings of relationships between task-induced anxiety, reward-punishment sensitivity and avoidance are driven by differences in the punisher experienced; a louder punisher is more unpleasant, driving greater task-induced anxiety, model-derived punishment sensitivity, and avoidance (and vice versa). This issue can also explain the counterintuitive findings from re-tested participants; lower/negatively correlated task-induced anxiety and punishment-related cognitive parameters may have been due to participants adjusting their sound settings to make the task less aversive (retest punisher rating not reported). It can therefore be argued that the task may not actually capture meaningful cognitive/motivational traits and their effects on decision-making, but instead spurious differences in punisher intensity.

      We thank the reviewer for raising this important potential limitation of our study. We agree that how participants self-adjusted their sound volume may have important consequences for our interpretations of the data. Unfortunately, despite the scalability of online data collection, this highlights one of its major weaknesses: the lack of control over experimental parameters. The previous paper from which we obtained our aversive sounds (Seow & Hauser, 2021, Behav Res, doi.org/10.3758/s13428-021-01643-0) contains useful analyses with regards to this discussion. When comparing the unpleasantness of the sounds played at 50% vs 100% volume, the authors indeed found that the lower volumes led to lower unpleasantness ratings. However, the magnitude of this effect did not appear to be substantial (Fig. 4 from the paper), and even at 50% volume, the scream sounds we used were rated in the top quartile for unpleasantness, on average. This implies that the sounds have sufficient inherent unpleasantness, even when played at half intensity. We find this reassuring, in the sense that any self-imposed volume effects may not be large. Of note, our instructions to participants to adjust the volume to a ‘loud but comfortable’ level were based on the same phrasing used in this study.

      To the reviewer’s point on how this might affect the reliability of the task, we have included the following in the ‘Discussion’ section:

      Alternatively, as participants self-determined the loudness of the punishments, differences in volume settings across sessions may have impacted the reliability of this parameter (and indeed other measures).

      Please see below for analyses accounting for punishment unpleasantness ratings.

      This undercuts the proposed significance of this task as a translational tool for understanding anxiety and avoidance. More information about ratings of punisher unpleasantness and its relationship to task behavior, anxiety and cognitive parameters would be valuable for interpreting findings. It would also be of interest whether the same results were observed if the aversiveness of the punisher was titrated prior to the task.

      As suggested, we have now included sensitivity analyses using the unpleasantness ratings that show their effect is minimal on our primary inference. We report relevant results below in the ‘Recommendations For The Authors’ section. At the same time, we think it is important to acknowledge that unpleasantness is a combination of both the inherent unpleasantness of the sound and the volume it is presented at, where only the latter is controlled by the participant. Therefore, these analyses are not a perfect indicator of the effect of participant control. For convenience, we reproduce the key findings from this sensitivity analysis here:

      Approach-avoidance hierarchical logistic regression model

      We assessed whether approach and avoidance responses, and their relationships with state anxiety, were impacted by punishment unpleasantness, by including unpleasantness ratings as a covariate into the hierarchical logistic regression model. Whilst unpleasantness was a significant predictor of choice (positively predicting safe option choices), all significant predictors and interaction effects from the model without unpleasantness survived (Supplementary Figure 11). Critically, this suggests that punishment unpleasantness does not account for all of the variance in the relationship between anxiety and avoidance.

      Mediation model

      When unpleasantness ratings were included in the mediation models, the mediating effect of the reward-punishment sensitivity index did not survive (discovery sample: standardised β = 0.003 ± 0.003, p = 0.416; replication sample: standardised β = 0.004 ± 0.003, p = 0.100; Supplementary Figure 12). Pooling the samples resulted in an effect that narrowly missed the significance threshold (standardised β = 0.004 ± 0.002, p = 0.068).

      More generally, whether or not to titrate the punishments (and indeed the rewards) is an interesting experimental decision, which we think should be guided by the research question. In our case, we were interested in individual differences in reward/punishment learning and sensitivity and their relation to anxiety, so variation in how the aversiveness of the sounds affected approach-avoidance decisions was an important aspect of our design. In studies where the aim is to understand more general processes of how humans act under approach-avoidance conflict, it may be better to tightly control the salience of reinforcers.

      Ultimately, the best test of the causal role of anxiety on avoidance, and against the hypothesis that our results were driven by spurious volume control effects, would be to run within-subjects anxiety interventions, where these volume effects are naturally accounted for. This will be an important direction for future studies using similar measures. We have added a paragraph in the ‘Discussion’ section on this point:

      Relatedly, participants had some control over the intensity at which the punishments were presented, which may have driven our findings relating to anxiety and putative mechanisms of anxiety-related avoidance. Sensitivity analyses showed that our finding that anxiety is positively associated with avoidance in the task was robust to individual differences in self-reported punishment unpleasantness, whilst the mediation effects were not. Future work imposing better control over the stimuli presented, and/or using within-subjects designs will be needed to validate the role of reward/punishment sensitivities in anxiety-related avoidance.

      Although the procedure and findings reported here remain valuable to the field, claims of novelty including its translational potential are perhaps overstated. This study complements and sits within a much broader literature that investigates roles for aversion and cognitive traits in approach-avoidance decisions. This includes numerous studies that apply reinforcement learning models to behavior in two-choice tasks with latent probabilities of reward and punishment (e.g., see doi: 10.1001/jamapsychiatry.2022.0051), as well as other translationally-relevant paradigms (e.g., doi: 10.3389/fpsyg.2014.00203, 10.7554/eLife.69594, etc).

      We agree with the reviewer that our approach builds on previous work in reinforcement learning, approach-avoidance conflict and translational measures of anxiety. Whilst there are by now many studies using two-choice learning tasks with latent reward and punishment probabilities, our main aim, which is what we refer to as ‘novel’, was to bring these fields together so as to model anxiety-related behaviour.

      We note that we do not make strong statements about whether these effects speak to traits per se, and as Reviewer 1 notes, the evidence from our study suggests that the present measure may be better suited to assessing state anxiety. While computational model parameters can be, and certainly often are, interpreted as constituting stable individual traits, a simpler interpretation of our findings may be that state anxiety is associated with a momentary preference for punishment avoidance over reward pursuit. This can still be informative for the study of anxiety, especially given the notion of a continuous relationship between adaptive/state anxiety and maladaptive/persistent anxiety.

      Having said that, we agree with the underlying premise of the reviewer’s point that how the measure relates to trait-level avoidance/inhibition measures will be an interesting question for future work. We appreciate the importance of using tasks such as ours and those highlighted by the reviewer as trait-level measures, especially in computational psychiatry. We have now included a discussion on the potential roles of cognitive/motivational traits, in line with the reviewer’s recommendation – briefly, we have included the suggested references by the reviewer, discussed the measure’s potential relevance to cognitive/motivational traits, and direct interested readers to the broader literature. Please see below for details.

      Reviewer #3 (Recommendations For The Authors):

      As stated in the public review, punisher unpleasantness and its relationship to key findings (including for retest) should be reported and discussed.

      We signpost readers to our new analyses, incorporating unpleasantness ratings into the statistical models, from the main manuscript as follows:

      Since participants self-determined the volume of the punishments in the task, and therefore (at least in part) their aversiveness, we conducted sensitivity analyses by accounting for self-reported unpleasantness ratings of the punishment (see the Supplement). Our finding that anxiety impacts approach-avoidance behaviour was robust to this sensitivity analysis (p < 0.001); however, the mediating effect of the reward-punishment sensitivity index was not (p > 0.1; see Supplement section 9.9 for details).

      We reproduce the relevant section from the Supplement below. Overall, we found that the effect of anxiety on choices (via its interaction with punishment probability) remained significant after accounting for unpleasantness; however, the mediating effect of reward-punishment sensitivity was no longer significant when unpleasantness ratings were included in the model. As noted above, unpleasantness ratings are not a perfect measure of self-imposed sound volume, and indeed punishment sensitivity is essentially a computationally-derived measure of unpleasantness, which makes it difficult to interpret the mediation model which contains both of these measures. However, since we found that anxiety affected choice over and above any effects of self-imposed sound volume (using unpleasantness ratings as a proxy measure), we argue that the task still holds value as a model of anxiety-related avoidance.

      [Supplement Section 9.9: Sensitivity analyses of punishment unpleasantness]

      Distribution of unpleasantness

      The punishments were rated as unpleasant by the participants, on average (discovery sample: mean rating = 31.1 [scored between 0 and 50], SD = 13.1; replication sample: mean rating = 32.1, SD = 12.7; Supplementary Figure 10).

      Approach-avoidance hierarchical logistic regression model

      We assessed whether approach and avoidance responses, and their relationships with state anxiety, were impacted by punishment unpleasantness, by including unpleasantness ratings as a covariate into the hierarchical logistic regression model. Whilst unpleasantness was a significant predictor of choice (positively predicting safe option choices), all significant predictors and interaction effects from the model without unpleasantness ratings survived (Supplementary Figure 11). Critically, this suggests that punishment unpleasantness does not account for all of the variance in the relationship between anxiety and avoidance.

      Mediation model

      When unpleasantness ratings were included in the mediation models, the mediating effect of the reward-punishment sensitivity index did not survive (discovery sample: standardised β = 0.003 ± 0.003, p = 0.416; replication sample: standardised β = 0.004 ± 0.003, p = 0.100; Supplementary Figure 12). Pooling the samples resulted in an effect that narrowly missed the significance threshold (standardised β = 0.004 ± 0.002, p = 0.068).

      Test-retest reliability of unpleasantness

      The test-retest reliability of unpleasantness ratings was excellent (ICC(3,1) = 0.75), although participants gave significantly lower ratings in the second session (t(56) = 2.7, p = 0.008, d = 0.37; mean difference of 3.12, SD = 8.63).
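
      As a reminder of what ICC(3,1) measures, the sketch below computes it from a participants-by-sessions matrix using the standard two-way mean squares. It is a generic illustration in plain Python with a hypothetical function name, not the authors' code. ICC(3,1) is a consistency-type coefficient, so a uniform shift in ratings between sessions does not by itself reduce it, which is how high reliability can coexist with a significant drop in mean ratings.

```python
def icc_3_1(scores):
    """ICC(3,1): two-way mixed effects, consistency, single measurement.

    `scores` is a list of rows, one per participant, each holding that
    participant's rating in every session (same number of sessions per row).
    """
    n = len(scores)        # participants
    k = len(scores[0])     # sessions
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)

    ms_rows = ss_rows / (n - 1)
    # Residual mean square after removing participant and session effects.
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
```

      For example, if every participant's second-session rating were exactly one point higher than their first, the residual term would be zero and this function would return 1.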

      Reliability of other measures with/out unpleasantness

      To assess the effect of accounting for unpleasantness ratings on reliability estimates of task performance, we extracted variance components from linear mixed models, following a standard approach (Nakagawa et al., 2017) – note that this was not the method used to estimate reliability values in the main analyses, but we used this specific approach to compare the reliability values with and without the covariate of unpleasantness ratings. The results indicated that unpleasantness ratings did not have a material effect on reliability (Supplementary Figure 14).

      We discuss the findings of these sensitivity analyses in the ‘Discussion’ section, as follows:

      Relatedly, participants had some control over the intensity at which the punishments were presented, which may have driven our findings relating to anxiety and putative mechanisms of anxiety-related avoidance. Sensitivity analyses showed that our finding that anxiety is positively associated with avoidance in the task was robust to individual differences in self-reported punishment unpleasantness, whilst the mediation effects were not. Future work imposing better control over the stimuli presented, and/or using within-subjects designs will be needed to validate the role of reward/punishment sensitivities in anxiety-related avoidance.

      Introduction and discussion should spend more time relating the task and current findings to existing procedures and findings examining individual differences in avoidance and cognitive/motivational correlates.

      We thank the reviewer for the opportunity to expand on the literature. Whilst there are numerous behavioural paradigms in both the human and non-human literature that involve learning about rewards and punishments, our starting point for the introduction was the state of the art in translational approach-avoidance conflict models of anxiety. Therefore, for the sake of brevity and logical flow of our introduction, we have opted to bring in the discussion on other procedures primarily in the ‘Discussion’ section of the manuscript.

      We have now included the reviewer’s suggested citations from their ‘Public Review’ as follows:

      Since we developed our task with the primary focus on translational validity, its design diverges from other reinforcement learning tasks that involve reward and punishment outcomes (Pike & Robinson, 2022). One important difference is that we used distinct reinforcers as our reward and punishment outcomes, compared to many studies which use monetary outcomes for both (e.g. earning and losing £1 constitute the reward and punishment, respectively; Aylward et al., 2019; Jean-Richard-Dit-Bressel et al., 2021; Pizzagalli et al., 2005; Sharp et al., 2022). Other tasks have been used that induce a conflict between value and motor biases, relying on prepotent biases to approach/move towards rewards and withdraw from punishments, which makes it difficult to approach punishments and withdraw from rewards (Guitart-Masip et al., 2012; Mkrtchian et al., 2017). However, since translational operant conflict tasks typically induce a conflict between different types of outcome (e.g. food and shocks/sugar and quinine pellets; Oberrauch et al., 2019; van den Bos et al., 2014), we felt it was important to implement this feature. One study used monetary rewards and shock-based punishments, but also included four options for participants to choose from on each trial, with rewards and punishments associated with all four options (Seymour et al., 2012). This effectively requires participants to maintain eight probability estimates (i.e. reward and punishment at each of the four options) to solve the task, which may be too difficult for non-human animals to learn efficiently.

      We have also included a discussion on the measure’s potential relevance to cognitive/motivational traits as follows:

      Finally, whilst there is a broad literature on the roles of behavioural inhibition and avoidance tendency traits on decision-making and behaviour (Carver & White, 1994; Corr, 2004; Gray, 1982), we did not replicate the correlation of experiential avoidance and avoidance responses or the reward-punishment sensitivity index. Since there were also no significant correlations across task performance indices and clinical symptom measures, our findings suggest that the measure may be more sensitive to behaviours relating to state anxiety, rather than more stable traits. Nevertheless, how performance in the present task relates to other traits such as behavioural approach/inhibition tendencies (Carver & White, 1994), as has been found in previous studies on reward/punishment learning (Sharp et al., 2022; Wise & Dolan, 2020) and approach-avoidance conflict (Aupperle et al., 2011), will be an important question for future work.

      We also now direct readers to a recent, comprehensive review on applying computational methods to approach-avoidance behaviours in the ‘Introduction’ section:

      A fundamental premise of this approach is that the brain acts as an information-processing organ that performs computations responsible for observable behaviours, including approach and avoidance (for a recent review on the application of computational methods to approach-avoidance conflict, see Letkiewicz et al., 2023).

      I am curious why participants were excluded if they made the same response on 20+ consecutive trials. How does this represent a cut-off between valid versus invalid behavioral profiles?

      We apologise for the lack of clarity on this point in our original submission – this exclusion criterion applied specifically when participants used the same response key (e.g. the left arrow button) on 20 or more consecutive trials, indicating inattention. Since the left-right positions of the stimuli were randomised across trials, this did not exclude participants who chose the same option on many consecutive trials. Moreover, as we show in the Supplement, this criterion, along with the other exclusion criteria, did not affect our main findings.

      We have now clarified this as follows:

      … we excluded those who responded with the same response key on 20 or more consecutive trials (> 10% of all trials; 4%/6% in discovery and replication samples, respectively) – note that as the options randomly switched sides on the screen across trials, this did not exclude participants who frequently and consecutively chose a certain option.

    1. Author Response

      The following is the authors’ response to the previous reviews

      Reviewer # 1 (Public Review)

      Specific comments

      1) For all cell-based assays using shRNA to knock down CRB3, it would be desirable to perform rescue experiments to ensure that the observed phenotype of CRB3 depleted cells is specific and not due to off-target effects of the shRNA.

      Thank you for your comments. Based on your suggestions, we performed the rescue experiments to observe any alterations in the primary cilia of CRB3-depleted MCF10A cells with overexpressed CRB3. The revised parts can be found in lines 186-188 and the new Supplementary Figure 3A-C has been added.

      2) Figure 3G: it is very difficult to see that the red stained structures are primary cilia.

      Yes, the staining of primary cilia in the mammary ductal lumen is less clear than that in individual cells and in the renal tubule in Figure 3G. We used the well-established markers acetylated tubulin and γ-tubulin to stain the primary cilia, which were clearly labeled in individual cells. However, the labeled primary cilia in the renal tubule were longer and showed a more pronounced structure than those in the mammary ductal lumen. In the mammary ductal lumen of the 10 mice we analyzed, the primary cilia were shorter and less distinctly stained than the others shown in Figure 3G. This difference may be due to the distinct characteristics of primary cilia in different tissues.

      3) Figure 5A: it is unfortunate the authors chose not to show the original dataset (Excel file) used for generating this figure; this makes it difficult to interpret the data. It is general policy of the journal to make source data accessible to the scientific community.

      In accordance with the journal policy, we have provided the original dataset (Excel file) for Figure 5A, as detailed in “Figure 5–Source Data 1”.

      4) The authors have a tendency to overinterpret their data, and not all claims put forth by the authors are fully supported by the data provided.

      We have carefully read through the whole text and have revised the overinterpretation parts. These parts can be found in lines 48-50, lines 93-95, and lines 260-261.

      Reviewer # 2 (Public Review)

      Thank you for recognizing and supporting our research for this manuscript.  

      Reviewer # 1 (Recommendations For The Authors)

      1) Abstract line 48-51: data overinterpretation. The authors cannot claim this based on the data they are presenting. Please modify the statement/temper the claims.

      Thanks for your comments. We have revised this sentence in the abstract; see lines 48-50 for details.

      2) There are several grammatical errors throughout the manuscript. In particular, the following sentences/statements are either wrong, confusing or non-sensical: lines 55-56; lines 87-90; lines 93-95; lines 385-387; lines 409-410.

      Thanks for your positive comments. We have modified lines 55-56 to become new lines 54-55. These sentences in lines 87-90 and lines 93-95 are difficult to understand and logically problematic, so we have carefully revised this paragraph (new lines 85-90). Lines 385-387 have been deleted as they are non-sensical. Lines 409-410 contain misrepresentations. We have revised them in new lines 408-409.

      3) Lines 257-259: this is data over-interpretation. It is not correct to state CRB3 is highly dynamic without having done any live cell imaging.

      Thank you for your comments. We have revised this sentence, see revised lines 260-261 for details.

      4) Figure 8E: if cells do not make cilia when CRB3 is lost (Figure 3), how is it possible to analyze SMO localization to cilia in these cells?

      Thank you for your comments. We used immunofluorescence, with acetylated tubulin and SMO co-staining, to analyze the localization of SMO to cilia. The immunofluorescent staining of primary cilia and the statistical analysis in Figure 3 showed that the proportion of cells with a primary cilium was significantly lower in the CRB3 knockdown group, but ciliated cells were still present. We used laser confocal micrographs to identify cells with a primary cilium by acetylated tubulin staining, then analyzed co-localization in the SMO channel, and finally quantified the proportion of SMO-positive cilia. Several publications (J Cell Biol. 2020;219(6):e201904107; Science. 2008;320(5884):1777-81; Proc Natl Acad Sci U S A. 2012;109(34):13644-9.) have demonstrated that knocking down genes can affect primary cilium formation, and this method has also been used to examine the localization of SMO-related signaling pathway molecules to the primary cilium.

      5) Lines 366-366: based on the relative low magnification of the images in Figure 8H it is difficult to assess the subcellular localization of GLI1 and whether there is a difference between wild type and the Crb3 mutant cells. For example, it is not clear if GLI1 is localizing to the centrosome-cilium axis. Please modify the text accordingly.

      Thank you for your good suggestions. As you mentioned, IHC cannot observe the subcellular localization of GLI1 on the centrosome-cilium axis. However, since GLI1 is a transcriptional effector at the terminal end of the Hh signaling pathway, we may not have made it clear that what we observed in the IHC results was the localization of GLI1 in the nucleus. Therefore, we have revised the description accordingly, as described in line 368 and lines 520-521.

      6) Figure 7D, E: the zoomed-in images look pixelated.

      Thank you for your positive comments. We have replaced these images in the new Figure 7D and E.

      7) Figure 8B: Acetylacte-tub is misspelled.

      Thank you for your comments. We have revised and standardized the acetylated tubulin stain to "Ace-tubulin" in all immunofluorescent images throughout the manuscript.

      Reviewer # 2 (Recommendations For The Authors)

      1) 1) CRB3 is present in mammals as 2 isoforms, A and B, originating from an alternative splicing. In this study, the authors never mention this fact and when using approaches to KO or KD CRB3A/B they are likely to deplete both isoforms which have been shown to have different C-terminal domains and functions (Fan et al., 2007). This is also important for the CRB3 antibodies used in the study since according to the material and methods section they are either against the extracellular domain common to both isoforms or the intracellular domain which is only similar in the domain close to transmembrane between the 2 isoforms. Since the antibodies used in each figure are not detailed it is impossible to know if the authors are detecting CRB3A or B or both. Please provide the information and correct for the actual isoform detected in the data and conclusions.

      From the revised version we know now that CRB3B is used for exogenous expression. It has been shown that each isoform has a different role and localization in cells so why focus only on CRB3B for this study?

      Thank you for your positive comments. First, previous literature has reported that CRB3b localizes to the primary cilium of MDCK cells. We have corrected the Introduction to specify CRB3b (line 81). Second, as described in the Methods section, the CDS of CRB3b was PCR-amplified from RNA extracted from MCF10A cells. We also designed primers specific to CRB3a but were unable to amplify it, indicating that CRB3b is expressed at substantially higher levels than CRB3a in epithelial cells. Finally, according to the company recommended by the Genecards website for purchasing CRB3 cloning products, the only CRB3 sequence available in the CRB3 cDNA ORF Clone in Cloning Vector, Human (Cat: HG14324-G) from Sino Biological is CRB3b.

      2) 3) The authors use GFP-CRB3A/B, it is not stated which isoform, over-expression to localize CRB3A/B in MCF10A cells (figure 4A). The levels of expression appear to be very high in the GFP panel and it is likely that the secretory pathway of the cells is clogged with GFP-CRB3A/B in transit from the ER to the plasma membrane. Thus, the colocalization with pericentrin might be due to the accumulation of ER and Golgi around the centrosome. This colocalization should be done with the endogenous CRB3A/B and with a better resolution.

      The authors do not answer about the potential mislocalization of overexpressed exogenous protein.

      We acknowledge the reviewer's perspective. The large amount of exogenous protein overexpression in the cell could potentially obstruct the protein secretion pathway, resulting in the accumulation of the exogenous protein at the ER and Golgi. Such accumulation could create the false impression of co-localization between CRB3b and the centrosome. To provide additional details (lines 215-217 and lines 426-433), we re-expressed the results exogenously and subsequently used staining of endogenous CRB3 and γ-tubulin in Fig. 4C to confirm the co-localization of CRB3 and the centrosome.

      3) 4) The staining for CRB3A/B in Figure 4C (red) is striking with a very strong accumulation in an undefined intracellular structure and the authors do not provide any explanation for such a difference with the GFP-CRB3A/B just above.

      The authors explain that two different photonic techniques are used (classical versus confocal) but in a cell biology manuscript confocal microscopy is now the standard technique.

      Thank you for your comments. We have included a discussion on the partial concordance between CRB3's endogenous staining and exogenous expression results in the "Discussion" section, specifically in lines 420-435.

      4) 7) In addition, the authors claim (Line 251/252) that Rab11 is necessary for the transport of CRB3A/B but they should KD Rab11 to show this.

      The author's answer is that blocking endocytosis with dynasore is as good as knocking down Rab11 to show its interaction and role in CRB3A/B transport which is not the case.

      Thank you for your comments. As requested by the reviewers, we have conducted experiments to knock down Rab11 and examine CRB3 intracellular trafficking, as shown in the new Supplementary Figure 5B, and have added lines 258-260. These results provide additional support for our conclusions.

      5) 8) The domain of CRB3A/B that is necessary for the interaction with Rab11 is the N-terminal part of the extracellular domain. This domain is thus inside the transport vesicles and not accessible from the cytoplasm. Given that Rab11 is a cytoplasmic protein, how could the 2 proteins interact across the membrane? The authors do not even discuss this essential point for their hypothesis. Comment on the revised version: the authors still do not understand the basics of cell biology, since they claim that the extracellular domain of CRB3 can be in contact with Rab11 after endocytosis. Even after endocytosis, the extracellular domain of CRB3A/B is inside the lumen of the endosome and not in contact with the cytosol, where Rab11 is located. Lines 420-421 of the revised manuscript still claim this interaction between the two proteins without providing the link between the cytosol, where Rab11 is, and the endosome lumen, where the extracellular domain of CRB3A/B is. Please correct.

      Thank you for your positive comments. After carefully studying the relevant knowledge, we strongly agree with the reviewer's point of view. We have toned down our claim and removed the description regarding the binding of Rab11 endosomes to specific structural domains of intracellular CRB3 that we were unable to confirm (see lines 443-444 and lines 465-466).

    1. Author Response

      Reviewer #1 (Public Review):

      The authors report a new bioinformatics pipeline ("SPICE") to predict pairwise cooperative binding-sites based on input ChIP-seq data for transcription factor (TF)-of-interest, analyzed against DNA-binding sites (DNA motifs) in a database (HOCOMOCO). The pipeline also predicts the optimal distance between the paired binding sites. The pipeline correctly predicted known/reported transcription factor cooperations, and also predicted new cooperations, not yet reported in literature. The authors choose to follow up on the predicted interaction between Ikaros and Jun. Using ChIP-seq in mouse B cells, they show extensive overlap in binding regions between Ikaros and Jun in LPS+IL21 stimulated cells. In a human B-lineage cell line (MINO) they show that anti-Ikaros Ab can co-immunoprecipitate Jun protein, and that the MINO cell extracts contain protein(s) that can bind to the CNS9 probe (conserved region upstream of IL10 gene), and that binding is lost upon mutation of two basepairs in the AP1 binding motif, and reduced upon mutation of two basepairs in the non-canonical Ikaros binding motif. Part of this protein complex is super-shifted with an anti-Jun antibody, and more DNA is shifted with addition of an anti-Ikaros antibody.

      The authors perform EMSA showing that recombinant Jun can bind to the tested DNA-region (IL10 CNS9) and that addition of recombinant Ikaros (or anti-Ikaros antibody in Fig 3E) can enhance binding (increase amount of DNA shifted). The authors lastly show that the IL10 CNS9 DNA region can enhance transcription in B- and T-cells with a luciferase reporter assay, and that 2 bp mutation of the Ikaros or Jun DNA motifs greatly reduce or abolish this activity.

      This is interesting work, with two main contributions: The SPICE pipeline (if made available to the scientific community), and the report of interaction between Ikaros and Jun. However, the distinction between DNA motifs, and the proteins actually binding and having a biological function, should be made clear consistently throughout the manuscript. The same DNA motifs can be bound by multiple factors, for instance within transcription factor families with highly homology in the DNA-binding regions of the proteins.

      The reviewer has correctly assessed the content of our manuscript.

      Some specific points:

      SPICE: It is unclear if this is uploaded somewhere to be available to the scientific community.

      Thanks for this comment. We will upload the SPICE pipeline and its associated scripts (R and shell) via GitHub.

      It was unclear if Ikaros-Jun interaction was initially found from primary Jun ChIP-seq (and secondary Ikaros motif from HOCOMOCO) or from primary Ikaros CHIP-seq (and secondary Jun motif from HOCOMOCO). And - what were the two DNA motifs (primary and secondary, and their distance) from the SPICE analysis?

      The IKZF1-JUN interaction was found from primary JUN ChIP-seq data and searching for secondary IKZF1 motifs identified in the HOCOMOCO database. We will provide the primary and secondary motifs in our revised manuscript.
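
      To make the primary/secondary motif idea concrete, the spacing analysis can be pictured as counting the offsets between primary-motif sites (here JUN, from ChIP-seq peaks) and secondary-motif sites (here IKZF1, from a motif database) within the same peak. The following is only a minimal sketch under stated assumptions: it is not the SPICE or SpaMo implementation, the input format (peak ID mapped to motif start positions) and the function names are hypothetical, and no statistical test for enrichment is included.

```python
from collections import Counter


def spacing_histogram(primary_sites, secondary_sites, max_gap=50):
    """Count signed offsets (secondary start - primary start) within each peak.

    Both arguments map a peak ID to a list of motif start positions.
    This input format is a hypothetical simplification for illustration.
    """
    counts = Counter()
    for peak_id, primary_positions in primary_sites.items():
        for p in primary_positions:
            for s in secondary_sites.get(peak_id, []):
                offset = s - p
                # Ignore identical positions and pairs farther apart than max_gap.
                if offset != 0 and abs(offset) <= max_gap:
                    counts[offset] += 1
    return counts


def most_frequent_spacing(counts):
    """Return (offset, count) for the most common offset, or None if empty.

    Ties are broken in favor of the smaller absolute offset.
    """
    if not counts:
        return None
    return max(counts.items(), key=lambda kv: (kv[1], -abs(kv[0])))


# Toy data: three peaks, each with one primary site.
primary = {"peak1": [100], "peak2": [200], "peak3": [50]}
secondary = {"peak1": [106, 130], "peak2": [206], "peak3": [56, 20]}

histogram = spacing_histogram(primary, secondary)
best = most_frequent_spacing(histogram)
```

      In this toy input, the offset of 6 bp recurs in all three peaks, so it is the spacing the sketch reports; a real analysis would additionally test each spacing for enrichment over a background.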

      Authors have mostly careful considerations and statements. One additional comment is that binding does not equal function (Fig 2D), and that opening of chromatin (by any other factor(s)) can give DNA-binding factors (like Ikaros and Jun) the opportunity to bind, without functional consequence for the biological process studied.

      We appreciate that the reviewer believes our considerations and statements are careful. We agree that opening of chromatin can give factors the opportunity to bind, and we now make this point in the manuscript.

      Figure 2E: Ikaros is reported to be expressed at baseline in murine B cells, yet the Ikaros ChIP-seq in unstimulated cells had what looks to be no significant or low peaks. LPS stimulation induced strong Ikaros ChIP-seq signal. A western blot showing the Ikaros protein levels in the 3 conditions could help understand if the binding pattern is due to protein expression level induction. Similar for Jun (western in the 3 conditions), which seemed to mainly bind in the LPS+IL21 condition. Furthermore, as also suggested below, tracks showing Ikaros and Jun binding from all conditions (unstimulated, LPS only and LPS+IL21 stimulated cells), at select genomic loci, would be helpful in illustrating this difference in signal between the different cell conditions. This is relevant in regards to the point of cooperativity of binding.

      The main point of the paper was to show functional cooperation and proximity of binding. However, the experiments with purified JUN and Ikaros proteins suggest cooperative binding. Exhaustive evaluation of the JUN-Ikaros association is left for future studies.

      ChIP-seq in mouse B cells showed that Ikaros bound strongly in LPS stimulated cells, in the (relative) absence of Jun binding (Fig. 2C). However, in EMSA (Fig 3C), there is no binding when the AP1 site is mutated, and the authors describe this as Ikaros binding site. What does the Ikaros binding look like at this genomic location in LPS (only) stimulated cells? The authors could show the same figure as in Fig 2F but show Ikaros and Jun ChIP-seq tracks at IL10 CNS9 locus from all conditions to compare binding in unstimulated, LPS and LPS+IL21 cells.

      As requested, we now show Ikaros and Jun ChIP-seq tracks from unstimulated, LPS-treated, and LPS + IL21-treated cells. Both Ikaros and cJUN were bound to the Il10 upstream CNS9 region with LPS treatment of cells (see Author response image 1, highlighted in red box), but binding was weaker than that observed with LPS + IL21.

      Author response image 1.

      Also: How does this reconcile with the luciferase assay in Fig 4E, where LPS (only) stimulation is used, which in Fig 2E only/mainly induced Ikaros, and not Jun ChIP-seq signal (while EMSA indicate Ikaros cannot bind the site alone, but can enhance Jun-dependent binding).

      As shown above, in the LPS (only) condition, both IKZF1 (Ikaros) and cJUN bind to Il10 CNS9 locus. Thus, this is not in conflict with our luciferase assay data in Fig. 4E, which showed Ikaros is dependent on AP-1 binding. Moreover, the AP-1 site in Fig. 4D and 4E can be bound by other AP-1 factors as well, such as JUND, JUNB, BATF, etc. These points can be made in the manuscript. These factors potentially can compete with cJUN binding and their roles remain to be explored.

      Comment on statements in results section: The luciferase assays in B and T cells do not demonstrate the role of the proteins Ikaros or Jun directly (page 10, lines 208 and surrounding text). The assay measures an effect of the DNA sequences (implying binding of some transcription factor(s)), but does not identify which protein factors bind there.

      We agree with the reviewer. It is reasonable and even likely that different family members may be partially redundant. This point is now made in our revised manuscript.

      Lastly, the authors only discuss Ikaros (using the term IKZF1 which is the gene symbol for the Ikaros protein). There are other Ikaros family members that have high homology and that are reported to bind similar DNA sequences (for instance Aiolos and Helios), which are expressed in B-cells and T-cells. A discussion of this is of relevance, as these are different proteins, although belonging to the same family (the Ikaros-family) of transcription factors. For instance, western for Aiolos and Helios will likely detect Aiolos in the B cells used, and Helios in the T cells used.

      We agree with the reviewer. As requested, we now discuss the possibility that Aiolos or Helios may also contribute.

      Reviewer #2 (Public Review):

      The study was performed with an old tool, SpaMo (12 years old), source data from ENCODE (2010-2012), and even an old version of the peak caller MACS (~2013). The de novo motif search tool is old too (the newer STREME is not mentioned). None of the composite element search tools published in the last 12 years are cited, and there are some issues in data analysis and presentation. Almost all references are from about 8-10 years ago (the most recent is from 2019).

      The title is misleading

      Instead of “A new pipeline SPICE identifies novel JUN-IKZF1 composite elements”

      It should be written as “Application of SpaMo tool identifies novel JUN-IKZF1 composite elements”

      It reflects the pipeline better but honestly shows that the novelty is missed.

      Regarding the above two points, we respectfully disagree with the reviewer. Although SpaMo was used, the pipeline we developed is new and our findings are distinctive. The pipeline can systematically screen and predict novel protein-protein binding complex, and our discovery related to IKZF1-JUN composite element is new and the biological findings and validation are distinctive. This point is now made in the revised manuscript. As requested, we have added some additional references.

      The study was performed on outdated data from ENCODE. The authors mention 343 ENCODE ChIP-Seq libraries, but they did not even take care to give the name of the target TF for each library (Figure 1E, Figure S2, Table 2).

      Although we used ENCODE data, in part because these were the data available when we initially developed the algorithm, those data are valid, and using them allowed us to demonstrate the functionality of SPICE, which is versatile and can be used on datasets of one’s choice as well. As requested, in the revised manuscript we have added the names of the TFs in Figs, Fig. S2, and Table 1.

      Reviewer #3 (Public Review):

      The authors of this study have designed a novel screening pipeline to detect DNA motif spacing preferences between TF partners using publicly available data. They were able to recapitulate previously known composite elements, such as the AP-1/IRF4 composite elements (AICE) and predict many composite elements that are expected to be very useful to the community of researchers interested in dissecting the regulatory logic of mammalian enhancers and promoters. The authors then focus on a novel, SPICE predicted interaction between JUN and IKZF1, and show that under LPS and IL-21 treatment, JUN and IKZF1 in B cells have significant overlap in their genomic localization. Next, to know whether the two TFs physically interact, a co-immunoprecipitation experiment was performed. While JUN immunoprecipitated with an anti-IKZF1 antibody, curiously IKZF1 did not immunoprecipitate with an anti-JUN antibody. Finally, EMSA and luciferase experiments were performed to show that the two TFs bind cooperatively at an IL20 upstream probe.

      The reviewer has described the basic results of the study.

      Major strengths:

      1) SPICE was able to recapitulate previously known composite elements, such as the AP-1/IRF4 composite elements (AICE).

      2) Under LPS and IL-21 treatment, JUN and IKZF1 in B cells have significant overlap in their genomic localization. This is very good supporting evidence for the efficacy of SPICE in detecting TF partners.

      We are glad that the reviewer believes that SPICE is effective in detecting TF partners.

      Major weaknesses:

      1) The authors fail to convincingly show that IKZF1 and Jun physically interact. A quantitative measurement of their interaction strength would have been ideal.

      We agree that it is not conclusive that the factors interact directly as opposed to binding to nearby sites on DNA, which is what SPICE was intended to detect. We never intended to claim that we established a definite physical interaction. The coIP worked in one direction, but not reliably in the other, even though we have tried a total of four different antibodies. We now mention in the revised manuscript that we have tried the additional anti-JUN antibodies, cJun (60A8, CST) and JunD (D17G2, CST).

      2) The super-shift experiment to show that the proteins bound to their EMSA probe were indeed IKZF1 and JUN are not very convincing and would benefit from efforts to quantify the shift (Figure 3E). Nuclear extracts from cells with single or double CRISPR knock outs of the two TFs would have been ideal.

      We agree that using single or double knockouts would be helpful, but other Ikaros family or Jun family members could be involved, so such studies might not be definitive. That is why we used purified proteins to show apparent cooperative binding (Figure 4C).

      3) There is a second band beneath the more prominent band in the EMSA experiment with recombinant IKZF1 and JUN (Figure 4C). This second band is most probably bound by IKZF1 because it becomes weaker when the IKZF1 site is mutated and is completely absent when only JUN is added. This is completely ignored by the authors. Therefore, experiments with EMSA fail to convincingly show that IKZF1 and Jun bind cooperatively. They could just as well bind independently to the two sites.

      The second band has a faster mobility and might relate to IKZF1, although this is difficult to know. We comment on this band in the revised manuscript. As noted above, the purified protein experiments do suggest cooperativity. However, our overall intent was to identify factors binding in proximity, which SPICE has successfully done, even if the binding was “independent”.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, the authors explored the benefits of intermittent fasting for cardiac physiology through a multi-omics approach and compared different fasting regimens (IF12, IF16 and EOD) over a duration of 6 months. Combining RNA-sequencing, proteomics and phosphoproteomics analyses, the authors have made the interesting observation that different fasting times lead to different changes that could be important for cardiac physiology. Moreover, the changes observed at the transcriptional level differ from those at the protein level, suggesting a post-transcriptional regulation mechanism. Using western blot, the authors have confirmed that key signaling pathways, including the AMPK and IRS pathways, are significantly altered upon intermittent fasting for 16 hrs. Lastly, as a proof of concept for better cardiac function, the animals were challenged with dobutamine and echocardiography was performed to show that mice subjected to intermittent fasting have better cardiac systolic function.

      The impact of intermittent fasting on cardiovascular health has been well characterized in several studies. This report appears to be the first to utilize a multi-omics approach and provides an interesting dataset at the transcriptome, proteome and phosphoproteome levels, which would serve as a valuable data resource for the field. I have the following concerns:

      Major concerns:

      1) The rationale for choosing the intermittent fasting pattern and timing While the 16:8 intermittent fasting is relatively standard, what is the rationale to test IF 12 hours? As a 4-hour fasting difference might not cause dramatic changes in transcriptome and proteome. Also, what is the rationale to perform 6 months study? The dobutamine stress test is not a terminal procedure, have the authors examined the cardiac function prior to 6 months to see whether there is a difference?

      We sincerely thank the reviewer for providing insightful comments and feedback on our study. The aim of our research is to gain a comprehensive understanding of molecular reprogramming in the heart during intermittent fasting using multi-omics techniques. We acknowledge the reviewer's concern regarding the selection of three different time points for intermittent fasting. Our rationale for choosing these time points was to align with the practices commonly used by researchers in the field. By doing so, we intended to explore and compare the effects of different intermittent fasting regimens on the heart. Through our study, we found that a longer fasting period resulted in the most significant changes in the proteome abundance. Though we agree that 4-hour fasting difference may not significantly alter transcriptome and proteome in terms of expressions, remarkable changes of post-translational modifications such as phosphorylation can occur during shorter time periods and this is evident based on the analyses of the modulated phosphoproteins. Hence, we included 12 hours time point also to our analysis. In fact, we would like to emphasize that all three fasting regimens had notable effects on pathways regulating cellular carbohydrates, lipid and protein metabolism, cell-cell interactions, and myocardial cell contractility. Regarding the duration of our study, we opted for a 6-month duration of intermittent fasting to investigate the impact of chronic intermittent fasting on heart transcriptome and proteome changes. While shorter-term (2-3 months) intermittent fasting studies in animals also have shown beneficial effects, we wanted to delve deeper into the molecular alterations induced by long-term intermittent fasting. We acknowledge the reviewer's observation about the dobutamine stress test not being a terminal procedure. 
In our manuscript, we aimed to present extensive resource data offering molecular insights into intermittent fasting-induced structural and signaling changes in the heart, focusing on various intermittent fasting time intervals. Additionally, we included the effect of intermittent fasting on cardiac function, specifically examining the intermittent fasting 16 hours (IF16) group, and highlighted key pathway modulations at this time point as supporting evidence. We appreciate the reviewer’s concern about examining cardiac function prior to 6 months. Although we did not perform this analysis in the current study, we fully agree that such a comparison is required for a better understanding of the temporal effects of molecular pathways in relation to heart function during the course of intermittent fasting.

      2) Lack of validation study. One interesting observation from this study is the changes of transcriptome does not reflect all the changes at protein level and there is a differential gene expression pattern in IF12, IF16 and EOD. If this is the case, the authors should select a few important targets and provide both mRNA and protein level analysis, as a proof of concept for the bioinformatics analysis accuracy.

      We appreciate the reviewer's attention to the comparison of proteome and transcriptome data across different intermittent fasting regimens, as well as their interest in understanding any specific deviations in dietary regimens or sets of proteins. Indeed, it is well-established that post-transcriptional regulation can lead to discrepancies between mRNA and protein levels, primarily due to translational control or protein degradation mechanisms. Posttranscriptional buffering of proteins, particularly enzymes and kinases, is a plausible explanation, given their regulation through post-translational modifications, such as phosphorylations or allosteric regulations. Despite observing a modest correlation between the proteome and transcriptome data, which is generally common, we did identify certain enzymes, such as HMGC2, PDK4ACOT, CLPX, and RNASE4, with a high level of concordance between protein and mRNA abundances. These instances of agreement between the two data types suggest a coordinated regulation of these enzymes at the transcriptional and translational levels during intermittent fasting. To facilitate a clearer understanding of the correlation between proteome and transcriptome data, we have included correlation levels next to the scatter plots in our manuscript. These annotations aim to provide additional insights and aid readers in assessing the relationship between the two datasets.
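
      As an illustration of the kind of correlation level reported next to such scatter plots, the sketch below pairs protein and mRNA log fold changes by gene and computes a Pearson correlation coefficient. It is a minimal sketch under stated assumptions, not the analysis used in the manuscript: the dictionary input format, the gene labels and the numbers are hypothetical, and the choice of Pearson correlation is itself an assumption.

```python
from math import sqrt


def paired_fold_changes(protein_lfc, mrna_lfc):
    """Align two {gene: log fold change} dictionaries on their shared genes."""
    shared = sorted(set(protein_lfc) & set(mrna_lfc))
    return [protein_lfc[g] for g in shared], [mrna_lfc[g] for g in shared]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical values for illustration only; these are not data from the study.
protein = {"geneA": 1.0, "geneB": 2.0, "geneC": 3.0, "geneD": 0.5}
mrna = {"geneA": 2.0, "geneB": 4.0, "geneC": 6.0, "geneE": -1.0}

x, y = paired_fold_changes(protein, mrna)
r = pearson_r(x, y)
```

      Only genes quantified in both datasets enter the calculation, which is why geneD and geneE are dropped in this toy example.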

      3) Poor western blot image quality. The quality of the western blot has several issues: a. the change of pAMPK/AMPK appears to be a decrease of total AMPK instead of change at p-AMPK level. Same with GSK3a/b. There appears to be an increase of total GSK3a/b. The AKT should also be blotted and quantified at phosphorylation level. The western blot should be clearly labeled, for the ones with double bands, including GSK3a/b, the author should clearly label which is GSK3a and which is GSK3b. For the IRS with non-specific band, the author should point out IRS-1 band itself.

      We appreciate the reviewer's careful evaluation of our study and acknowledge the concerns raised regarding the quality of the western blot images. Despite revising these experiments multiple times, we acknowledge that the immunoblot images may not meet the highest quality standards. We have included the original immunoblots in the supplementary section to ensure transparency and provide additional data for reference.

      Reviewer #2 (Public Review):

      This study provides an unbiased characterization of the cardiac proteome in the setting of intermittent fasting. The findings constitute a resource of quantitative proteomic data that sheds light on changes in cardiac function due to diet and that may be used in the future by other investigators. There are a number of key missing details that limit interpretation or present opportunities to strengthen the study.

      1) For example, the authors find that apolipoproteins are altered with fasting but it is not clear whether this is a contribution of myocardial tissue changes or systemic effects spilling into blood in cardiac tissues.

      We appreciate the reviewer's consideration of the potential effect of spilling blood on our study results. While we agree that such an effect is possible, we would like to emphasize that the observed overall changes in the proteome profile, particularly in pathways regulating metabolism and other cardiac remodeling-associated processes, suggest that the alterations we observed are more likely attributed to changes within the myocardial tissues themselves. We would like to highlight that blood microparticles or extracellular proteins were not enriched in our proteome data and hence the impact of blood spilling is not a concern. In fact, the biological processes we observed were majorly associated with ECM receptor interaction, focal adhesion and signaling pathways, which are not typical for secreted or extracellular proteome encompassing blood leakage.

      2) Some statements in the text like "Approximately one-third of the differentially expressed proteins in IF groups compared to AL were enzymes with catalytic activity involved in energy homeostasis pathways" do not appear to be supported by data.

      The enzymes among all the differentially expressed proteins in the intermittent fasting (IF) groups compared to the ad libitum (AL) control group are indicated in Supplementary Table S2. This constitutes one-third of the total number of differentially expressed proteins and several of these are involved in metabolic and energy homeostasis pathways.

      3) It is not clear how the list of Kinases were generated for Figure 1B.

      For Figure 1B, we first identified all the kinases among the proteins that were differentially expressed in the different dietary regimens compared to the ad libitum (AL) control group (listed in Supplementary Table S2). We then performed enrichment analysis (FDR ≤ 0.05) of the identified kinases across KEGG pathways, using the DAVID bioinformatics resources.
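
      For readers unfamiliar with pathway enrichment at an FDR cutoff, the sketch below shows the general idea: a one-sided hypergeometric test per pathway, followed by Benjamini-Hochberg adjustment. It is a simplified stand-in and an assumption on our part, not a reproduction of the statistic or settings that DAVID applies, and all function names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K belong to the pathway."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy p-values, one per pathway; pathways with adjusted p <= 0.05 would be kept.
raw = [0.01, 0.04, 0.03, 0.5]
fdr = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(fdr) if q <= 0.05]
```

      In this toy example only the first pathway passes the 0.05 cutoff after adjustment, although three of the four raw p-values are below 0.05.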

      4) Changes in chromatin or gene expression are not measured so the conclusion that EOD led to 'epigenetic changes' relative to IF16 is not well supported.

      We appreciate the reviewer's feedback. Our statement in the manuscript referred specifically to the changes observed in Figure 2, where we presented increased proteomic abundance in pathways related to chromatin remodeling, chromatin organization, gene expression regulation, and histone modification in the EOD (Every Other Day Fasting) group compared to the IF 16 (Intermittent Fasting for 16 hours) group based on functional process and pathway enrichment analysis. Our comprehensive bioinformatics analysis, depicted in Figure 2, provides intriguing insights into these pathways. We acknowledge that further validation and in-depth studies through additional experiments and functional assays are essential to strengthen the conclusion from such observations, which is beyond the scope of the current study. We thank the reviewer for such valuable suggestions that are very useful for our ongoing studies, where we aim to obtain a more robust and thorough understanding of the impact of intermittent fasting on chromatin-related processes.

      5) There are also a number of areas where the text is vague. For example, it is not clear what is meant by 'trend shift' when discussing EOD results, and Figure 3 generally could use additional information to help readers better understand the figures.

      We would like to clarify that the term 'trend shift' refers to the change in the direction of protein and transcript level alterations. Based on the 2-D enrichment analyses that revealed correlated and non-correlated functional processes at the proteome and transcriptome levels, it was evident that during the early intermittent fasting 12 hours (IF12) regimen, the abundance changes of the proteins and transcripts involved in these processes were altered in the same direction (Supplementary Fig. 4b). Nevertheless, with increased fasting hours, mainly in the Every Other Day Fasting (EOD) group, we observed that the levels of proteins and transcripts involved in several of the functional processes appeared to be non-correlated as compared to the IF12 group (Fig. 2d). In Figure 3, we summarize the overall altered protein networks associated with the different intermittent fasting regimens, highlighting densely connected clusters of proteins along with their associated biological processes and pathways. Additionally, we unravel the impact of intermittent fasting on transcriptional rewiring and highlight regimen-specific alterations of specific transcriptional factors, several of which were found to have metabolic implications.

      6) An interesting finding is that the IF16 groups showed cardiac hypertrophy (SFig 11b). This is potentially a novel finding and the text should elaborate more on this phenomenon.

      We sincerely thank the reviewer for bringing attention to this intriguing aspect of our study. The data you have highlighted warrants further investigation, and we are committed to delving deeper into this area in our future research.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The manuscript focused on the roles of a key fatty-acid synthesis enzyme, acetyl-coA-carboxylase 1 (ACC1), in the metabolism, gene regulation and homeostasis of invariant natural killer T (iNKT) cells and the impact on these T cells' roles during asthma pathogenesis. The authors presented data showing that the acetyl-coA-carboxylase 1 enzyme regulates the expression of PPARg and thereby the function of iNKT cells, including the secretion of Th2-type cytokines, to impact asthma pathogenesis. The results are clear-cut and the data were logically presented.

      Major concerns:

      1) This study heavily relied on the CD4-CreACC1fl/fl mice. While using of a-GalCer stimulation and Ja18KO mice mitigated the concern, it is still a major concern that at least some of the phenotype were due to the effect on conventional CD4 T cells. For example, the deletion of ACC1 gene seems also decreased the numbers of conventional CD4 T cells (Fig. 2D, Fig. S1D). Previously there were reports showing ACC1 gene in conventional CD4 T cells also plays a role in lung inflammation (Nakajima et al., J. Exp. Med. 218, 2021). If the authors believe the phenotype observed was mainly due to iNKT cells, rather than conventional CD4 T cells, a compare/contrast of the two studies should be discussed to explain or reconcile the results.

      As the reviewer pointed out, although we have experimentally demonstrated the critical role of ACC1 in iNKT cells in the regulation of allergic asthma, use of Cd4-CreAcc1fl/fl mice inevitably brings the role of conventional CD4+ T-cells in question.

      The study conducted by Nakajima et al., which reported that the absence of ACC1 in CD4+ T cells resulted in reduced numbers and functional impairment of memory CD4+ T cells, leading to less airway inflammation, further suggests the possible involvement of conventional CD4+ T cells in the regulation of allergic asthma. A direct comparison of the two studies is difficult, since Nakajima et al. focused on the role of ACC1 in memory CD4+ T cells while we focused on iNKT cells.

      However, based on our experimental results, we believe that iNKT cells contribute more to the regulation of allergic asthma, for the following reasons: (i) while the number of iNKT cells was significantly reduced in Cd4-CreAcc1fl/fl mice, the number of conventional CD4+ T cells was only slightly reduced, (ii) Cd4-CreAcc1fl/fl mice showed dramatically decreased AHR in the α-GalCer-induced, iNKT cell-dependent allergic asthma model, and (iii) Jα18 KO mice, which lack iNKT cells, almost completely restored their AHR when adoptively transferred with WT iNKT cells but not with ACC1-deficient iNKT cells. These results indicate that ACC1-mediated regulation of AHR is significantly dependent on iNKT cells, which might contribute to AHR in the study conducted by Nakajima et al. as well. From these, we believe that while ACC1 is a critical regulator of both conventional CD4+ T cells and iNKT cells in the regulation of allergic asthma, iNKT cells may contribute more to the regulation of allergic asthma than CD4+ T cells. We have summarized the above-mentioned contents in LINES: 421-441 with the reference you have mentioned:

      "It should be noted that Cd4-CreAcc1fl/fl mice lack ACC1 expression in both conventional CD4+ T cells and iNKT cells. While the use of iNKT cell- specific Cre system would demonstrate critical role of ACC1 in iNKT cells regarding allergic asthma, there is no iNKT cell-specific Cre system available yet. In addition, the study conducted by Nakajima et al, which reported that the absence of ACC1 in CD4+ T cells resulted in reduced numbers and functional impairment of memory CD4+ T cells, leading to less airway inflammation further suggests possibility of involvement of conventional CD4+ T cells in regulation of allergic asthma. However, based on our experimental results, we believe that iNKT cells more contribute to the regulation of allergic asthma for the following reasons - (i) while the number of iNKT cells were significantly reduced in Cd4-CreAcc1fl/fl mice, the number of conventional CD4+ T cells were only slightly reduced, (ii) Cd4-CreAcc1fl/fl mice were dramatically decreased in their AHR in α-GalCer induced allergic asthma model, and (iii) Jα18 KO mice that lack iNKT cells almost completely restore their AHR when adoptively transferred with WT iNKT cells but not ACC1-deficient iNKT cells. These results indicate that ACC1-mediated regulation of AHR is significantly dependent on iNKT cells, which might contribute to AHR in the study conducted by Nakajima et al. as well. From these, we believe that while ACC1 is a critical regulator of both conventional CD4+ T cells and iNKT cells in regulation of allergic asthma, iNKT cells may contribute more to regulation of allergic asthma compared to CD4+ T cells."

      2) The overall significance of the manuscript is related to the potential clinical suppression of ACC1 in human asthma patients. However, the authors only showed the elevated ACC1 genes in these patients, not even in vitro data demonstrating that suppression of ACC1 genes in the iNKT cells from patients could have potential therapeutic effect or suppression of the relevant cytokines.

      We appreciate the reviewer's critical comment. Owing to the paucity of iNKT cells in human PBMCs, it is extremely difficult to experimentally manipulate the expression level of ACC1 in human iNKT cells. As an alternative, we compared the cytokine expression of ACC1high iNKT cells from human allergic asthma patients with that of ACC1low iNKT cells from healthy individuals or non-allergic asthma patients. Our results show that iNKT cells from allergic asthma patients express higher levels of IL4 and IL13 than those from healthy individuals or non-allergic asthma patients, suggesting that the level of ACC1 is most likely involved in the functionality of human iNKT cells as well. The results are newly shown in supplementary Fig. 5C, with explanation in LINES 376-378 and 382-384:

      LINES 376-378: Lastly, the expression levels of IL4 and IL13 were significantly higher in iNKT cells from the allergic asthma patients compared to those from healthy controls and nonallergic asthma patients (Fig. S5C).

      LINES 382-384: Thus, iNKT cells from allergic asthma patients express higher ACC1, FASN and PPARG levels and lower levels of a glycolysis which is accompanied with higher levels of IL4 and IL13 than iNKT cells from healthy controls and nonallergic asthma patients.

      3) The authors report that a-GalCer administration can induce the AHR, however, in the cited paper (Hachem et al., Eur J. Immunol. 35, 2793, 2005), iNKT cell activation seems to have the opposite effect to inhibit AHR. Did the authors mean to cite different papers?

      We apologize for the confusion. We have replaced the inaccurate reference with the reference below in LINES 863-865:

      1. Glycolipid activation of invariant T cell receptor+ iNKT cells is sufficient to induce airway hyperreactivity independent of conventional CD4+ T cells. Proc Natl Acad Sci USA 103, pp. 2782-2787 (2006).

      Reviewer #2 (Public Review):

      In this study the authors sought to investigate how the metabolic state of iNKT cells impacts their potential pathological role in allergic asthma. The authors used two mouse models, OVA and HDM-induced asthma, and assessed genes in glycolysis, TCA, β-oxidation and FAS. They found that acetyl-coA-carboxylase 1 (ACC1) was highly expressed by lung iNKT cells and that ACC1 deficient mice failed to develop OVA-induced and HDM-induced asthma. Importantly, when they performed bone marrow chimera studies, when mice that lacked iNKT cells were given ACC1 deficient iNKT cells, the mice did not develop asthma, in contrast to mice given wildtype NKT cells. In addition, these observed effects were specific to NKT cells, not classic CD4 T cells. Mechanistically, iNKT cells that lack ACC1 had decreased expression of fatty acid-binding proteins (FABPs) and peroxisome proliferator-activated receptor (PPAR)γ, but increased glycolytic capacity and increased cell death. Moreover, the authors were able to reverse the phenotype with the addition of a PPARg agonist. When the authors examined iNKT cells in patient samples, they observed higher ACC1 and PPARG levels, compared to healthy donors and non-allergic-asthma patients.

      We are very grateful for your kind appreciation of our work.

      Reviewer #1 (Recommendations For The Authors):

      1) Related to major concern I, an iNKT cell-specific knockout of ACC1 in iNKT cells is highly desirable and should be used to directly address the question.

      As the reviewer suggested, iNKT cell-specific deletion of ACC1 would provide invaluable information for our study. Unfortunately, a Cre-loxP system that specifically targets iNKT cells has not yet been developed. We therefore opted to use the CD4-Cre system, which is the gold standard Cre system for the study of iNKT cells. In addition, to highlight the role of ACC1 in iNKT cells in the regulation of allergic asthma, we used iNKT cell-dependent experimental models and conducted adoptive transfer of iNKT cells into iNKT cell-deficient (Jα18 KO) mice. These points are discussed in the Discussion section in LINES 421-441:

      "It should be noted that Cd4-CreAcc1fl/fl mice lack ACC1 expression in both conventional CD4+ T cells and iNKT cells. While the use of iNKT cell- specific Cre system would demonstrate critical role of ACC1 in iNKT cells regarding allergic asthma, there is no iNKT cell-specific Cre system available yet. In addition, the study conducted by Nakajima et al, which reported that the absence of ACC1 in CD4+ T cells resulted in reduced numbers and functional impairment of memory CD4+ T cells, leading to less airway inflammation further suggests possibility of involvement of conventional CD4+ T cells in regulation of allergic asthma. However, based on our experimental results, we believe that iNKT cells more contribute to the regulation of allergic asthma for the following reasons - (i) while the number of iNKT cells were significantly reduced in Cd4-CreAcc1fl/fl mice, the number of conventional CD4+ T cells were only slightly reduced, (ii) Cd4-CreAcc1fl/fl mice were dramatically decreased in their AHR in α-GalCer induced allergic asthma model, and (iii) Jα18 KO mice that lack iNKT cells almost completely restore their AHR when adoptively transferred with WT iNKT cells but not ACC1-deficient iNKT cells. These results indicate that ACC1-mediated regulation of AHR is significantly dependent on iNKT cells, which might contribute to AHR in the study conducted by Nakajima et al. as well. From these, we believe that while ACC1 is a critical regulator of both conventional CD4+ T cells and iNKT cells in regulation of allergic asthma, iNKT cells may contribute more to regulation of allergic asthma compared to CD4+ T cells."

      2) For Fig. 5A, RT-PCR verification of PPARg gene expression level change is needed.

      As suggested, we have verified the level of Pparg expression in ACC1-deficient iNKT cells by real-time PCR and have added the results to Figure 5A.

      3) Verifying at least the cytokine secretion can be regulated by manipulating ACC1 expression in human asthma patient samples will make the paper much stronger.

      We appreciate the reviewer's critical comment. Owing to the paucity of iNKT cells in human PBMCs, it is extremely difficult to experimentally manipulate the expression level of ACC1 in human iNKT cells. As an alternative, we compared the cytokine expression of ACC1high iNKT cells from human allergic asthma patients with that of ACC1low iNKT cells from healthy individuals or non-allergic asthma patients. Our results show that iNKT cells from allergic asthma patients express higher levels of IL4 and IL13 than those from healthy individuals or non-allergic asthma patients, suggesting that the level of ACC1 is most likely involved in the functionality of human iNKT cells as well. The results are newly shown in supplementary Fig. 5C, with explanation in LINES 376-378 and 382-384:

      LINES 376-378: Lastly, the expression levels of IL4 and IL13 were significantly higher in iNKT cells from the allergic asthma patients compared to those from healthy controls and nonallergic asthma patients (Fig. S5C).

      Minor points:

      1) What are the cells being stained in Fig. S2C? Are they iNKT cells? If yes, why there is a tetramer-negative population?

      The density plot in the left panel of Fig. S2C represents magnetically enriched thymic iNKT cells. Because of their scarcity, thymic iNKT cells were enriched with CD1d tetramers using a magnetic-activated cell sorting (MACS)-based enrichment technique. After enrichment, we re-stained the enriched cells with CD1d tetramers and gated out CD3 and CD1d tetramer double positive cells via flow cytometry to specifically identify iNKT cells. Because magnetic cell separation does not yield perfect purity, a small CD1d tetramer-negative population is seen in the left panel of Fig. S2C.

      A brief mention of this methodology has been added to the “Preparation and activation of murine T and iNKT cells” section under Materials and Methods in LINES 560-566:

      "Alternatively, thymic and liver mononuclear cells were labeled with APC-conjugated ɑ-GalCer/CD1d tetramers, bound to anti-APC magnetic beads, and enriched on a MACS separator (Miltenyi Biotec, Auburn, CA, USA; purity 89%). To analyze the development of thymic iNKTs cells, we re-stained enriched cells with CD1d tetramer and gated out CD3 and CD1d tetramer double positive cells via flow cytometry to identify thymic iNKT cells, which were used for further analysis."

      2) Where are the adoptive transferred iNKT cells purified/sorted from? Are they from lungs of Acc1fl/fl or CD4-cre/Acc1fl/fl mice, asthma-induced already? As there are very few iNKT cells in healthy and untreated mice. There is little described or explained in Methods and Materials.

      The adoptively transferred iNKT cells were purified and pooled from the lungs of at least 10 mice per group. Briefly, mouse lungs were finely chopped into small pieces using razor blades and enzymatically digested using type IV collagenase. iNKT cells from the lungs were sorted via FACS using CD1d tetramers. Approximately 6.0 × 10^5 iNKT cells were obtained from the lungs of at least 10 mice. A brief description of this methodology was added to the “Adoptive transfer of iNKT cells in allergic asthma models” section in Materials and Methods in LINES 568-574: iNKT cells were obtained from the lungs of at least 10 Acc1fl/fl or Cd4-CreAcc1fl/fl mice. Mouse lungs were finely chopped into small pieces using razor blades and were enzymatically digested using type IV collagenase. iNKT cells from the lungs were sorted via FACS using CD1d tetramers. Approximately 6.0 × 10^5 iNKT cells were obtained from at least 10 mice and were adoptively transferred into individual recipient mice via the intratracheal route.

      3) The use of 2-NBDG was not explained in multiple locations, particularly in Fig.5H. How is its fluorescence used to track iNKT cells? No description in Materials and methods.

      2-NBDG, a fluorescently tagged glucose analog, is an indicator used to measure glucose uptake in cells. The fluorescence intensity of 2-NBDG-treated cells reflects the degree of glucose uptake, which can be measured by flow cytometry. Thus, in the experiments in which cells were treated with 2-NBDG, we described the results as "glucose uptake". A brief explanation of this methodology was added to the main text in LINES 253-254. In addition, we have detailed the use of 2-NBDG in ‘Measurement of glucose uptake capacity’ under Materials and Methods in LINES 599-607: Measurement of glucose uptake capacity using 2-NBDG assay. After 2-NBDG treatment, the fluorescence intensity of the cells was measured by flow cytometry and taken to represent the degree of glucose uptake.

      4) Fig. 3A legends: it should be "Ja18 KO"?

      Thank you for pointing out our mistake. We have corrected this in the legend of Figure 3A.

      5) There are two different mechanisms for explaining the less severe asthma/AHR phenotype in ACC1-KO iNKT cells. One is lower number of iNKT cells due to cell death, the other decreased cytokine secretions. It is not clear to the reviewer, what are the relationship between two mechanisms. Are they both contributing to the asthma phenotype or cooperative?

      As you mentioned, ACC1-deficient iNKT cells showed both an increase in the intrinsic pathway of apoptosis and a decrease in cytokine secretion. We therefore believe that the increased cell death and the decreased cytokine expression of ACC1-deficient iNKT cells contributed cooperatively to the asthma phenotype. This point is discussed in LINES 453-458: Furthermore, the apoptotic tendency of the ACC1-deficient iNKT cells was accompanied by their functional impairment. The ACC1-deficient iNKT cells exhibited impaired viability and functionality. Treatment of glycolysis inhibitor in ACC1-deficient iNKT cells not only restored cellular survival but also their functionalities. From these results, we speculate that ACC1-mediated regulation of both cellular homeostasis and cytokine production cooperatively contributed to the asthma phenotype.

      Reviewer #2 (Recommendations For The Authors):

      Overall, this is a very strong study with few concerns.

      1) Are there tissue specific differences in the iNKT cell populations? The authors examined lung iNKT cells in the Figs 1-3, and used liver NKT cells for the mechanistic studies in Fig 4-5. The studies shown in Fig S2 suggest that ACC1 deficient iNKT cells have developmental defects and impaired homeostatic proliferative capacity. Does ACC1 impact lung and liver iNKT cells similarly and is the lack of allergic asthma in ACC1 deficient iNKT cells due to defective iNKT cell trafficking to the lungs or a failure to survive after transfer (Fig 3)?

      In the absence of ACC1, the number of iNKT cells decreased in both lungs and livers, and the cells showed consistent features (i.e., metabolic parameters), suggesting that ACC1 has no tissue-specific role in iNKT cells.

      In the adoptive transfer experiments, we transferred equal numbers of WT and ACC1-deficient iNKT cells directly into mouse lungs via the intratracheal route. Thus, the decreased number of adoptively transferred ACC1-deficient iNKT cells is more likely due to their intrinsically impaired homeostatic proliferative capacity than to defective trafficking to the lungs.

      2) Similarly, are chemokine receptor expression patterns similar between WT and ACC1 deficient iNKTs (Fig 4)?

      We compared chemokine receptor expression in WT and ACC1-deficient iNKT cells using our RNA-seq data and verified the expression levels by real-time qPCR. The expression levels of these chemokine receptors were comparable between the two groups of iNKT cells. The results are newly shown in supplementary Fig. 4I, with explanation in LINES 351-357:

      Meanwhile, chemokine receptor signaling is also implicated in regulating homeostasis of iNKT cell in the periphery. In particular, Meyer et al. suggested that iNKT cells require CCR4 to localize to the airways and to induce AHR. Thus, we examined the expression of several chemokine receptors, including CCR4. We found that WT and ACC1-deficient iNKT cells did not differ in their chemokine receptor expressions, suggesting that the chemokine signaling may not be critical for ACC1-mediated regulation in AHR.

      3) The authors data suggest that Tregs are not playing a major role in the regulation of asthma induction in their ACC1 deficient mice, based on FoxP3 expression. Did the authors perform suppressor assays to show that the Tregs function similarly in WT and ACC1 deficient mice?

      We appreciate the reviewer's reasonable comment. We did not experimentally compare the suppressive capacity of WT and ACC1-deficient Tregs under asthmatic conditions because the differences in their Foxp3 expression were minimal, and Foxp3 expression is a critical determinant of the suppressive function of Tregs (Immunity. 2019 Feb 19;50(2):302-316; Nat Immunol 2003; 4: 330–336; Cell Mol Immunol. 2015 Sep;12(5):558-65). We therefore speculate that the suppressive capacity of WT and ACC1-deficient Tregs is similar. Nevertheless, since the suppressive capacity of Tregs can also be regulated by other soluble factors and surface molecules, we cannot completely rule out the possibility that ACC1-deficient Tregs differ from WT Tregs in their suppressive capacity in asthma. In short, while there are clear limitations to our interpretation, we believe it is unlikely that Tregs from WT and ACC1-deficient mice differ in their suppressive capacity during asthma. We have included these points in the Discussion section in LINES 415-419: In this regard, Tregs may also play a major role in asthma. However, the expression level of Foxp3 was comparable between WT and ACC1-deficient Tregs. The level of Foxp3 to some extent, serves as a critical determinant of suppressive function of Tregs. Thus, we speculate that they might not critically contribute to the development of asthma, although we cannot completely rule out the contribution of Tregs to our studies.

    1. Author Response

      We would like to thank the reviewers for their positive and constructive comments on the manuscript.

      We are planning the following revisions to both DGRPool and the corresponding manuscript to address the reviewers’ comments:

      1) We agree with reviewer #1 that normalizing the data could potentially improve the GWAS results. Thus, we plan to explore the implementation of this option and assess its impact on the overall results. We will also investigate replacing the ANOVA test with a Kruskal-Wallis test. Instead of upfront data normalization, we will consider using the PLINK --pheno-quantile-normalize option. Both options will be compared on a set of phenotypes where we can analyze the output (i.e., phenotypes for which we expect to find specific variants), to determine whether these strategies enhance the detection power.

      2) We also agree with both reviewers that gene expression information is of interest. However, we recognize that incorporating such information would entail substantial work (as elaborated in our response to comments below). We feel that this extensive work is beyond the current scope of this paper, which primarily focuses on phenotypes and genotype-phenotype associations. Nonetheless, we are committed to enhancing user experience by including more gene-level outlinks to Flybase. Additionally, we will link variants and gene results to Flybase's online genome browser, JBrowse. By following the reviewers' suggestions, we aim to guide DGRPool users to potentially informative genes.

      3) In agreement with reviewer #2, we acknowledge that additional tools could enhance DGRPool's functionality and facilitate meta-analyses for users. Therefore, we are in the process of developing a gene-centric tool that will allow users to query the database based on gene names. Moreover, we intend to integrate ortholog databases into the GWAS results. This feature will enable users to extend Drosophila gene associations to other species if necessary.

      4) Finally, we also concur with both reviewers about making minor edits to the manuscript to address their feedback.

      Reviewer #1 (Public Review):

      This is a technically sound paper focused on a useful resource around the DGRP phenotypes, which the authors have curated, pooled, and made available through a user-friendly website. It is intended to become a crowd-sourced resource in the future.

      The authors should make sure they coordinate as well as possible with the NC datasets and community and broader fly community. It looks reasonable to me but I am not from that community.

      We thank the reviewer for the positive comments. We are relatively well-connected to the D. melanogaster community and aim to leverage this connection to render the resource as valuable as possible. DGRPool in fact already reflects the input of many potential users and was also inspired by key tools on the DGRP2 website. This is also why we often bridge our results with other resources, for example by linking out to Flybase, which is the main resource for the Drosophila community at large.

      I have only one major concern which in a more traditional review setting I would be flagging to the editor to insist the authors did on resubmission. I also have some scene setting and coordination suggestions and some minor textual / analysis considerations.

      The major concern is that the authors do not comment on the distribution of the phenotypes; it is assumed it is a continuous metric and well-behaved - broad gaussian. This is likely to be more true of means and medians per line than individual measurements, but not guaranteed, and there could easily be categorical data in the future. The application of ANOVA tests (of the "covariates") is for example fragile for this.

      The simplest recommendation is in the interface to ensure there is an inverse normalisation (rank and then project on a gaussian) function, and also to comment on this for the existing phenotypes in the analysis (presumably the authors are happy). An alternative is to offer a kruskal test (almost the same thing) on covariates, but note PLINK will also work most robustly on a normalised dataset.

      We thank the reviewer for raising this interesting point. Indeed, we did not comment on the distribution of individual phenotypes due to the underlying variability from one phenotype to another, as suggested by the reviewer. Some distributions appear normal, while others are clearly not normally distributed. This information is 'visible' to users by clicking on any phenotype; DGRPool automatically displays its global distribution if the values are continuous/quantitative. We acknowledge the reviewer's concerns regarding the use of ANOVA tests. However, we consider it acceptable to perform linear regression (including ANOVA tests) on non-normally distributed data, as only the prediction errors need to follow a normal distribution.

      Furthermore, the ANOVA test is solely conducted to assess whether any of the potential covariates (such as well-established inversions and symbiont infection status) are associated with the phenotype of interest. PLINK2 automatically corrects for the effects of these covariates during GWAS by considering them as part of the regression model.
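
      As a hedged illustration of the rank-based alternative to the ANOVA covariate check, the sketch below computes a Kruskal-Wallis H test of a phenotype against a categorical covariate such as inversion status. It is a minimal, standard-library-only sketch and not DGRPool's implementation; the function names and the toy values are hypothetical.

```python
import math
from collections import defaultdict


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of sorted positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(half))  # df = 1
    else:
        a, q = 1.0, math.exp(-half)             # df = 2
    # Recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    while a < df / 2:
        q += half ** a * math.exp(-half) / math.gamma(a + 1)
        a += 1
    return q


def kruskal_wallis(phenotype, covariate):
    """Return (H, p) for a phenotype split by the levels of a covariate."""
    n = len(phenotype)
    ranks = average_ranks(phenotype)
    rank_sums = defaultdict(float)
    sizes = defaultdict(int)
    for r, g in zip(ranks, covariate):
        rank_sums[g] += r
        sizes[g] += 1
    h = 12 / (n * (n + 1)) * sum(rank_sums[g] ** 2 / sizes[g] for g in sizes)
    h -= 3 * (n + 1)
    # Tie correction: divide H by 1 - sum(t^3 - t) / (n^3 - n)
    tie_counts = defaultdict(int)
    for v in phenotype:
        tie_counts[v] += 1
    correction = 1 - sum(t ** 3 - t for t in tie_counts.values()) / (n ** 3 - n)
    if correction > 0:
        h /= correction
    return h, chi2_sf(h, len(sizes) - 1)


# Hypothetical per-line phenotype means and inversion status (toy values).
means = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
inversion = ["absent", "absent", "absent", "present", "present", "present"]
h_stat, p_value = kruskal_wallis(means, inversion)
print(h_stat, p_value)
```

      Because the statistic uses only ranks, it does not require the phenotype or the residuals to be normally distributed, which is the property the reviewer is asking for.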

      Nevertheless, we concur with the reviewer that normalizing the data could potentially enhance GWAS results. Consequently, we commit to exploring the impact of data normalization on the overall outcomes. Additionally, we will consider replacing the ANOVA test with a Kruskal-Wallis test, and using the PLINK --pheno-quantile-normalize option. We intend to compare both approaches using a set of phenotypes where we can compare the output (i.e., where specific variants are expected to be identified). This comparison will help us determine if either method enhances the detection power.
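
      The inverse normalisation the reviewer describes (rank the values, then project the ranks onto a Gaussian) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the `offset` parameter is one common convention, and the sketch is not claimed to reproduce the exact formula that PLINK or DGRPool applies.

```python
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of sorted positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def inverse_normal_transform(values, offset=0.5):
    """Rank-based inverse normal transform; None (missing) stays None.

    Each rank r among the n observed values becomes the quantile
    (r - offset) / (n - 2 * offset + 1), which is then mapped through the
    inverse CDF of the standard normal distribution.
    """
    present = [i for i, v in enumerate(values) if v is not None]
    ranks = average_ranks([values[i] for i in present])
    n = len(present)
    std_normal = NormalDist()
    out = [None] * len(values)
    for i, r in zip(present, ranks):
        quantile = (r - offset) / (n - 2 * offset + 1)
        out[i] = std_normal.inv_cdf(quantile)
    return out


# Hypothetical skewed phenotype with one missing line (toy values).
print(inverse_normal_transform([3.0, None, 1.0, 250.0]))
```

      The transform keeps the ordering of the lines but discards the magnitude of outliers, so the extreme value in the toy input ends up no further from zero than the smallest value.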

      Minor points:

      On the introduction, I think the authors would find the extensive set of human GWAS/PheWAS resources useful; widespread examples include the GWAS Catalog, Open Targets PheWAS, MR-base, and the FinnGen portal. The GWAS Catalog also has summary statistics submission guidelines, and I think where possible meta-data harmonisation should be similar (not a big thing). Of course, DGRP has a very different structure (line and individuals) and of course, raw data can be freely shown, so this is not a one-to-one mapping.

      Thank you for the suggestion. We will cite these resources in the Introduction and check the GWAS catalog submission guidelines to compare to the ones we are proposing in this paper.

      For some authors coming from a human genetics background, they will be interpreting correlations of phenotypes more in the genetic variant space (eg LD score regression), rather than a more straightforward correlation between DGRP lines of different individuals. I would encourage explaining this difference somewhere.

      We appreciate this potential issue and we will make this distinction clearer in the manuscript to avoid any confusion.
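
      To make the distinction concrete, a line-level phenotype correlation of the kind discussed here can be sketched as a Pearson correlation over the per-line means of the DGRP lines shared by two studies. The sketch is illustrative only: the function name, the line identifiers, and the choice of Pearson rather than a rank correlation are assumptions, not a description of DGRPool's code.

```python
import math


def line_mean_correlation(study_a, study_b, min_lines=3):
    """Pearson correlation of per-line means over lines present in both studies.

    study_a and study_b map a line identifier to that line's mean phenotype.
    Returns (r, number_of_shared_lines).
    """
    shared = sorted(set(study_a) & set(study_b))
    if len(shared) < min_lines:
        raise ValueError("too few shared lines to estimate a correlation")
    xs = [study_a[line] for line in shared]
    ys = [study_b[line] for line in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a phenotype is constant across the shared lines")
    return sxy / math.sqrt(sxx * syy), len(shared)


# Hypothetical line identifiers and phenotype means (toy values).
longevity = {"line_021": 41.0, "line_026": 55.0, "line_028": 48.0, "line_031": 60.0}
activity = {"line_021": 0.8, "line_026": 1.4, "line_028": 1.1}
print(line_mean_correlation(longevity, activity))
```

      This correlation is taken across inbred lines, each contributing one mean value per phenotype, which is different from a genetic correlation estimated in variant space (for example by LD score regression).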

      This leads to an interesting point that the inbred nature of the DGRP allows for both traditional genetic approaches and leveraging the inbred replication; there is something about looking at phenotype correlations through both these lenses, but I suspect this is for another paper, one that this harmonised pool of data can help with.

      We agree with the reviewer and hope that more meta-analyses will be made possible by leveraging the harmonized data that are made available through DGRPool.

      I was surprised the authors did not crunch the number of transcript/gene expression phenotypes and have them in. Is this because this was better done in other datasets? Or too big and annoying on normalisation? I'd explain the rationale to leave these out.

      This is a very good point raised by the reviewer, and this is in fact something that we initially wanted to do. However, to render the analysis fair and robust, it would require processing all datasets in the same way. This implies cataloging all existing datasets and processing them through the same pipeline. Then, it also requires adding a “cell type” or “tissue” layer, because gene expression data from whole flies is obviously not directly comparable to gene expression data from specific tissues or even specific conditions. This would be key information as phenotypes are often tissue-dependent. So, as implied by the reviewer, we deemed this too big of a challenge beyond the scope of the current paper. Nevertheless, we plan to continue investigating this avenue, especially given the strong transcriptomics background of our lab, in a potential follow-up paper.

      I think 25% FDR is dangerously close to "random chance of being wrong". I'd just redo this section at a higher FDR, even if it makes the results less 'exciting'. This is not the point of the paper anyway.

      We agree with the reviewer that this threshold implies a higher risk of false positive results. However, this is not an uncommonly used threshold (Li et al., PLoS Biology, 2008; Bevers et al., Nature Metabolism, 2019; Hwangbo et al., eLife, 2023), and one that seems robust enough in our analysis since similar phenotypes are significant in different studies. Nevertheless, we will revisit these results and explore how a more stringent threshold may impact the results.
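
      The effect of tightening the threshold can be sketched with a Benjamini-Hochberg adjustment. This is an assumption made for illustration: the text does not state which FDR procedure was used, and the p-values below are toy numbers, not results from the paper.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * m / rank over all ranks at or above the current one.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_discoveries(p_values, fdr):
    """Number of tests whose adjusted p-value is at or below the FDR level."""
    return sum(q <= fdr for q in benjamini_hochberg(p_values))


# Hypothetical association p-values (toy values).
toy_p = [0.01, 0.04, 0.03, 0.20]
for level in (0.25, 0.05):
    print(level, count_discoveries(toy_p, level))
```

      Comparing the counts at the two levels shows which associations survive only because of the more permissive threshold.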

      I didn't buy the extreme line piece as being informative. Something has to be on the top and bottom of the ranks; the phenotypes are an opportunity for collection and probably have known (as you show) and cryptic correlations. I think you don't need this section at all for the paper and worry it gives an idea of "super normals" or "true wild types" which ... I just don't think is helpful.

      This section of the paper was intended to investigate anecdotal evidence suggesting that certain DGRP lines consistently rank at the top or bottom when examining fitness-related traits. If accurate, this observation could imply that inbreeding might have made these lines generally weaker, potentially introducing bias into studies aimed at uncovering the genetic basis of complex traits. However, as per the analyses presented, we did not discover support for this phenomenon. Nevertheless, we consider this message important to convey. In response to the reviewer's feedback, we intend to provide a clearer explanation of the reasoning behind this section of the paper and its main conclusion.

      I'd say "well-established inversion genotypes and symbiont levels" rather than generic covariates. Covariates could mean anything. You have specific "covariates" which might actually be the causal thing.

      Thank you. We will update the manuscript accordingly.

      I wouldn't use the adjective tedious about curation. It's a bit of a value judgement and probably places the role of curation in the wrong way. Time-consuming due to lack of standards and best practice?

      Thank you. We will update the manuscript accordingly.

      Reviewer #2 (Public Review):

      Summary:

      In the present study, Gardeux et al provide a web-based tool for curated association mapping results from DGRP studies. The tool lets users view association results for phenotypes and compare mean phenotype ~ phenotype correlations between studies. In the manuscript, the authors provide several example utilities associated with this new resource, including pan-study summary statistics for sex, traits, and loci. They highlight cross-trait correlations by comparing studies focused on longevity with phenotypes such as oxphos and activity.

      Strengths:

      -Considerable efforts were dedicated toward curating the many DGRP studies provided.

      -Available tools to query large DGRP studies are sparse and so new tools present appeal

      Weaknesses:

      The creation of a tool to query these studies for a more detailed understanding of physiologic outcomes seems underdeveloped. These could be improved by enabling usages such as more comprehensive queries of meta-analyses, molecular information to investigate given genes or pathways, and links to other information such as mouse, rat, or human associations.

      We appreciate the reviewer's kind comments.

      Regarding the tools, we concur with the reviewer that incorporating additional tools could enhance DGRPool and facilitate users in conducting meta-analyses. Therefore, we intend to introduce a gene-centric tool that enables users to query the database based on gene names. Additionally, we will establish links to ortholog databases within the GWAS results, thereby allowing users to extend fly gene associations to other species, if required.

      Furthermore, we have plans to link out to a 'genome browser-like' view (Flybase’s JBrowse tool) of the GWAS results centered around the affected variants/genes. We are considering integrating this feature into the new gene-centric tool as well.

      Another potential downstream analysis we are considering is gene-set enrichment. This analysis would involve assessing the enrichment of genes in Gene Ontology or other pathway databases directly from the GWAS results page.

    1. Author Response

      We would like to thank the reviewers and editors for their thoughtful and constructive review of our manuscript. Below we have provided responses to specific points in the reviewers’ comments and the eLife assessment, highlighting areas of the manuscript that will be edited for clarity and where efforts will be made to provide data to address reviewer concerns upon a future resubmission.

      eLife assessment:

      The authors report that Dbp5 functions in parallel with Los1 in tRNA export, in a manner dependent on Gle1 and requiring the ATPase cycle of Dbp5, but independent of Mex67, Dbp5's partner in mRNA export. The evidence for this conclusion is still incomplete, as is the biochemical evidence that Dbp5 interacts directly with tRNA in vitro with Gle1 and co-factor InsP6 triggering Dbp5 ATPase activity in the Dbp5-tRNA complex. The evidence that Dbp5 interacts with tRNA in cells independently of Los1, Msn5 and Mex67 is, however, solid.

      We intend to edit the text to make clear our conclusions and accommodate clarifications on a few details of this assessment.

      (1) We would clarify that our data support a model in which Dbp5 recruitment to tRNA is independent of Mex67 as an adapter in cells; however, this does not mean that Mex67 and Dbp5 do not still co-function in tRNA export. For example, it is possible that Dbp5 and Mex67 still co-function in the same pathway, but instead of Dbp5 working downstream of Mex67, Dbp5 may in fact work upstream as an adapter for Mex67. Edits to the text will be made to ensure this distinction is clear and to highlight the possibility for future investigation to elucidate this relationship.

      (2) We would like to highlight that, based on structural and biochemical data detailing synergistic activation of the Dbp5 ATPase cycle by Gle1/InsP6 and single-stranded RNA, it is difficult to imagine a scenario where the apparent synergistic activation of the Dbp5 ATPase cycle by tRNA and Gle1/InsP6 (Figure 5) is achieved independent of direct RNA binding. For this reason, we still support the claim that the observed synergistic activation, in combination with other in vivo and in vitro data provided in the manuscript, supports a model where Dbp5 directly binds tRNA. However, we intend to edit the text to highlight this nuance and potential alternative conclusions based on reviewer feedback.

      Reviewer #1 (Public Review):

      “At least one result suggests that the idea of these pathways in parallel may be too simplistic as deletion of the LOS1 gene, which is not essential decreases the interaction of tRNA export substrate with Dbp5 (Figure 2A). If the two pathways were working in parallel, one might have expected removing one pathway to lead to an increase in the use of the other pathway and hence the interaction with a receptor in that pathway…. The obvious missing experiment here with respect to genetics is the test of whether deletion of the MSN5 gene in the cells, which combines deletion of LOS1 and the dbp5_R423A allele, shown in Figure 1D would be lethal…. The authors provide evidence of a model where the helicase Dbp5 plays a role in tRNA export from the nucleus. Further evidence is required to determine whether Dbp5 could function in the same pathway as the previously defined tRNA export receptors, Los1 and Msn5. There are genetic tests that could be performed to explore this question. Some of the biochemistry presented would show when Los1 is absent that the interaction of Dbp5 with tRNA decreases, which could support a model where Dbp5 plays a role in coordination with Los1”

      We agree that this is an important point that should be made clear and discussed in the text. We also agree that further experiments would be needed to confirm that Dbp5 functions broadly in tRNA export in parallel to both Msn5 and Los1. We will aim to address these points in resubmission and discuss possible alternative conclusions of the presented results.

      Reviewer #1 (Public Review):

      “While some of the binding assays show rather modest band shifts (Figure 4B for example), the data in Figure 4A showing that there is no binding detected unless a non-hydrolyzable ATP analogue is employed, argues for specificity in nucleic acid binding. The question that does arise is whether the binding is specific for tRNA.”

      The specificity of the in vitro interactions of Dbp5 is an important point of discussion. We will work to expand the discussion of specificity in the in vitro experiments during resubmission.

      Reviewer #1 (Public Review):

      “With the exception of the binding studies, which also employ a mixture of yeast tRNAs, this study relies primarily on a single tRNA species to come to the conclusions drawn. Many other studies have used multiple tRNAs to explore whether pathways characterized are generalizable to other tRNAs.”

      It was previously shown that Dbp5 functions to support the export of multiple tRNA species (https://doi.org/10.7554/eLife.48410). As such, we agree that additional tRNAs should be tested to explore whether phenotypes reported here are also generalizable to other tRNAs. We will add data targeting additional tRNAs during resubmission.

      Reviewer #2 (Public Review):

      “there are some pieces of data that are misinterpreted. (Figure 1A and B look the same; in Fig 1E, the DAPI staining is abnormal; in Fig 4 the bands can't be seen.)”

      Figure 1A and B represent separate experiments, showing that deletion of Los1 does not alter Dbp5 localization and, conversely, that loss of Dbp5 does not alter Los1 localization. As such, the localization patterns under loss-of-function conditions look the same as the wild-type localization for each protein, as noted. We believe that we have come to the same conclusion as the reviewer on Figure 1A and B (and that these data are not misinterpreted), but we also understand that this panel will need to be adjusted for clarity and readability. We will make efforts to edit this figure and the accompanying text to make the data and conclusions clearer, including addressing the EMSAs in Figure 4 and the associated text for clarity.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We greatly appreciate the positive feedback of the reviewers and have modified the manuscript to address their comments, including changes to the text, figures, and methods. We believe that these revisions have strengthened and improved the manuscript. Reviewers’ comments in blue and detailed responses in black are below.

      Reviewer #1 Weaknesses:

      • Is "function" of the ISNs to balance "nutrient need" or osmolarity? Balancing hemolymph osmolarity for physiological homeostasis is conceptually different from balancing thirst and hunger.

      We have added the following text to the introduction to address this: “Thus, the ISNs sense both AKH and hemolymph osmolality, arguing that they balance internal osmolality fluctuations and nutrient need (Jourjine, Mullaney et al., 2016).” (ln 80-82).

      • The final schematic nicely sums up how the different peptidergic pathways might work together, but it is unclear which connections are empirically-validated or speculative. It would be informative to show which parts of the model are speculative versus validated. For example, does FAFB volume synapse = functional connectivity and not just anatomical proximity? A bulk of the current manuscript relies on "synapses of relatively high confidence" (according to Materials and methods: line 522). I recommend distinguishing empirically tested & predicted connections in the final schematic, and maybe reword/clarify throughout the manuscript as "predicted synaptic partners"

      We modified the schematic to clarify EM based connections versus functionally validated connections. We also clarified the EM predicted synaptic partners, using “predicted synaptic partners” throughout the manuscript.

      Reviewer #2 Areas for further development:

      • Does BIT inhibit all of the IPCs or some of them? I think it is critical to indicate the ROIs used for each neuron in the methods. Which part of the neuron is used for imaging experiments? Dendrites, cell bodies, or synaptic terminals?

      ROIs used for quantification are described in the figure legends: “ArcLight response of BiT soma…” (Fig 2, Fig S2), “Calcium responses of CCHa2R-RA neurites in SEZ…” (Fig 4), “Calcium response of CCHa2R-RA SEZ neurites…” (Fig S4), “Calcium response of CCAP neurites…” (Fig 5, Fig S5), “Calcium response of all IPC somas…” (Fig S3). We have added ROIs used for quantification to the ‘In vivo calcium imaging’ and the ‘In vivo voltage imaging’ methods sections (ln 493-494).

      • The discussion section is not giving big picture explanation of how these neurons work together to regulate sugar and water ingestion. Silencing and activation experiments are good, but without showing the innate activity of these neural groups during ingestion, it is not clear what their functions are in terms of regulating fly behavior.

      We agree that how these peptidergic neurons coordinately regulate feeding is unclear. As peptide signals may act at a distance and may cause long-lasting neural activity state changes, studying their integration over space and time is challenging. Acute imaging during feeding would only in part address this challenge, as cumulative changes in nutrient need signals may impart circuit changes that are not apparent by monitoring the acute activity of peptidergic neurons. We modified a paragraph in the discussion to address this (ln 434-443).

      “Overall, our work sheds light on neural circuit mechanisms that translate internal nutrient abundance cues into the coordinated regulation of sugar and water ingestion. We show that the hunger and thirst signals detected by the ISNs influence a network of peptidergic neurons that act in concert to prioritize ingestion of specific nutrients based on internal needs. We hypothesize that multiple internal state signals are integrated in higher brain regions such that combinations of peptides and their actions signify specific needs to drive ingestion of appropriate nutrients. As peptide signals may act at a distance and may cause long-lasting neural activity state changes, studying their integration over space and time is a future challenge to further illuminate homeostatic feeding regulation.”

      Reviewer #1 (Recommendations For The Authors):

      • For the final schematic figure, it may be informative to include nanchung and AKHR in the schematic.

      We now include this (Fig 6).

      • For the ingestion duration with optogenetic activation, I don't think the right way to represent the data is by normalizing them to the no LED control. I think it should show raw ingestion time. I understand that the normalized data make the figure "cleaner" (no need to show +/- LED separately) but I think visualization of the raw data is important.

      We now include this in a new Supplemental Figure (Fig S6).

      • Methods for ingestion with optogenetic activation should be detailed in the Methods section.

      We expanded upon this in the ‘Temporal consumption assay (TCA)’ methods section. (ln 461-466).

      Reviewer #2 (Recommendations For The Authors):

      1) I think the authors are not following the recommendations of the Flywire community which recommends that people who contributed to the tracing of neurons are offered authorship in the published papers. I see the authors are thanking other lab members who have done tracing for the neurons described in this study, but I would like them to clarify whether they are following the guidelines provided by Flywire.

      We followed the Flywire guidelines and contacted all Flywire users contributing more than 10% to neuron edits for permission to publish with acknowledgements. (see Flywire guidelines https://docs.google.com/document/d/1bUkOB5JnT3u__JDvAoVDHJ3zr5NXQtV_63yx2w6Tcc/edit).

      2) The method section for voltage imaging is missing.

      We now include a section on voltage imaging (ln 496-498).

      3) ROIs for imaging are not indicated in the methods or in the figures. It is hard to judge what is the origin of neural activity plotted in the figures; are they imaging cell bodies, dendrites, or axons?

      ROIs used for quantification are described in the figure legends: “ArcLight response of BiT soma…” (Fig 2, Fig S2), “Calcium responses of CCHa2R-RA neurites in SEZ…” (Fig 4), “Calcium response of CCHa2R-RA SEZ neurites…” (Fig S4), “Calcium response of CCAP neurites…” (Fig 5, Fig S5), “Calcium response of all IPC somas…” (Fig S3). We have added ROIs used for quantification to the ‘In vivo calcium imaging’ and the ‘In vivo voltage imaging’ methods sections (ln 493-494).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Thank you very much for your advice and comments. We have taken your suggestions into consideration and revised the manuscript accordingly. We will add more data and analysis on this topic to the article to make the exposition fuller.

      1) Liver tissue contains different cell types; in which is BATF protein expressed most?

      Based on the analysis of single-cell public data (GEO accession: GSE129516), BATF is expressed in every cell cluster in the liver, with the highest expression in T cells and the least in cholangiocytes (Author response image 1).

      Author response image 1.

      2) The statistical data should be provided to support the liver specific over-expression of BATF.

      The WB results in Figure 2 (C & E) have been quantified and the relevant content has been corrected.

      3) For in vivo study, food intake is key data to exclude the change of energy intake.

      Plots of the food intake results have been added to Figure S2A.

      4) For Fig. 6: since PD1 is also highly expressed in heart and spleen, how can the effect of the PD1 antibody on these tissues be excluded?

      According to the images of the heart (Author response image 2 left) and spleen (Author response image 2 right) taken during mouse dissection, the morphology and size of the two organs were similar in the HFD-CN and HFD-PD1 groups. Moreover, the relevant literature indicates that PD-1 blockade had little impact on the number and function of transferred T cells within the spleen (Peng et al.), and that anti-PD-1 had no effect on mouse splenic cell proliferation (Shindo et al.). Du et al. showed in their study that use of PD-1 antibody alone (10 mg/kg, once every three days, for 4 weeks) did not affect the mouse heart (Du et al.). Both our results and the related literature indicate that the PD-1 antibody should not have adverse effects on the heart and spleen.

      Author response image 2.

      Reviewer #2 (Public Review):

      Thank you very much for your advice and comments. We have seriously considered your suggestions and will focus on them in our future research.

      Weakness

      1) BATF protein is also abundantly expressed in control hepatocyte, but the knockdown of BATF had no effect on lipid accumulation. Besides, the expression of BATF was elevated by high fat diets. So it will be interesting to investigate its role in the liver by using its hepatic conditional knockout mice.

      We appreciate the reviewers' suggestion to investigate other functions of BATF in the liver besides its protective role in a high-fat environment. However, we did not use BATF knockout mice in this study because our data indicated that BATF knockdown had no effect on lipid accumulation. We will pursue further research and validation in future studies.

      2) The data supporting direct regulation of PD1 and IL-27 by BATF are not sufficient; it would be better to carry out a ChIP experiment to further confirm it.

      Thank you for your valuable comments. The article by Kevin Man et al. found that upregulation of the transcription factor BATF regulates PD1 expression and repairs impaired cellular metabolism (Man et al.). In our manuscript, the dual luciferase reporter assay of BATF and PD1 showed that BATF can regulate the expression of PD1 (Fig 5G). Together, these findings confirm that BATF has a regulatory effect on PD1. We do not yet have conclusive evidence for a direct interaction between BATF and IL-27, but some relevant studies support their connection. For instance, BATF and IRF1 were found to be transcription factors induced early by IL-27 treatment, and essential for Tr1 cell differentiation and function, both in vitro and in vivo (Karwacz et al.). Moreover, Zhang et al. identified BATF as one of the transcription factors regulating IL-27 expression by transcription factor prediction and RNA sequencing analysis (Zhang et al.). These results lay the foundation for elucidating the regulation of PD1 and IL-27 by BATF.

      Reviewer #2 (Recommendations For The Authors):

      1. In Figure 3D, which subunit of AMPK was tested, alpha, beta or gamma?

      Thank you for your valuable comments. We detected the expression level of AMPKα1. We have modified the relevant names in the figure and manuscript.

      Reference:

      Du, Shisuo, et al. "Pd-1 Modulates Radiation-Induced Cardiac Toxicity through Cytotoxic T Lymphocytes." 13.4 (2018): 510-20. Print.

      Karwacz, Katarzyna, et al. "Critical Role of Irf1 and Batf in Forming Chromatin Landscape During Type 1 Regulatory Cell Differentiation." 18.4 (2017): 412-21. Print.

      Man, Kevin, et al. "Transcription Factor Irf4 Promotes Cd8+ T Cell Exhaustion and Limits the Development of Memory-Like T Cells During Chronic Infection." 47.6 (2017): 1129-41. e5. Print.

      Peng, Weiyi, et al. "Pd-1 Blockade Enhances T-Cell Migration to Tumors by Elevating Ifn-γ Inducible Chemokines." 72.20 (2012): 5209-18. Print.

      Shindo, Yuichiro, et al. "Interleukin 7 and Anti-Programmed Cell Death 1 Antibody Have Differing Effects to Reverse Sepsis-Induced Immunosuppression." 43.4 (2015): 334. Print.

      Zhang, Huiyuan, et al. "An Il-27-Driven Transcriptional Network Identifies Regulators of Il-10 Expression across T Helper Cell Subsets." 33.8 (2020): 108433. Print.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would first like to thank the reviewers and the editor for their insightful comments and suggestions. We are particularly glad to read that our software package constitutes a set of “well-written analysis routines” which have “the potential to become very valuable and foundational tools for the analysis of neurophysiological data”. We have updated the manuscript to address their remarks where appropriate.

      Additionally, we would like to stress that tools of this kind are in continual development. As such, the manuscript offered a snapshot of the package at one point during this process, which in this case was several months ago at initial submission. Since then, several improvements have been implemented. The manuscript has been further updated to reflect these more recent changes.

      From the Reviewing Editor:

      The reviewers identified a number of fundamental weaknesses in the paper.

      1) For a paper demonstrating a toolbox, it seems that some example analyses showing the value of the approach (and potentially the advantage in simplification, etc over previous or other approaches) are really important to demonstrate.

      As noted by the first reviewer, the online repository (i.e. GitHub page) conveys a better sense of the toolbox’s contribution to the field than the present manuscript. This is a fair remark but, at the same time, it is unclear how to illustrate this in a journal article without dedicating a great deal of page space to presenting raw code, while online tools offer an easier and clearer way to do this. As a work-around, our strategy was to illustrate some examples of data analysis in Figures 4&5 by comparing each illustrated processing step to the corresponding command line used by the Pynapple package. Each step requires a single line of code, meaning that one only needs to write three lines of code to decode a feature from population activity using a Bayesian decoder (Fig. 4a), compute a cross-correlogram of two neurons during specific stimulus presentation (Fig. 4b) or compute the average firing rate of two neurons around a specific time of the experimental task (Fig. 4c). We believe that these visual aids make it unnecessary to add code in the main text of this manuscript. However, to aid reader understanding, we now provide clear references, in the figure legends as well as in the “Code Availability” section, to online Jupyter notebooks which show how each figure was generated.

      https://github.com/pynapple-org/pynapple-paper-2023

      Furthermore, we have opted-in for the “Executable Research Articles” feature at eLife, which will make it possible to include live scripts and figures in the manuscript once it is accepted for publication. We do not know at this stage what it entails exactly, but we hope that Figures 4&5 will become live with this feature. The readers will have the possibility to see and edit the code directly within the online version of the manuscript.
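      To make one of the analysis steps named above concrete (the average firing rate of a neuron around specific event times), the following is a minimal plain-Python sketch of the underlying computation. It is an illustration only: the function name `perievent_rate` and its parameters are hypothetical, and it does not reproduce Pynapple's API or implementation.

```python
from bisect import bisect_left


def perievent_rate(spike_times, event_times, window=(-0.5, 0.5), bin_size=0.1):
    """Average firing rate (Hz) of one neuron in time bins around events.

    Hypothetical helper for illustration; times are in seconds.
    Each bin is half-open: [left edge, right edge).
    """
    spikes = sorted(spike_times)          # bisect requires sorted input
    start, stop = window
    n_bins = round((stop - start) / bin_size)
    counts = [0] * n_bins

    for event in event_times:
        # Compute the bin edges once per event so adjacent bins share an edge.
        edges = [event + start + i * bin_size for i in range(n_bins + 1)]
        for i in range(n_bins):
            # Number of spikes with edges[i] <= t < edges[i + 1]
            counts[i] += bisect_left(spikes, edges[i + 1]) - bisect_left(spikes, edges[i])

    # Convert summed counts to a rate: spikes per event per second.
    return [c / (len(event_times) * bin_size) for c in counts]


if __name__ == "__main__":
    spikes = [0.95, 1.05, 1.15, 2.05]
    print(perievent_rate(spikes, [1.0, 2.0], window=(-0.2, 0.2), bin_size=0.1))
```

      The sketch keeps spikes and events as plain sorted lists of timestamps; a dedicated time series object would additionally carry the epochs over which the data are defined.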

      2) The manuscript's claims about not having dependencies seem confusing.

      We agree that this claim was somewhat unfounded. There are virtually no Python packages that do not have dependencies. Our intention was to say that the package had no dependencies outside the most common ones, which are Numpy, Scipy, and Pandas. Too many packages in the field tend to have long lists of dependencies, making long-term back-compatibility quite challenging. By keeping dependencies minimal, we hope to maximise the package’s long-term back-compatibility. We have rephrased this statement in the manuscript in the following sections:

      Figure 1, legend.

      “These methods depend only on a few, commonly used, external packages.”

      Section Foundational data processing: “they are for the most part built-in and only depend on a few widely-used external packages. This ensures that the package can be used in a near stand-alone fashion, without relying on packages that are at risk of not being maintained or of not being compatible in the near future.”

      3) Given its significant relevance, it seems important to cite the FMATool and describe connections between it (or analyses based on it) and the presented work.

      Indeed, although we had already cited other toolboxes (including a review covering the topic comprehensively), we should have included this one in the original manuscript. Unfortunately, to the best of our knowledge, this toolbox is not citable (there is no companion paper). We have added a reference to it in plain text.

      4) Some discussion of integration between Pynapple and the rest of a full experimental data pipeline should be discussed with regard to reproducibility.

      This is an interesting point, and the third paragraph of the discussion somewhat broached this issue. Pynapple was not originally designed to pre-process data. However, it can, in theory, load any type of data stream after the necessary pre-processing steps. Overall, modularity is a key aspect of the Pynapple framework, and this is also the case for the integration with data pre-processing pipelines, for example spike sorting in electrophysiology and detection of regions of interest in calcium imaging. We do not think there should be a single integrated solution to the problem; instead, it should be possible to use any piece of code on data irrespective of their origin. This is why we focused on making data loading straightforward and easy to adapt to any particular situation. To expand on this point and make it clear that Pynapple is not meant to pre-process data but can, in theory, load any type of data stream after the necessary pre-processing steps, we have added the following sentences to the aforementioned paragraph:

      “Data in neuroscience vary widely in their structure, size, and need for pre-processing. Pynapple is built around the idea that raw data have already been pre-processed (for example, spike sorting and detection of ROIs).”

      5) Relatedly, a description of how data are stored after processing (i.e., how precisely are processed data stored in NWB format).

      We agree that this is a critical issue. NWB is not necessarily the best option, as it is not possible to overwrite an NWB file. This would require the creation of a new NWB file each time, which is computationally expensive and time consuming. It also further increases the odds of write errors. Theoretically, users who need to store intermediate results in a flexible way could use any method they prefer, writing their own data files and wrappers to reload these data into Pynapple objects. Indeed, it is not easy to properly store data in an object-specific manner. This is a long-standing issue and one we are currently working to resolve.

      To do so, we are developing I/O methods for each of Pynapple’s core objects. We aim to provide an output format that is simple to read and backward compatible in future Pynapple releases. This feature will be available in the coming weeks. Of note, while NWB may not be the central data format of Pynapple in future releases, it has become a central node in the neuroscience software ecosystem. Therefore, we aim to make it easier for users to read and write this format by developing a set of simple standalone functions.
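      As a rough illustration of the "own data files and wrappers" route mentioned above, the sketch below writes a time series together with the epochs over which it is defined to a small versioned JSON file and reads it back. Everything here is an assumption made for illustration: the function names, the JSON layout, and the `format_version` field are hypothetical and do not describe Pynapple's actual or planned output format.

```python
import json
import os
import tempfile

FORMAT_VERSION = 1  # hypothetical version tag, bumped if the layout changes


def save_timeseries(path, times, values, support):
    """Write timestamps, values, and (start, end) epochs to a JSON file."""
    payload = {
        "format_version": FORMAT_VERSION,
        "t": list(times),
        "d": list(values),
        "support": [list(epoch) for epoch in support],
    }
    with open(path, "w") as fh:
        json.dump(payload, fh)


def load_timeseries(path):
    """Read a file written by save_timeseries; reject unknown versions."""
    with open(path) as fh:
        payload = json.load(fh)
    if payload.get("format_version") != FORMAT_VERSION:
        raise ValueError("unsupported file version")
    support = [tuple(epoch) for epoch in payload["support"]]
    return payload["t"], payload["d"], support


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "running_speed.json")
        save_timeseries(path, [0.0, 0.5, 1.0], [1.2, 3.4, 5.6], [(0.0, 1.0)])
        print(load_timeseries(path))
```

      Storing an explicit version tag alongside the data is one simple way to keep older files readable when the layout evolves; a wrapper would then convert the loaded lists into the corresponding analysis objects.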

      Reviewer #1 (Public Review):

      A typical path from preprocessed data to findings in systems neuroscience often includes a set of analyses that often share common components. For example, an investigator might want to generate plots that relate one time series (e.g., a set of spike times) to another (measurements of a behavioral parameter such as pupil diameter or running speed). In most cases, each individual scientist writes their own code to carry out these analyses, and thus the same basic analysis is coded repeatedly. This is problematic for several reasons, including the waste of time, the potential for errors, and the greater difficulty inherent in sharing highly customized code.

      This paper presents Pynapple, a python package that aims to address those problems.

      Strengths:

      The authors have identified a key need in the community - well-written analysis routines that carry out a core set of functions and can import data from multiple formats. In addition, they recognized that there are some common elements of many analyses, particularly those involving timeseries, and their object- oriented architecture takes advantage of those commonalities to simplify the overall analysis process.

      The package is separated into a core set of applications and another with more advanced applications, with the goal of both providing a streamlined base for analyses and allowing for implementations/inclusion of more experimental approaches.

      Weaknesses:

      There are two main weaknesses of the paper in its present form.

      First, the claims relating to the value of the library in everyday use are not demonstrated clearly. There are no comparisons of, for example, the number of lines of code required to carry out a specific analysis with and without Pynapple or Pynacollada. Similarly, the paper does not give the reader a good sense of how analyses are carried out and how the object-oriented architecture provides a simplified user interaction experience. This contrasts with their GitHub page and associated notebooks which do a better job of showing the package in action.

      As noted in the response to the Reviewing Editor and in the response to the reviewer’s recommendations to the authors below, we have now included links to Jupyter notebooks that highlight how the panels of Figures 4 and 5 were generated (https://github.com/pynapple-org/pynapple-paper-2023). However, we believe that including more code in the manuscript than what is currently shown (i.e. abbreviated calls to methods on top of panels in Figs 4&5) would decrease the readability of the manuscript.

      Second, the paper makes several claims about the values of object-oriented programming and the overall design strategy that are not entirely accurate. For example, object-oriented programming does not inherently reduce coding errors, although it can be part of good software engineering. Similarly, there is a claim that the design strategy "ensures stability" when it would be much more accurate to say that these strategies make it easier to maintain the stability of the code. And the authors state that the package has no dependencies, which is not true in the codebase. These and other claims are made without a clear definition of the properties that good scientific analysis software should have (e.g., stability, extensibility, testing infrastructure, etc.).

      Following the reviewer’s comment, we have rephrased and clarified these claims. We provide a detailed response to these remarks in the recommendations to authors below.

      There is also a minor issue - these packages address an important need for high-level analysis tools but do not provide associated tools for preprocessing (e.g., spike sorting) or for creating reproducible pipelines for these analyses. This is entirely reasonable, in that no one package can be expected to do everything, but a bit deeper account of the process that takes raw data and produces scientific results would be helpful. In addition, some discussion of how this package could be combined with other tools (e.g., DataJoint, Code Ocean) would help provide context for where Pynapple and Pynacollada could fit into a robust and reliable data analysis ecosystem.

      We agree that better explaining how Pynapple is integrated within data preprocessing pipelines is essential. We have clarified this aspect in the manuscript and provide more details below.

      Reviewer #1 (Recommendations For The Authors):

      Page 1

      • Title

      The authors should note that the application name- "Pynapple" could be confused with something from Apple. Users may search for "Pyapple" as many python applications contain "py" like "Numpy". "Pyapple" indeed is a Python Apple that works with Apple products. They could consider "NeuroFrame", "NeuroSeries" or "NeuroPandas" to help users realize this is not an apple product.

      We thank the referee for this interesting comment. However, we are not willing to make such a change at this point. The community of users has been growing over the last year and it seems too late to change the name. Of note, this is the first time such a comment has been made to us, and users and collaborators do not seem to confuse the package with any Apple product.

      • Abstract

      The authors mentioned that the Pynapple is "fully open source". It may be better to simply say it is "open source".

      We agree, corrected.

      Assuming the authors keep the name, it would be helpful if the full meaning of Pynapple - Python Neural Analysis Package was presented as early as possible.

      Corrected in the abstract.

      • Highlight

      An application being lightweight and standalone does not imply nor ensure backward compatibility. In general, it would be useful if the authors identified a set of desirable code characteristics, defined them clearly in the introduction, and then describe their software in terms of those characteristics.

      Thank you for your comment. We agree that being lightweight and standalone does not necessarily imply backward compatibility. Our intention was to emphasize that Pynapple is designed to be as simple and flexible as possible, with a focus on providing a consistent interface for users across different versions. However, we understand that this may not be enough to ensure long-term stability, which is why we are committed to regular updates and maintenance to ensure that the code remains functional as the underlying code base (Python versions, etc.) changes.

      Regarding your suggestion to identify a set of desirable code characteristics, we believe this is an excellent idea. In the introduction, we briefly touch upon some of the core principles that guided our development of Pynapple: a lightweight, stable, and simple package. However, we acknowledge that providing a more detailed discussion of these characteristics and how they relate to the design of our software would be useful for readers. We have added this paragraph in the discussion:

      “Pynapple was developed to be lightweight, stable, and simple. As simplicity does not necessarily imply backward compatibility (i.e. long-term stability of the code), Pynapple main objects and their properties will remain the same for the foreseeable future, even if the code in the backend may eventually change (e.g. not relying on Pandas in future version). The small number of external dependencies also decrease the need to adapt the code to new versions of external packages. This approach favors long-term backward compatibility.”

      Page 2

      • The authors wrote -

      "Despite this rapid progress, data analysis often relies on custom-made, lab-specific code, which is susceptible to error and can be difficult to compare across research groups."

      It would be helpful to add that custom-made, lab-specific code can lead to a violation of FAIR principles (https://en.wikipedia.org/wiki/FAIR_datadata). More generally, any package can have errors, so it would be helpful to explain any testing regimens or other approach the authors have taken to ensure that their code is error-free.

      We understand the importance of the FAIR principles for data sharing. However, Pynapple was not designed to handle data through their pre-processing. The only aspect that is somehow covered by the FAIR principles is the interoperability, but again, it is a requirement for the data to interoperate with different storage and analysis pipelines, not of the analysis framework itself. Unlike custom-made code, Pynapple will make interoperability easier, as, in theory, once the required data loaders are available, any analysis could be run on any dataset. We have added the following sentence to the discussion:

      “Data in neuroscience vary widely in their structure, size, and need for pre-processing. Pynapple is built around the idea that raw data has already been pre-processed (for example, spike sorting and ROI detection). According to the FAIR principles, pre-processed data should interoperate across different analysis pipelines. Pynapple makes this interoperability possible as, once the data are loaded in the Pynapple framework, the same code can be used to analyze different datasets”

      • The authors wrote -

      "While several toolboxes are available to perform neuronal data analysis ti–11,2ti (see ref. 29 for review), most of these programs focus on producing high-level analysis from specified types of data and do not offer the versatility required for rapidly-changing analytical methods and experimental methods."

      Here it would be helpful if the authors could give a more specific example or explain why this is problematic enough to be a concern. Users may not see a problem with high-level analysis or using specific data types.

      Again, we apologize for not fully elaborating upon our goals here. Our intention was to point out that toolboxes often focus on one particular case of high-level analysis. In many cases, such packages lack low-level analysis features or the flexibility to derive new analysis pipelines quickly and effortlessly. Users can decide to use low-level packages such as Pandas, but in that case, the learning curve can be steep for users with low, if any, computational background. The simplicity of Pynapple, and the set of examples and notebooks, make it possible for individuals who start coding to be quickly able to analyze their data.

      As we do not want to be too specific at this point of the manuscript (second paragraph of the intro) and as we have clarified many of the aspects of the toolbox in the new revised version, we have only added the following sentence to the paragraph:

      “Users can decide to use low-level data manipulation packages such as Pandas, but in that case, the learning curve can be steep for users with low, if any, computational background.”

      • The authors wrote -

      "To meet these needs, a general toolbox for data analysis must be designed with a few principles in mind"

      Toolboxes based on many different principles can solve problems. It is likely more accurate to say that the authors designed their toolbox with a particular set of principles in mind. A clear description of those principles (as mentioned in the comment above) would help the reader understand why the specific choices made are beneficial.

      We agree that these are not “universal” principles but rather the principles we had in mind when we designed the package. We have clarified these principles and made clear that they reflect our personal points of view.

      We have rephrased the following paragraph:

      “To meet these needs, we designed Pynapple, a general toolbox for data analysis in systems Neuroscience with a few principles in mind.“

      • The authors wrote -

      "The first property of such a toolbox is that it should be object-oriented, organizing software around data."

      What facts make this true? For example, React is a web development library. A common approach to using this library is to use Hooks (essentially a collection of functions). This is becoming more popular than the previous approach of using Components (a collection of classes). This is an example of how Object-oriented programming is not always the best solution. In some cases, for example, object-oriented coding can cause problems (e.g. it can be hard to find the place where a given function is defined and to figure out which version is being used given complex inheritance structures.)

      In general, key selling points of object-oriented programming are extension, inheritance, and encapsulation. If the authors want to retain this text (which would be entirely reasonable), it would be helpful if they explained clearly how an object-oriented approach enables these functions and why they are critical for this application in particular.

      The referee makes a particularly important point. We are aware of the limits of OOP, especially when objects become over-complex and inheritance becomes unclear.

      We have clarified our goal here. We believe that in our case, OOP is powerful and, overall, is less error-prone than a collection of functions. The reasons are the following:

      An object-oriented approach facilitates better interactions between objects. By encapsulating data and behavior within objects, object-oriented programming promotes clear and well-defined interfaces between objects. This results in more structured and manageable code, as objects communicate with each other through these well-defined interfaces. Such improved interactions lead to increased code reliability.

      Inheritance, a key concept in object-oriented programming, allows for the inheritance of properties. One important example of how inheritance is crucial in the Pynapple framework is the time support of Pynapple objects. It determines the valid epoch on which the object is defined. This property needs to be carried over during different manipulations of the object. Without OOP, this property could easily be forgotten, resulting in erroneous conclusions for many types of analysis. The simplest case is the average rate of a TS object: the rate must be computed on the time support (a property of TS objects), not from the beginning to the end of the recording (or of a specific epoch, independent of the TS). Finally, it is easier to access and manipulate the meta information of a Pynapple object than it would be without using objects.
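
      The time-support argument can be illustrated with a minimal sketch. The class `ToyTs`, its `restrict` method, and its `rate` property are hypothetical names written for this illustration and are not Pynapple's actual API; the sketch also assumes that the requested epochs lie within the original support.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyTs:
    """Hypothetical stand-in for a timestamp object (not Pynapple's class)."""
    times: tuple    # event times, in seconds
    support: tuple  # (start, end) epochs on which the object is defined

    def restrict(self, epochs):
        """Keep events inside `epochs`; the result carries `epochs` as its support."""
        kept = tuple(
            t for t in self.times
            if any(start <= t <= end for start, end in epochs)
        )
        return ToyTs(kept, tuple(epochs))

    @property
    def rate(self):
        """Mean event rate over the time support, not over the whole recording."""
        duration = sum(end - start for start, end in self.support)
        return len(self.times) / duration


# Six events in a 100 s recording.
spikes = ToyTs(times=(1.0, 2.0, 3.0, 11.0, 12.0, 50.0), support=((0.0, 100.0),))

# Restrict to two 4 s epochs: the support travels with the new object.
awake = spikes.restrict([(0.0, 4.0), (10.0, 14.0)])

# Without the support, it is easy to divide by the full recording length.
naive_rate = len(awake.times) / 100.0
```

      Here `awake.rate` divides the five retained events by the 8 s of support, whereas `naive_rate` divides them by the 100 s recording and underestimates the rate by more than a factor of ten.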

      • The authors wrote -

      "drastically diminishing the odds of a coding error"

      This seems a bit strong here. Perhaps "reducing the odds" would be more accurate.

      We agree. Now changed.

      Page 3

      • The authors wrote -

      "Another property of an efficient toolbox is that as much data as possible should be captured by only a small number of objects. This ensures that the same code can be used for various datasets and eliminates the need of adapting the structure"

      It may be better to write something like - "Objects have a collection of preset variables/values that are well suited for general use and are very flexible." Capturing "as much data as possible" may be confusing, because it's not the amount that this helps with but rather the variety.

      We thank the referee for this remark. We have rephrased this sentence as follows:

      “Another property of an efficient toolbox is that a small number of objects could virtually represents all possible data streams in neuroscience, instead of objects made for specific physiological processes (e.g. spike trains).”

      • The authors wrote -

      "The properties listed above ensure the long-term stability of a toolbox, a crucial aspect for maintaining the code repository. Toolboxes built around these principles will be maximally flexible and will have the most general application"

      There are two issues with this statement. First, ensuring long-term stability is only possible with a long-term commitment of time and resources to ensure that that code remains functional as the underlying code base (python versions, etc.) changes. If that is something you are committing to, it would be great to make that clear. If not, these statements need to be less firm.

      Second, it is not clear how these properties were arrived at in the first place. There are things like the FAIR Principles which could provide an organizing framework, ideally when combined with good software engineering practices, and if some more systematic discussion of these properties and their justification could be added, it would help the field think about this issue more clearly.

      The referee makes a valid point that ensuring long-term stability requires a long-term commitment of time and resources to maintain the code as the underlying technology evolves. While we cannot make guarantees about the future of Pynapple, we believe that one of the best ways to ensure long-term stability is by fostering a strong community of users and contributors who can provide ongoing support and development. By promoting open-source collaboration and encouraging community involvement, we hope to create a sustainable ecosystem around Pynapple that can adapt to changes in technology and scientific practices over time. Ultimately, the longevity of any scientific tool depends on its adoption and use by the research community, and we hope that Pynapple can provide value to neuroscience researchers and continue to evolve and improve as the field progresses.

      It is noteworthy that the first author, and main developer of the package, has now been hired as a data scientist at the Center for Computational Neuroscience, Flatiron Institute, to explicitly continue the development of the tool and build a community of users and contributors.

      • The authors wrote -

      "each with a limited number of methods..."

      This may give the impression that the functionality is limited, so rephrasing may be helpful.

      Indeed! We have now rephrased this sentence:

      “The core of Pynapple is five versatile timeseries objects, whose methods make it possible to intuitively manipulate and analyze the data.”

      • The authors wrote that object-oriented coding

      "limits the chances of coding error"

      This is not always the case, but if it is the case here, it would be helpful if the authors explain exactly how it helps to use object-oriented approaches for this package.

      We agree with the referee that it is not always the case. As we explained above, we believe it is less error-prone than a collection of functions. Quite often, it also makes it easier to debug. We have replaced this sentence with the following one:

      “Because objects are designed to be self-contained and interact with each other through well-defined methods, users are less likely to make errors when using them. This is because objects can enforce their own internal consistency, reducing the chances of data inconsistencies or unexpected behavior. Overall, OOP is a powerful tool for managing complexity and reducing errors in scientific programming.”

      • Fig 1

      In object-oriented programming, a class is a blueprint for the classes that inherit it. Instantiating that class creates an object. An object contains any or all of these - data, methods, and events. The figure could be improved if it maintained these organizational principles as figure properties.

      We agree with the referee’s remark regarding the logic of objects instantiation but how this could be incorporated in Fig. 1 without making it too complex is unclear. Here, objects are instantiated from the first to the second column. We have not provided details about the parent objects, as we believe these details are not important for reader comprehension. In its present form, the objects are inherited from Pandas objects, but it is possible that a future version is based on something else. For the users, this will be transparent as the toolbox is designed in such a way that only the methods that are specific to Pynapple are needed to do most computation, while only expert programmers may be interested in using Pandas functionalities.

      • The authors wrote that Pynapple does -

      "not depend on any external package"

      As mentioned above, this is not true. It depends on Numpy and likely other packages, and this should be explained. It is perfectly reasonable to say that it depends on only a few other packages.

      As said above, we have now clarified this claim.

      Page 5.

      • The authors wrote -

      "represent arrays of Ts and Tsd"

      For a knowledgeable reader's reference, it would be helpful to refer to these either as Numpy arrays (at least at first when they are defined) or as lists if they are native python objects.

      Indeed, using the word “arrays” here could be confusing because of Numpy arrays. We have changed this term with “groups”.

      • The authors wrote -

      "Pynapple is built with objects from the Pandas library ... Pynapple objects inherit the computational stability and flexibility"

      Here a definition of stability would be useful. Is it the case that by stability you mean "does not change often"? Or is some other meaning of stability implied?

      Yes, this is exactly what we meant when referring to the stability of Pandas. We have added the following precision:

      “As such, Pynapple objects inherit the long-term consistency of the code and the computational flexibility from this widely used package.”

      Page 6

      • Fig 2

      In Fig 2 A and B, the illustrations are good. It would also be very helpful to use toy code examples to illustrate how Pynapple will be used to carry out on a sample analysis-problem so that potential users can see what would need to be done.

      We appreciate the kind words. Regarding the toy code, this is what we tried to do in Fig. 4. Instead of including the code directly in the paper, which does not seem to be a modern way of presenting it, we now refer to the online notebooks that reproduce all panels of Figure 4.

      • The authors wrote -

      "While these objects and methods are relatively few"

      In object-oriented programming, objects contain methods. If a method is not in an object, it is not technically a method but a function. It would be helpful if the authors made sure their terminology is accurate, perhaps by saying something like "While there are relatively few objects, and while each object has relatively few methods ... "

      We agree with the referee, we have changed the sentence accordingly.

      • The authors wrote -

      "if not implemented correctly, they can be both computationally intensive and highly susceptible to user error"

      Here the authors are using "correctly" to refer to two things - "accuracy" - getting the right answer, and "efficiency" - getting to that answer with relatively less computation. It would be clearer if they split out those two concepts in the phrasing.

      Indeed, we used the term to cover both aspects of the problem, leading to the two possible issues cited in the second part of the sentence. We have changed the sentence following the referee’s advice:

      “While there are relatively few objects, and while each object has relatively few methods, they are the foundation of almost any analysis in systems neuroscience. However, if not implemented efficiently, they can be computationally intensive and if not implemented accurately, they are highly susceptible to user error.”

      • In the next sentence the authors wrote -

      "Pynapple addresses this concern."

      This statement would benefit from just additional text explaining how the concern is addressed.

      We thank the referee for the suggestion. We have changed the sentence to this one: “The implementation of core features in Pynapple addresses the concerns of efficiency and accuracy”
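
      The distinction between accuracy and efficiency can be sketched with one core operation, restricting timestamps to a set of epochs. Both functions below are hypothetical illustrations and do not reproduce Pynapple's implementation; the second one assumes sorted timestamps and sorted, non-overlapping epochs.

```python
from bisect import bisect_left, bisect_right


def restrict_naive(times, epochs):
    """Accurate but slow: every timestamp is compared with every epoch."""
    return [t for t in times if any(start <= t <= end for start, end in epochs)]


def restrict_bisect(times, epochs):
    """Same result via binary search.

    Assumes `times` is sorted and `epochs` are sorted and non-overlapping.
    """
    kept = []
    for start, end in epochs:
        lo = bisect_left(times, start)   # first index with times[lo] >= start
        hi = bisect_right(times, end)    # first index with times[hi] > end
        kept.extend(times[lo:hi])
    return kept


times = [0.5, 1.0, 2.5, 4.0, 7.5, 9.0]
epochs = [(1.0, 3.0), (7.0, 9.0)]

slow = restrict_naive(times, epochs)
fast = restrict_bisect(times, epochs)
```

      Both versions keep timestamps that fall exactly on an epoch boundary. Writing this boundary convention once, in a single tested function, is what limits the user errors that arise when each analysis script reimplements it.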

      Page 9

      • The authors wrote -

      "This is implemented via a set of specialized object subclasses of the BaseLoader class. To avoid code redundancy, these I/O classes inherit the properties of the BaseLoader class. "

      From a programming perspective, the point of a base class is to avoid redundancy, so it might be better to just mention that this avoids the need to redefine I/O operations in each class.

      We have rephrased the sentence as follows:

      “This is implemented via a set of specialized object subclasses of the BaseLoader class, avoiding the need to redefine I/O operations in each subclass"

      • The authors wrote -

      "classes are unique and independent from each other, ensuring stability"

      How do classes being unique and independent ensure stability? Perhaps here again the misunderstanding is due to the lack of a definition of stability.

      We thank the referee for the remark. We first changed “stability” for “long-term backward compatibility”. We further added the following sentence to clarify this claim. “For instance, if the spike sorting tool Phy changes its output in the future, this would not affect the “Neurosuite” IO class as they are independent of each other. This allows each tool to be updated or modified independently, without requiring changes to the other tool or the overall data format.”
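
      The idea of independent I/O classes sharing one base class can be sketched as follows. The names `ToyBaseLoader`, `ToyCsvLoader`, and `ToyJsonLoader`, as well as the two input formats, are hypothetical and chosen only for illustration; they do not correspond to Pynapple's loaders.

```python
import json
from abc import ABC, abstractmethod


class ToyBaseLoader(ABC):
    """Hypothetical base class: shared logic is written once, here."""

    def load(self, text):
        """Parse format-specific text, then return sorted spike times in seconds."""
        return sorted(float(t) for t in self.parse(text))

    @abstractmethod
    def parse(self, text):
        """Format-specific step that each subclass must provide."""


class ToyCsvLoader(ToyBaseLoader):
    """Reads comma-separated spike times."""

    def parse(self, text):
        return [field for field in text.split(",") if field.strip()]


class ToyJsonLoader(ToyBaseLoader):
    """Reads spike times stored under a 'spike_times' key (an assumed layout)."""

    def parse(self, text):
        return json.loads(text)["spike_times"]


csv_times = ToyCsvLoader().load("0.5, 0.1, 0.3")
json_times = ToyJsonLoader().load('{"spike_times": [0.3, 0.1, 0.5]}')
```

      If the JSON layout changed in the future, only `ToyJsonLoader.parse` would need to be updated; the CSV loader and the shared `load` logic would remain untouched.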

      • The authors wrote -

      "Using preexisting code to load data in a specific manner instead of rewriting already existing functions avoids preprocessing errors"

      Here it might be helpful to use the lingo of Object-oriented programming. (e.g. inheritance and polymorphism). Defining these terms for a neuroscience audience would be useful as well.

      We do not think it is necessary to use too many technical terms in this manuscript. However, this sentence was indeed confusing. We have now simplified it:

      “[…], users can develop their own custom I/O using available template classes. Pynapple already includes several of such templates and we expect this collection to grow in the future.”

      Page 10

      • The authors wrote -

      "These analyses are powerful because they are able to describe the relationships between time series objects while requiring the fewest number of parameters to be set by the user."

      It is not clear that this makes for a powerful analysis as opposed to an easy-to-use analysis.

      We have changed “powerful” with “easy to use".

      Page 12

      "they are built-in and thus do not have any external dependencies"

      If the authors want to retain this, it would be helpful to explain (perhaps in the introduction) why having fewer external dependencies is useful. And is it true that these functions use only base python classes?

      We have rephrased this sentence as follows:

      “they are for the most part built-in and only depend on a few common external packages, ensuring that they can be used stand-alone without relying on packages that are at risk of not being maintained or of not being compatible in the near future.”

      Other comments:

      • It would be helpful, as mentioned in the public review, to frame this work in the broader context of what is needed to go from data to scientific results so that people understand what this package does and does not provide.

      We have added the following sentence to the discussion to make sure readers understand:

      “The path from data collection to reliable results involves a number of critical steps: exploratory data analysis, development of an analysis pipeline that can involve custom-made developed processing steps, and ideally the use of that pipeline and others to replicate the results. Pynapple provides a platform for these steps.”

      • It would also be helpful to describe the Pynapple software ecosystem as something that readers could contribute to. Note here that GNU may not be a good license. Technically, GNU requires any changes users make to Pynapple for their internal needs to be offered back to the Pynapple team. Some labs may find that burdensome or unacceptable. A workaround would be to have GNU and MIT licenses.

      The main restriction of the GPL license is that if the code is changed by others and released, a similar license should be used, so that it cannot become proprietary. We therefore stick to this choice of license.

      We would be more than happy to receive contributions from the community. To note, several users outside the lab have already contributed. We have added the following sentence in the introduction:

      “As all users are also invited to contribute to the Pynapple ecosystem, this framework also provides a foundation upon which novel analyses can be shared and collectively built by the neuroscience community.”

      • This software shares some similarities with the nelpy package, and some mention of that package would be appropriate.

      While we acknowledge the reviewer's observation that Nelpy is a similar package to Pynapple, there are several important differences between the two.

      First, Nelpy includes predefined objects such as SpikeTrain, BinnedSpikeTrain, and AnalogSignal, whereas Pynapple would use only Ts and Tsd for those. This design choice was made to provide greater flexibility and allow users to define their own data structures as needed.

      Second, Nelpy is primarily focused on electrophysiology data, whereas Pynapple is designed to handle a wider range of data types, including calcium imaging and behavioral data. This reflects our belief that the NWB format should be able to accommodate diverse experimental paradigms and modalities.

      Finally, while Nelpy offers visualization and high-level analysis tools tailored to electrophysiology, Pynapple takes a more general-purpose approach. We believe that users should be free to choose their own visualization and analysis tools based on their specific needs and preferences.

      The package has now been cited.

      Reviewer #2 (Public Review):

      Pynapple and Pynacollada have the potential to become very valuable and foundational tools for the analysis of neurophysiological data. NWB still has a steep learning curve and Pynapple offers a user-friendly toolset that can also serve as a wrapper for NWB.

      The scope of the manuscript is not clear to me, and the authors could help clarify if Pynacollada and other toolsets in the making become a future aspect of this paper (and Pynapple), or are the authors planning on building these as separate publications.

      The author writes that Pynapple can be used without the I/O layer, but the author should clarify how or if Pynapple may work outside NWB.

      Absolutely. Pynapple can be used for generic data analysis, with no requirement of specific inputs nor NWB data. For example, the lab is currently using it for a computational project in which the data are loaded from simple files (and not from full I/O functions as provided in the toolbox) for further analysis and figure generation.

      This was already noted in the manuscript, last paragraph of the section “Importing data from common and custom pipelines”

      “Third, users can still use Pynapple without using the I/O layer of Pynapple.”.

      We have added the following sentence in the discussion

      “To note, Pynapple can be used without the I/O layer and independent of NWB for generic, on-the-fly analysis of data.”

      This brings us to an important fundamental question. What are the advantages of the current approach, where data is imported into the Ts objects, compared to doing the data import into NWB files directly, and then making Pynapple secondary objects loaded from the NWB file? Does NWB natively have the ability to store the 5 object types or are they initialized on every load call?

      NWB and Pynapple are complementary but not interdependent. NWB is meant to ensure long-term storage of data and as such contains as much information as possible to describe the experiment. Pynapple does not use NWB to directly store the objects; however, it can read from NWB to organize the data in Pynapple objects. Since the original version of this manuscript was submitted, new methods have been added to address this. Specifically, in the current beta version, each object now has a “save” method, and we are developing functions to load these objects as well. This does not depend on NWB but on npz, a Numpy-specific file format. However, we believe it is premature to include these recent developments in the manuscript and prefer not to discuss them for now.

      Many of these functions and objects have a long history in MATLAB - which documents their usefulness, and I believe it would be fitting to put further stress on this aspect - what aspects already existed in MATLAB and what is completely novel. A widely used MATLAB toolset, the FMA toolbox (the Freely moving animal toolbox) has not been cited, which I believe is a mistake.

      We agree that the FMA toolbox should have been cited. This has now been corrected.

      Pynapple was first developed in Matlab (it was then called TSToolbox). The first advantage is of course that Python is more accessible than Matlab. It has also been adopted by a large community of developers in data analysis and signal processing, which has become without a doubt much larger than the Matlab community, making it possible to find solutions online for virtually any problem one can have. Furthermore, in our experience, trainees are now unwilling to get training in Matlab.

      Yet, Python has drawbacks, which we are fully aware of. Matlab can be very computationally efficient, and old code can usually run without any change, even many years later.

      A limitation in using NWB files is its standardization with limited built-in options for derived data and additional metadata. How are derived data stored in the NWB files?

      NWB has predetermined a certain number of data containers, which are most common in systems neuroscience. It is theoretically possible to store any kind of data and associated metadata in NWB but this is difficult for a non-expert user. In addition, NWB does not allow data replacement, making it necessary to rewrite a whole new NWB file each time derived data are changed and stored. Therefore, we are currently addressing this issue as described above. Derived data and metadata will soon be easy to store and read.

      How is Pynapple handling an existing NWB dataset, where spikes, behavioral traces, and other data types have already been imported?

      This is an interesting point. In theory, Pynapple should be able to open a NWB file automatically, without providing much information. In practice, it is challenging to open a NWB file without knowing what to look for exactly and how the data were preprocessed. This would require adapting an I/O function for a specific NWB file. Unfortunately, we do not believe there is a universal solution to this problem. There are solutions being developed by others, for example NWB Widgets. We will keep an eye on this and see whether this could be adapted to create a universal NWB loader for Pynapple.

      Reviewer #2 (Recommendations For The Authors):

      Other tools and solutions are being developed by the NWB community. How will you make sure that these tools can take advantage of Pynapple and vice versa?

      We recognize the importance of collaboration within the NWB community and are committed to making sure that our tools can integrate seamlessly with other tools and solutions developed by the community.

      Regarding Pynapple specifically, we are designing it to be modular and flexible, with clear APIs and documentation, so that other tools can easily interface with it. One important thing is that we want to make sure Pynapple is not too dependent on another package or file format such as NWB. Ideally, Pynapple should be designed so that it is independent of the underlying data storage pipeline.

      Most of the tools that have been developed in the NWB community so far were designed for data visualisation and data conversion, something that Pynapple does not currently address. Multiple packages for behavioral analysis and exploration of electro/optophysiological datasets are compatible with the NWB format but do not provide additional solutions per se. They are complementary to Pynapple.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank you for your thorough review of the manuscript. We have taken all comments into account in the revised version of the manuscript. Please find below our detailed responses to your comments.

      eLife assessment

      This study reports useful information on the limits of the organotypic culture of neonatal mouse testes, which has been regarded as an experimental strategy that can be extended to humans in the clinical setting for the conservation and subsequent re-use of testicular tissue. The evidence that the culture of testicular fragments of 6.5-day-old mouse testes does not allow optimal differentiation of steroidogenic cells is compelling and would be useful to the scientific community in the field for further optimizations.

      Thank you for this assessment. We have carefully considered all comments and made the requested revisions to improve the manuscript.

      Reviewer #1 (Public Review):

      In this manuscript, the authors aimed to compare, from testis tissues at different ages from mice in vivo and after culture, multiple aspects of Leydig cells. These aspects included mRNA levels, proliferation, apoptosis, steroid levels, protein levels, etc. A lot of work was put into this manuscript in terms of experiments, systems, and approaches. However, as written the manuscript is incredibly difficult to follow. The Introduction and Results sections contain rather loosely organized lists of information that were altogether confusing. At the end of reading these sections, it was unclear what advance was provided by this work. The technical aspects of this work may be of interest to labs working on the specific topics of in vitro spermatogenesis for fertility preservation but fail to appeal to a broader readership. This may be best exemplified by the statements at the end of both the Abstract and Discussion which state that more work needs to be done to improve this system.

      As suggested, we have reworked the manuscript to make it clearer, more meaningful and more precise. We believe that this work may be of interest to a broader readership. Indeed, the development of a model of in vitro spermatogenesis could be of interest for labs working on the specific period of puberty initiation, on germ and somatic cell maturation and on steroidogenesis under physiological and pathological conditions, and could also be useful for testing the toxicity of cancer therapies, drugs, chemicals and environmental agents (e.g. endocrine disruptors) on the developing testis.

      There is a crucial unmet need to optimize the culture conditions for in vitro spermatogenesis. It is important to identify the deregulated molecular mechanisms leading to a decreased in vitro spermatogenic yield. Such results will be of great help to improve organotypic culture conditions. In the present study, we uncovered for the first time not only a failure in adult Leydig cell development, but also an alteration in the expression of several steroidogenic and steroid-metabolizing genes, which could explain the accumulation of progesterone and estradiol and the deficiency of androstenedione in cultured tissues. This hyperestrogenic and hypoandrogenic environment could explain, at least in part, the low efficiency of in vitro spermatogenesis. Furthermore, we show that the addition of hCG (LH homolog) is not sufficient to facilitate Leydig cell differentiation, restore steroidogenesis and improve sperm yield. These data provide valuable information for improving culture conditions. More fundamentally, this culture system could be a useful tool for identifying factors that are essential for the differentiation and functionality of adult Leydig cells during puberty initiation.

      Recommendations For The Authors:

      This reviewer appreciates that a lot of work was put into this manuscript in terms of experiments, systems, and approaches. However, the manuscript needs significant revision, and in this reviewer's opinion is not appropriate for a broader readership journal. The results seem rather incremental, and the topic is too specialized in its current format.

      The manuscript was significantly revised taking into account the reviewer’s comments. In addition, as mentioned above, the development of a model of in vitro spermatogenesis could have wider applications and be of interest to a broader audience.

      Comments for improvement, roughly in order of appearance:

      1) Abstract - would recommend condensing to hit the main points of the manuscript.

      The abstract has been condensed as suggested.

      2) Introduction, overall - this is a rather loosely organized list of information that is not synthesized or communicated in a meaningful way. It contains overstatements and lumps together findings from both mice and primates and thus several statements for the actions of these steroid hormones are inaccurate. The authors rely much too heavily upon reviews and need to replace those with a more scholarly approach of carefully reading and citing primary literature.

      The Introduction has been reorganized to make it clearer, more synthetic, more meaningful and more accurate. Only findings from rodents are presented. We carefully read the literature and replaced most reviews with primary literature.

      3) Results - this section was extremely difficult to read and comprehend, as it's essentially a laundry list of measurements of mRNAs, steroids, cholesterols, and proteins that go up or down or don't change at multiple ages, both in vitro and in vivo. The section would be improved greatly by an organization with rationale and concluding statements to prepare the reader for the factoid-style data that are presented.

      As suggested, the Results section has been reorganized, with a rationale and concluding statements for each part, to make it easier to read and comprehend.

      4) 47 - is this approach going to both "preserve and restore"? Sounds more like it will allow for the production of offspring, but the other goals are not going to happen from the approach listed in the latter part of that sentence - so not really "fertility restoration" but more of an insurance program that sperm can be produced for ART

      Freezing of prepubertal testicular tissue, which contains spermatogonia, is a fertility preservation option proposed to prepubertal boys with cancer prior to highly gonadotoxic treatments. Several fertility restoration strategies, which aim to allow the production of spermatozoa from cryopreserved spermatogonia, are being developed, including in vitro spermatogenesis. This sentence has been rewritten.

      5) 62 - specify whether this "decreased expression" is mRNA or protein, and is this because of a loss of Sertoli cells?

      “Decreased expression” was replaced by “decreased mRNA levels”. The results we obtained in the cited study (Rondanino et al., 2017) suggest that the decrease in Rhox5 mRNA levels is not the consequence of a change in the proportion of Sertoli cells but reflects an alteration in Rhox5 gene expression. In Figure 6U of the present study, we show indeed that there is no loss of Sertoli cells in organotypic cultures.

      6) 66 - what is "the first wave of mouse in vitro spermatogenesis"? Are these cultures from the first wave of mouse in vivo spermatogenesis, or is there a second wave of in vitro spermatogenesis? Please specify

      In the mouse, the first entry into meiosis occurs around 8-10 dpp and the first spermatozoa are produced at around 35 dpp: this is the first wave of spermatogenesis which takes place at the onset of puberty. By culturing 6 dpp-old testes for 30 days, our aim is to reproduce in vitro all the stages of this first wave of spermatogenesis, i.e. entry into meiosis, completion of meiosis and spermiogenesis.

      In the cited study (Pence et al., 2019), the authors cultured 5 dpp testes for 35 to 49 days and observed a decline in intratesticular testosterone levels in the cultured tissues, i.e. after the end of the first spermatogenic wave, compared to in vivo controls. Our sentence has been rewritten to make it clearer.

      7) 78 - is there a difference in T production by Fetal vs Adult LCs? It is this reviewer's understanding that the levels of T around birth in mice (and then a few months after birth in humans) are quite high, similar to adults. So, what are the authors suggesting here by providing the list of expressed genes in these two LC populations?

      As mentioned in the Introduction section, 17β-HSD3 – the enzyme responsible for the conversion of androstenedione to T – is not expressed in fetal Leydig cells but is expressed in adult Leydig cells. Therefore, unlike adult Leydig cells, fetal Leydig cells are not capable of synthesizing T.

      In the present study, we investigated steroidogenesis but also wondered which types of Leydig cells could be detected under in vitro conditions. It is therefore important to explain to the reader which steroidogenic proteins are expressed by the different Leydig cell populations.

      As described in O’Shaughnessy et al., 2002, levels of intratesticular T decline after birth, being very low between 10 and 20 dpp. Then, T levels increase. At 25 dpp, T levels are close to those observed at 1 dpp. T levels increase more than 16-fold between 25 and 30 dpp and then double between 30 dpp and adulthood. Therefore, intratesticular T levels around birth in mice are not as high as in adults: they are about 36-fold lower than in adulthood. It has been shown that in the fetal testis, the conversion of androstenedione produced by fetal Leydig cells is achieved by the adjacent fetal Sertoli cells that express 17β-HSD3 (O’Shaughnessy et al., 2000; Shima et al., 2013). During postnatal development however, Sertoli cells lose the expression of 17β-HSD3 (O’Shaughnessy et al., 2000).

      8) 79 -99 - can the authors revise this long list of information to provide a summary of what they are trying to communicate to the reader? What is the intention of this information?

      This paragraph has been modified to make it clearer and more synthetic. As different Leydig cell markers are presented in the Results section, it is important to introduce the reader to the different types of Leydig cells, the proteins expressed by these cells and the factors involved in their proliferation and differentiation.

      9) 101-2 - replace "involved in" with a more meaningful word - and it is this reviewer's understanding that T has not been shown convincingly to have much of a role in spermatogonial development, at least in mice - that statement is likely true in primates, but not mice; provide primary literature citations to be more precise, rather than a broad review that covers multiple species

      “involved in” was replaced by “is essential for many aspects of spermatogenesis, including”. Moreover, we removed “spermatogonial proliferation and differentiation” and provided primary literature citations to be more precise.

      10) 105-7 - similar concern for E as for T, above - KO mouse models for ERalpha and beta did not show defects in spermatogenesis as described - not sure what evidence the authors are specifically referring to here - cite primary literature rather than a review on Vitamin D + estrogen

      We agree that the question of whether estrogens play a direct role in spermatogenesis was unanswered by the ER null mice. However, estrogens have been shown to be important for the long-term maintenance of spermatogenesis in the ArKO mouse (Robertson et al., 1999) and for the progression of normal germ cell development in the ENERKI mouse (Sinkevicius et al., 2009). This sentence has been reworded and primary literature is cited to be more precise.

      11) 113-4 - there is no convincing evidence this reviewer is aware of that the AR is expressed in male germ cells, and therefore T actions on germ cells are indirect, through Sertoli cells and perhaps PTMs; if there is some, this sentence needs a citation showing that

      We agree that there is no evidence that AR is expressed in male germ cells and that T acts indirectly on germ cells. This sentence has been rewritten.

      12) 114-6 - this is untrue - nowhere in that paper was testosterone or androgen even mentioned!

      This reference has been removed. We apologize for this mistake.

      13) 116-7 - again, E actions through the ERs are thought to be indirect in the testis, not acting on germ cells; if this is incorrect, please add supportive citations and explain; replace "involved" with a more meaningful word; Rhox5 has a very minor role in spermatogenesis

      In contrast to androgen receptors, which are localized in somatic cells, estrogen receptors have been found in most testicular cells, including germ cells. The studies reporting the expression of estrogen receptors in germ cells are cited in the Introduction section. The word “involved” was replaced by “promotes”.

      Rhox5 (also known as Pem) does not have a very minor role in spermatogenesis. On the contrary, its expression is crucial for normal spermatogenesis and sperm maturation, as loss of Rhox5 in male mice leads to reduced fertility, increased germ cell apoptosis, decreased sperm count and decreased sperm motility (MacLean et al., 2005).

      14) 117 - Ref 29 does not support the statement about Rhox5's role in spermatogenesis

      The reference (MacLean et al., 2005), supporting the statement about Rhox5’s role in spermatogenesis, was added in the manuscript.

      15) 120 - Does FAAH have a protective role in that it is anti-apoptotic? Or just required for some other Sertoli cell function? Should re-word to be more specific.

      FAAH (fatty acid amide hydrolase), whose expression is stimulated by estrogens, has been shown to have a crucial role in promoting survival of Sertoli cells by degrading anandamide (N-arachidonoylethanolamine), an endocannabinoid which has a pro-apoptotic activity (Rossi et al., 2007).

      The sentence has been reworded to be more specific.

      16) 127 - should complete the Introduction with a sentence summarizing what was done and found, for reader clarity

      The Introduction has been completed for reader clarity.

      17) 136 - misspelled the procedure

      Orchidectomy was replaced by orchiectomy.

      18) Mice - why use half-day nomenclature for postpartum mice? This is not standard in the literature.

      Half-day nomenclature was used due to the uncertainty of the time of birth, which mostly takes place during the night. Since this is not standard in the literature, half-day nomenclature was removed in the entire manuscript.

      19) 172-3 - the half-life of RA is very short (<1 hr), and it is light-sensitive. This addition every 8 days means that retinoids are present for a very minimal window of time - are the authors sure retinoids have no requirement elsewhere during spermatogenesis? And in the literature, the measured pulse of RA in the mouse lasts >40 hours (stages VII-IX)...

      RA is mandatory for proper spermatogenesis and is needed many times during spermatogenesis (for review, see Schleif et al., 2022): RA is involved in spermatogonial differentiation, pre-meiotic activation and meiotic completion, establishment of the blood-testis barrier and spermiation. In our study, we did not add RA in the culture medium but retinol, the precursor of RA. Indeed, our previous studies have shown beneficial effects of retinol on in vitro spermatogenesis, including an increased production of spermatids with fewer nuclear alterations and less DNA damage (Arkoun et al., 2015; Dumont et al., 2016).

      The reason we added retinol (and not RA, which has a very short half-life) in this study and in our previous studies is that it can be oxidized into RA but also be stored in Sertoli cells in the form of retinyl esters for later use. As retinol is photosensitive, handling and storage were performed in tubes covered with aluminum foil, which protects from direct light exposure.

      20) 362 - Start the Results section with a broader statement(s) that prepares the readers rather than jumping into specific experiments; it would be helpful for readers to have concluding sentences included as well for readers to navigate the Results section.

      As suggested, the Results section has been reorganized, with a rationale and concluding sentences for each part, to facilitate reading.

      21) 364 - KI67 is a marker of.

      Ki67 is widely used as a cell proliferation marker.

      22) 367 - replace "involved".

      “involved” was replaced by “necessary for”.

      23) What intensity thresholds were used to define a cell as positive or negative for a given marker? And there seemed to be no mention of controls - especially no primary antibody controls. This is a significant oversight if these were not done in parallel with every single immunostaining experiment.

      We did not apply intensity thresholds. Cells presenting detectable labeling were defined as positive, while unlabeled cells were defined as negative.

      Negative controls, performed by omitting the primary antibodies, were of course done in parallel to each immunostaining and are presented in Figure 1A, Figure 2J and Figure 5C. The mention of negative controls has been added in the Materials and methods section.

      24) 388 - INSL3 - is this referring to mRNA or protein? Protein nomenclature is used...

      INSL3 is here referring to the protein, whose concentrations were measured by radioimmunoassay.

      25) 402 - typo.

      “expect” was replaced by “except”.

      26) 409 - do mRNA levels really "determine the testicular steroidogenic potential"??

      This sentence has been reworded: “determine the testicular steroidogenic potential” was replaced by “highlight a potential deregulation of their expression”.

      27) 410 - western should not be capitalized.

      Western Blot was replaced by western blot in the entire manuscript.

      28) 405-28 - this reviewer is underwhelmed by qRT-PCR results for a handful of markers - what is the purpose? The results do not prove anything about the function of the system.

      As the differentiation of Leydig cells is not fully completed in organotypic cultures, we wanted to know which actors of the steroidogenic pathway show deregulated expression in vitro in comparison to physiological conditions, and thus which steps of the steroid hormone biosynthesis pathway may be impaired. We found that the expression of several genes encoding steroidogenic enzymes was decreased in vitro, notably that of Cyp17a1, necessary for the conversion of progesterone to androstenedione. Transcript levels of Hsd17b2, encoding an enzyme that converts estradiol to estrone and testosterone to androstenedione, were also decreased at D30.

      Our data therefore show that the expression of several steroidogenic genes and steroid metabolizing genes is deregulated in organotypic cultures but we agree that these results do not prove anything about the function of the system.

      We then found an accumulation of estradiol and progesterone, a decrease in androstenedione and unchanged testosterone levels in cultured tissues. The elevation in progesterone and the reduction in androstenedione in in vitro matured tissues could arise from the reduced expression of Cyp17a1. In addition, reduced Hsd17b2 transcript levels may explain why estradiol levels remain elevated in cultures while testosterone levels are similar to controls and androstenedione levels are low.

      29) How do the authors interpret data gleaned from tissues containing a variably-sized necrotic core?

      In the present study, the central necrotic area was consistent between all samples and variables: it represents on average 16-27% of the explants.

      As in our previous publications and recent RNA-seq analyses (Rondanino et al., 2017; Oblette et al., 2019; Dumont et al., 2023), the central necrotic area was removed so that transcript and protein levels in the healthy part of the samples (i.e. where in vitro spermatogenesis occurs) could be measured and compared with in vivo controls. In order to be able to compare the healthy part of the in vitro matured tissues with in vivo controls, transcript levels were normalized to housekeeping genes (Gapdh and Actb) or to the Leydig cell-specific gene Hsd3b1 while protein levels were normalized to ACTB or to 3β-HSD.
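
      The normalization described above can be sketched as follows. This is a minimal Python illustration only, assuming the common 2^-ΔCt calculation with 100% amplification efficiency; this response does not state the exact formula used, and the function name and Ct values below are hypothetical.

```python
from statistics import mean


def relative_expression(ct_target, ct_references):
    """Return 2^-(Ct_target - mean reference Ct).

    Averaging reference Ct values is equivalent to taking the geometric
    mean of the reference quantities (assumes 100% PCR efficiency).
    """
    delta_ct = ct_target - mean(ct_references)
    return 2 ** (-delta_ct)


# Hypothetical Ct values, not taken from the study.
# A gene expressed in several cell types, normalized to Gapdh and Actb:
print(relative_expression(24.0, [20.0, 22.0]))  # 0.125
# A Leydig cell-specific gene, normalized to Hsd3b1 only:
print(relative_expression(26.0, [25.0]))  # 0.5
```

      Because the same calculation is applied to the healthy part of cultured explants and to in vivo controls, the resulting values are expressed per unit of reference transcript rather than per unit of tissue.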

      30) 520 - after reading to this point, this reviewer was left confused and wondering why any of this is important to the reader unless that reader specifically works on this topic. The way the data were presented makes it nearly impossible for the reader to keep any of the data in their mind as they read. It's a seemingly endless list of ups and downs of many things under many conditions. What is the point of all of this? How will it advance our understanding of spermatogenesis? Or improve in vitro culture? Or help prepubertal cancer patients? Presumably, that will be explained in the Discussion, but at this point, this reviewer honestly has no idea what this all means. Why is this important??

      We have modified the Results section by including rationale and concluding statements to make it easier to read and follow for all readers, not only for those working on this topic.

      As mentioned above, the identification of the molecular mechanisms that are deregulated in vitro will give us important insights for the optimization of the culture system. The development of an optimized model of in vitro spermatogenesis could lead to several applications, including improving our knowledge of the regulation of spermatogenesis during pubertal development.

      In this study, our main findings are that the differentiation of the adult Leydig cell lineage, steroid biosynthesis, metabolism and signaling are altered in organotypic cultures, leading to a hyperestrogenic and hypoandrogenic environment. In addition, we show that the presence of an LH homolog, known to be critical to adult Leydig cell differentiation and to stimulate steroidogenesis, does not rescue the expression of adult Leydig cell markers and of several steroidogenic genes, steroid metabolizing genes and steroid target genes. Other factors required for Leydig cell maturation and functionality will have to be tested in the future on cultured testicular tissues. Improvements to this in vitro maturation procedure in animal models may be useful for future cultures of human testicular biopsies, although we are aware that more work needs to be done before prepubertal cancer patients can benefit from this in vitro maturation approach.

      31) 619-20 - this sort of summarizes this reviewer's overall opinion of the manuscript. Not much seems to have been learned here that would justify publication in a broad readership journal like eLife. More work needs to be done to provide that sort of meaningful advance. The current work, with considerable re-writing to improve accuracy and clarity, is much better suited to a specialty journal where others who are working on this specific topic will appreciate its value.

      We have carefully considered the reviewer’s comments and modified the manuscript to improve accuracy and clarity. We understand the reviewer’s point of view, but we believe that this work may be of interest not only to labs working on fertility preservation and restoration, but also to those working on puberty initiation, germ and somatic cell maturation, steroidogenesis under physiological and pathological conditions, and on the effect of cancer therapies, drugs, chemicals and environmental agents (e.g. endocrine disruptors) on the developing testis.

      As mentioned above, we uncovered for the first time not only a failure in adult Leydig cell development, but also an alteration in the expression of several steroidogenic and steroid-metabolizing genes, which could explain the accumulation of progesterone and estradiol and the deficiency of androstenedione in cultured tissues. This hyperestrogenic and hypoandrogenic environment could explain, at least in part, the low efficiency of in vitro spermatogenesis. Furthermore, we show that the addition of hCG (LH homolog) is not sufficient to facilitate Leydig cell differentiation, restore steroidogenesis and improve sperm yield. These data provide valuable information for improving culture conditions. More fundamentally, this culture system could be a useful tool for identifying factors that are essential for the differentiation and functionality of adult Leydig cells during puberty initiation.

      32) Why are the figures repeated at the end of the manuscript?

      During the submission process, our bioRxiv preprint (which contains the figures) was merged with higher-quality versions of the same figures.

      Reviewer #2 (Public Review):

      Preserving and restoring the fertility of prepubertal patients undergoing gonadotoxic treatments involves freezing testicular fragments and waking them up in a culture in the context of medically assisted procreation. This implies that spermatogenesis must be fully reproduced ex vivo. The parameters of this type of culture must be validated using non-human models. In this article, the authors make an extensive study of the quality of the organotypic culture of neonatal mouse testes, paying particular attention to the differentiation and endocrine function of Leydig cells. They show that fetal Leydig cells present at the start of culture fail to complete the differentiation process into adult Leydig cells, which has an impact on the nature of the steroids produced and even on the signaling of these hormones.

      The authors make an extensive study of the different populations of Leydig cells which are supposed to succeed each other during the first month of life of the mouse to end up with a population of adult and fully functional cells. The authors combine quantitative in situ studies with more global analyses (RT-qPCR, western blot, hormonal assays), which range from gene to hormone. This study is well written and illustrated, the description of the methods is honest, and the analyses are systematic and accompanied by multiple relevant control conditions.

      Since the aim of the study was to study Leydig cell differentiation in neonatal mouse testis cultures, the study is well conceived, the results answer the initial question and are not over-interpreted.

      My main concern is to understand why the authors have undertaken so much work when they mention, for RNA extractions and western blot, that the necrotic central part had to be carefully removed. There is no information on how this parameter was considered for immunohistochemistry and steroid measurements. The authors describe the initial material as a quarter testis, but they don't mention the resulting size of the fragment. A brief review of the literature shows that, while the culture medium is often crucial for the quality of the culture (and in particular the supplementations as discussed by the authors here), the size of the fragments is also a determining factor, especially for long cultures. The main limitation of the study is therefore that the authors cannot exclude that central necrosis can have harmful effects on the survival and/or the growth and/or the differentiation of the testis in culture. In this sense, the general interpretation that the authors make of their work is correct: the culture conditions are not optimized.

      When using the organotypic culture system at a gas-liquid interface, the central part of the testicular tissue becomes necrotic. As previously reported (Komeya et al., 2016), the central region receives insufficient nutrients and oxygen. In vitro spermatogenesis therefore only occurs in the seminiferous tubules present in the peripheral region. As in our previous publications and recent RNA-seq analyses (Rondanino et al., 2017; Oblette et al., 2019; Dumont et al., 2023), the central necrotic area was removed so that transcript and protein levels in the healthy part of the samples (i.e. where in vitro spermatogenesis occurs) could be measured and compared with in vivo controls. For histological and immunohistochemical analyses, only seminiferous tubules located at the periphery of the cultured fragments (outside of the necrotic region) were analyzed. Steroid measurements were performed on the entire fragments.

      The initial material was indeed a quarter testis, which represents approximately 0.75 mm³. No growth of the fragments was observed during the organotypic culture period (Figure 8-figure supplement 1). We agree with the reviewer that the composition of the culture medium is not the only parameter to be considered for the quality of the culture and that the size of the fragments is also a determining factor. We previously determined that 0.75 mm³ was the most appropriate size for mouse in vitro spermatogenesis (Dumont et al., 2016). We do not exclude at all that central necrosis can have harmful effects on the survival and/or the growth and/or the differentiation of the testis in culture. Optimization of the culture medium and culture design (so that the tissue center receives sufficient nutrients and oxygen) will be necessary to increase the yield of in vitro spermatogenesis.

      Organotypic culture is currently trying to cross the doors of academic research laboratories to become a clinical tool, but it requires many adjustments and many quality controls. This study shows a perfect example of the pitfall often associated with this approach. The road is still long, but every piece of information is useful.

      Reviewer #3 (Public Review):

      Moutard, Laura, et al. investigated the gene expression and functional aspects of Leydig cells in a cryopreservation/long-term culture system. The authors found that critical genetic markers for Leydig cells were diminished when compared to the in-vivo testis. The testis also showed less androgen production and androgen responsiveness. Although they did not produce normal testosterone concentrations in basal media conditions, the cultured testis still remained highly responsive to gonadotrophin exposure, exhibiting a large increase in androgen production. Even after the hCG-dependent increase in testosterone, genetic markers of Leydig cells remained low, which means there is still a missing factor in the culture media that facilitates proper Leydig cell differentiation. Optimizing this testis culture protocol to help maintain proper Leydig cell differentiation could be useful for future human testis biopsy cultures, which will help preserve fertility in child cancer patients.

      Methods: In line 226, there is mention that the central necrotic area was carefully removed before RNA extraction. This is particularly problematic for the inference of these results, especially for the RT-qPCR data. Was the central necrotic area consistent between all samples and variables (16 and 30FT)? How big was the area? This makes the in-vivo testis not a proper control for all comparisons. Leydig cells are not evenly distributed throughout the testis. A lot of Leydig cells can be found toward the center of the gonad, so the results might be driven by the loss of this region of the testis.

      When using the organotypic culture system at a gas-liquid interface, the central part of the testicular tissue becomes necrotic. As previously reported (Komeya et al., 2016), the central region receives insufficient nutrients and oxygen. In vitro spermatogenesis therefore only occurs in the seminiferous tubules present in the peripheral region. As in our previous publications and recent RNA-seq analyses (Rondanino et al., 2017; Oblette et al., 2019; Dumont et al., 2023), the central necrotic area was removed so that transcript levels in the healthy part of the samples (i.e. where in vitro spermatogenesis occurs) could be measured and compared with in vivo controls. In order to be able to compare the healthy part of the in vitro matured tissues with in vivo controls, transcript levels of the selected genes were normalized to housekeeping genes (Gapdh and Actb) or to the Leydig cell-specific gene Hsd3b1.

      The central necrotic area was consistent between all samples and variables: it represents on average 16-27% of the explants.

      Moreover, we would like to point out that the gonads were cut into four fragments before in vitro cultures. It is therefore the central part of the cultured explants that was removed and not the central part of the gonads. The central part of the gonads was thus included in our analyses.

      What did the morphology of the testis look like after culturing for 16 and 30 days? These images will help confirm that the culturing method is like the Nature paper Sato et al. 2011 and also give a sense of how big the necrotic region was and how it varied with culturing time.

      Images showing mouse testicular tissues cultured for 16 and 30 days are presented in Figure 8-figure supplement 1. The cultured tissues resemble those shown by Sato et al., 2011. As mentioned above, the central necrotic area represents on average 16-27% of the explants. No significant difference in the area of the necrotic region was found between the two culture time points.

      There are multiple comparisons being made. Bonferroni corrections on p-value should be done.

      Bonferroni corrections are used when multiple comparisons are conducted. As mentioned in the Materials and methods section, multiple comparisons were not made in this study. Indeed, the non-parametric Mann-Whitney test was used to compare two conditions: in vitro vs in vivo (D16 FT vs 22 dpp, D16 CSF vs 22 dpp, D30 FT vs 36 dpp, D30 CSF vs 36 dpp, D30 FT + hCG vs 36 dpp, D30 CSF + hCG vs 36 dpp), cultures of fresh vs frozen tissues (6 dpp vs 6 dpp CSF, D16 FT vs D16 CSF, D30 FT vs D30 CSF, D30 FT + hCG vs D30 CSF + hCG) and cultures with vs without hCG (D30 FT + hCG vs D30 FT, D30 CSF + hCG vs D30 CSF). These comparisons were added in the Materials and methods section.
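
      The two-condition comparisons listed above can be sketched as follows. This is a minimal, standard-library Python illustration of a two-sided Mann-Whitney test with an exact p-value obtained by enumeration, which is practical only for small groups. The function names and the example measurements are hypothetical and are not taken from the study; the comparison labels are those listed above.

```python
from itertools import combinations

# Pairs of conditions compared, as listed in the response.
COMPARISONS = [
    ("D16 FT", "22 dpp"), ("D16 CSF", "22 dpp"),
    ("D30 FT", "36 dpp"), ("D30 CSF", "36 dpp"),
    ("D30 FT + hCG", "36 dpp"), ("D30 CSF + hCG", "36 dpp"),
    ("6 dpp", "6 dpp CSF"), ("D16 FT", "D16 CSF"),
    ("D30 FT", "D30 CSF"), ("D30 FT + hCG", "D30 CSF + hCG"),
    ("D30 FT + hCG", "D30 FT"), ("D30 CSF + hCG", "D30 CSF"),
]


def mann_whitney_u(x, y):
    """U statistic for x: each pair with x > y counts 1, each tie counts 0.5."""
    return sum((a > b) + 0.5 * (a == b) for a in x for b in y)


def exact_two_sided_p(x, y):
    """Exact two-sided p-value from every relabelling of the pooled values."""
    pooled = list(x) + list(y)
    n, m = len(x), len(y)
    center = n * m / 2  # expected U under the null hypothesis
    observed = abs(mann_whitney_u(x, y) - center)
    hits = total = 0
    for idx in combinations(range(n + m), n):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(n + m) if i not in chosen]
        total += 1
        if abs(mann_whitney_u(gx, gy) - center) >= observed:
            hits += 1
    return hits / total


# Hypothetical measurements for one pair of conditions (illustration only).
in_vitro = [1.0, 2.0, 3.0, 4.0]
in_vivo = [5.0, 6.0, 7.0, 8.0]
print(mann_whitney_u(in_vitro, in_vivo), exact_two_sided_p(in_vitro, in_vivo))
```

      In this sketch each pair in the list is tested on its own, which mirrors the two-condition design described in the response.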

      Results: In the discussion, it is mentioned that IGF1 may be a missing factor in the media that could help Leydig cell differentiation. Have the authors tried this experiment? Improving this existing culturing method will be highly valuable.

      The decreased Igf1 mRNA levels found in the present study are in line with the RNA-seq data of Yao et al., 2017. As mentioned in the Discussion section, the addition of IGF1 in the culture medium led to a modest increase in the percentages of round and elongated spermatids in cultured mouse testicular fragments (Yao et al., 2017). However, the effect of IGF1 supplementation on Leydig cell differentiation was not investigated. The supplementation of organotypic culture medium with IGF1 is currently being tested in our research team.

      Add p-values and SEM for qPCR data. This was done for hormones, should be the same way for other results.

      p-values and SEM are shown for both qPCR and hormone data.

      Regarding all RT-qPCR data: there is a switch between 3bHSD and Actb/Gapdh as housekeeping genes. There does not seem to be consistency, as some have 3bHSD and others do not. Why do Igf1 and Dhh not use 3bHSD for housekeeping? If this is the method to be used, then 3bHSD should be used as housekeeping for the protein data, instead of ACTB. Also, based on Figure 1B and Figure 2A (Hsd3b1) there does not seem to be a strong correlation between Leydig cell # and the gene expression of Hsd3b1. If Hsd3b1 is to be used as a housekeeper and a proxy for Leydig cell number a correlation between these two measurements is necessary. If there is no correlation a housekeeping gene that is stable among all samples should be used. Sorting Leydig cells and then conducting qPCR would be optimal for these experiments.

      Hsd3b1 was used as a housekeeping gene only to normalize the mRNA levels of Leydig cell-specific genes. Therefore, Igf1 and Dhh transcript levels were not normalized with Hsd3b1 since Igf1 is expressed by several cell types in the testis (Leydig cells, Sertoli cells, peritubular myoid cells) and Dhh is expressed by Sertoli cells.

      Regarding western blots, the expression of AR, CYP19 and FAAH could not be normalized with 3β-HSD since AR is expressed by Leydig cells, Sertoli cells and peritubular myoid cells, CYP19 is expressed by Leydig cells and germ cells and FAAH is expressed by Sertoli cells. For CYP17A1 however, 3β-HSD was used as housekeeping instead of ACTB (Figure 2G).

      No correlation was found between the number of Leydig cells per cm2 of testicular tissue shown in Figure 1 and Hsd3b1 mRNA levels presented in Figure 2. However, this result was expected since on the one hand the number of Leydig cells per cm2 was determined in the peripheral region of one tissue section whereas on the other hand Hsd3b1 transcript levels were measured in the entire peripheral region of the cultured fragments. The correction factor used for the analysis of genes expressed in Leydig cells present in the healthy part of the cultured tissues was therefore the Leydig cell selective marker Hsd3b1, as previously described (Cacciola et al., 2013).

      Figure 2A (CYP17a1): It is surprising that the CYP17a1 gene and protein expression is very different between D30FT and 36.5dpp, however, the immunostaining looks identical between all groups. Why is this? A lower magnification image of the testis might make it easier to see the differences in Cyp17a1 expression. Leydig cells commonly have autofluorescence and need a background quencher (TrueBlack) to visualize the true signal in Leydig cells. This might reveal the true differences in Cyp17a1.

      RT-qPCR and western blot analyses show that both Cyp17a1 mRNA levels and CYP17A1 protein levels are decreased in organotypic cultures at D30. However, we agree that such a decrease is not visible in immunostaining. No autofluorescence of Leydig cells could be observed in the negative controls (Figure 2J).

      Figure 3D: there are large differences in estradiol concentration in the testis. Could it be that the testis is becoming more female-like? Leydig and Sertoli cells with more granulosa and theca cell features? Were any female markers investigated?

      We show in the present study that the expression levels of the Sertoli cell-specific gene Dhh are not reduced in organotypic cultures. We also previously found that the expression levels of the Sertoli cell-specific gene Amh were not reduced in in vitro matured testicular tissues (Rondanino et al., 2017). Moreover, we have recently shown that Sox9, encoding a testis-specific transcription factor, is expressed in organotypic cultures (Dumont et al., 2023). Our recent transcriptomic analysis also revealed that the transcript levels of the pro-male sexual differentiation marker Sry and of the Sertoli cell-specific gene Dmrt1 remained unchanged in organotypic cultures compared to in vivo controls (Dumont et al., 2023). In addition, no increase in the mRNA levels of the female sex-determining genes Foxl2 and Rspo1 was found in vitro (Dumont et al., 2023). However, we cannot rule out that in vitro cultured testes are becoming more female-like as the expression of Hsd17b3, encoding an androgenic enzyme, is reduced (this study) while the expression of the feminizing gene Wnt4 is upregulated (Dumont et al., 2023).

      Figure 3D and Figure 5A: It is hard to imagine that intratesticular estradiol is maintained for 16-30 days without sufficient CYP19 activity or substrate (testosterone). 6.5 dpp was the last day with abundant CYP19 expression, so is most of the estrogen synthesized on this first day and it sticks around? Are there differences in estradiol metabolizing enzymes? Is there an alternative mechanism for E production?

      In the present study, abundant CYP19 expression was indeed found at 6 dpp. However, the expression of this enzyme was not measured between 6 dpp and D16. Therefore, we cannot be sure that 6 dpp is the last day with abundant CYP19 expression. We assume that the estradiol synthesized before D16 may then accumulate within the cultured tissues. In our study, we quantified the transcript levels of Sult1e1, encoding an estradiol metabolizing enzyme. SULT1E1 is thought to play a physiological role in protecting Leydig cells from estrogen-induced biochemical lesions (Tong et al., 2004). A reduction in Sult1e1 mRNA levels was found at D30 in comparison to in vivo controls, but this may occur earlier during organotypic culture. In addition, decreased transcript levels of Hsd17b2, which encodes an estrogen metabolizing enzyme that converts estradiol to estrone, were found at D30 in this study. We suggest in the Discussion section that elevated estradiol levels in cultured tissues could be a consequence of low Sult1e1 and Hsd17b2 expression. Our recent transcriptomic analyses show that the levels of Cyp1a1, Cyp1b1 and Comt, encoding other estrogen metabolizing enzymes, are unchanged in organotypic cultures (Dumont et al., 2023). To our knowledge, there is no alternative mechanism for estradiol production.

      Recommendations For The Authors:

      1) The acronyms, PLC, SLC, ILC, ALC, and FLC, become hard to follow. It is recommended to spell out the names.

      PLC was replaced by progenitor Leydig cells, SLC by stem Leydig cells, ILC by immature Leydig cells, ALC by adult Leydig cells and FLC by fetal Leydig cells in the entire manuscript.

      2) All Figures: Use letters for each bar graph. Difficult to make a connection from text to figure.

      A letter was added to each bar graph.

      3) Supplemental figure 1: Change "Changement du milieu" to English.

      These words were replaced by “Medium change”.

      4) Catalog numbers for antibodies are necessary.

      The catalog numbers of the antibodies used in this study are presented in Supplementary Table 1.

    1. Author Response

      Reviewer #1 (Public Review)

      The authors present a scRNAseq study describing the transcriptomes of the tendon enthesis during postnatal development. This is an important topic that has major implication for the care of common clinical problems such as rotator cuff repair. The results are a valuable addition to the literature, providing a descriptive data set reinforcing other, more comprehensive studies. There are weaknesses, however, in the scRNAseq analyses.

      1) The authors should provide additional rationale for the PCA analysis shown in Fig 1d. It is uncommon to use PCA for histomorphologic parameters. These results do not convincingly demonstrate that P7 is a critical developmental timepoint.

      2) According to the methods, it appears that the entire humeral head-supraspinatus tendon was used for cell isolation for scRNAseq. This results in the inclusion of cells from a variety of tissues, including bone, growth plate, enthesis and tendon. As such, only a very small percentage of cells in the analysis came from the enthesis. Inclusion of such a wide range of cells makes interpretation of enthesis cells difficult.

      3) The differentiation/pseudotime analysis described in Fig 3 is difficult to follow. This map includes cell transcriptomes from vastly different tissues. Presumably, embedded in these maps are trajectories for osteoblast differentiation, chondrocyte differentiation, tenocyte differentiation, etc. With so many layers of overlapping information, it is difficult to (algorithmically) deduce a differentiation path of a particular cell type.

      4) The authors use the term function throughout the paper (e.g., functional definition of fibrocartilage subpopulations). However, this is a descriptive scRNAseq study, and function can therefore only theoretically be inferred from the algorithms used to analyze the data. A functional role for any of the identified pathways or processes can only be defined with gain- and/or loss-of-function studies.

      5) C2 highly expressed biomineralization-related genes (Clec3a, Tnn, Acan). The three example genes are not related to biomineralization.

      6) The functional characterization of the three enthesis cell clusters is not convincing. For example, activation of metabolism-related processes can mean a lot of things (including changes in differentiation), yet the authors interpret it very specifically as role in postnatal fibrochondrocyte formation and growth.

      7) The pseudotime analysis of the enthesis cell clusters is not convincing. The three clusters are quite close and overlapping on the UMAP. Furthermore, the authors focus on Tnn as a novel and unique gene, yet the expression pattern shown in Fig 5g implies even expression of this gene across all three clusters.

      8) The TC1 markers (Ly6a, Dlk3, Clec3b) imply a non-tendon-specific cell population. Perhaps a tendon progenitor pool or an endothelial cell phenotype is more appropriate.

      9) Pseudotime analyses assume that your data set includes cells from progenitor through mature cell populations. It is unclear that the timepoints studied here included cells from early progenitor states.

      10) The CellChat analysis is difficult to follow, as the authors included 18 cell types. The number of possible interactions among so many cell types is enormous, and deducing valid connections between any two cell types in this case should be justified. Is the algorithm robust to so many possible interactions?

      Thank you very much for your comments and suggestions. We have carefully revised the paper accordingly. We integrated our dataset with the open-source GSE182997 datasets and re-performed the downstream analysis. We also added immunofluorescence tests to validate the results derived from the single-cell datasets. We hope that all the issues raised about the prior version have now been addressed.

      Reviewer #2 (Public Review)

      To reveal cellular and molecular heterogeneity in enthesis, the authors established a single-cell temporal atlas during development. This study provides a transcriptional resource for further investigation of fibrocartilage development.

      Thank you very much for your kind suggestions. Following them, we integrated our dataset with the open-source GSE182997 datasets and re-performed the downstream analysis. We also added immunofluorescence tests to validate the results derived from the single-cell datasets. We hope that the issues raised about the prior version have now been addressed.

    1. Author Response

      eLife assessment:

      This paper conducts human and rodent experiments of non-invasive diffusion MRI estimates of axon diameter with the aim to establish whether these estimates provide biologically specific markers of axonal degeneration in MS. It will be of interest to researchers developing quantitative MRI methods and scientists studying neurodegeneration. The experiments provide evidence for the sensitivity of these markers, but do not directly validate axon diameter and do not reflect common pathological mechanisms across rodents and humans.

      We thank the Editor for the appreciation of our work. Thanks to the addition of an extensive electron microscopy paradigm, we now include a direct validation of axonal damage and expand on the common pathological mechanisms across the two species. The new results are detailed in the manuscript and summarized in Fig. 3.

      Reviewer #1 (Public Review):

      1.1 My primary concern relates to how meaningful the human-rodent comparisons are, and whether these comparisons really advance our understanding of AxCaliber estimates in MS. I applaud the aim to conduct "matched" experiments in both rodent models and human disease. It is a strength that the experiments are aligned with respect to the MRI measurements (although there are some caveats to this mentioned below). But beyond that, the overlap is not what one might hope for: the pathology would seem to be very distinct in humans and rodents, and the histological validation is not specific to what the MRI measurements claim to estimate. To summarize the main findings: (i) in a rat model of general axonal degeneration, axon calibre estimates correlate with neurofilaments; (ii) in MS in humans, axon calibre estimates correlate with demyelinating lesions. This gives a picture of AxCalibre estimates correlating with neuropathology, but is this something that has not already been established in the literature? If the aim is to validate AxCaliber, then there is a logic in using a rodent model that isolates alterations to axonal radius, but what then does this add to the existing literature in that space? If the aim is to study MS (for which AxCaliber results have been previously reported in Huang et al), then why not use a rodent model of MS?

      We thank the reviewer for their very insightful comments. Indeed, multiple sclerosis (MS) is a chronic neuroinflammatory and neurodegenerative disease of unknown etiology. An enormous effort has been made to obtain animal models that simulate the pathogenesis of this disease. However, while several models exist recapitulating distinct aspects of the disease (mostly related to demyelination), MS fundamentally remains a disease that only affects humans. This does not mean that EAE or lysolecithin models do not provide information on specific aspects and are therefore valuable. In fact, we believe that trying to replicate the pathological mechanisms of this disease in an animal model goes beyond the scope of the present work. In this work, our intention is to validate a biomarker of axonal damage preclinically, and for this, we use a model of axonal degeneration. We do not claim that this model should be valid to capture the complex clinical and pathological manifestation of MS, but we do think that it is a necessary step to ensure MRI sensitivity to axonal pathology. Why necessary? Because all the available (very limited) MRI literature which provides some form of validation: i) only focuses on healthy tissue, and ii) has an n of 1. Our preclinical paradigm gives conclusive evidence that the MRI axonal diameter proxy detects axonal damage as an increase in the mean diameter. This is now detailed in the discussion.

      After this necessary preclinical validation, we then apply the same framework to a human disease like MS that, among other manifestations, is believed to also cause axonal pathology. The improvements with respect to the one published work about axonal diameter in MS are: i) the whole brain analysis, which allowed us to characterize the extent of these early alterations outside the demyelinated lesions; and ii) the larger sample size, which allowed us to uncover an association with disease duration, strengthening our hypothesis about increased axonal diameter being a marker of early disease (new Fig. 5).

      Regarding the nonspecificity of histological validation, we thank the reviewer for this insightful comment, which triggered an additional analysis that we believe has added further value to the paper. Using electron microscopy, we found that in our model of neurodegeneration, axonal damage is indeed reflected as an increase in axon diameter (new Fig. 3). These recent findings strongly support the validation of our noninvasive diffusion MRI estimates of axon diameter alterations as an early-stage hallmark of normal-appearing tissue in MS.

      Coming back to the comparison between pathology in humans and in rodents, the EM data also support our choice of preclinical model, showing axonal swelling, the same phenomenon reported and characterized in recent postmortem histological data in the normal-appearing white matter of MS patients (Luchicchi et al., Ann Neurol 2021) and in lesions (Fisher et al. Ann Neurol 2007).

      All in all, we are confident that the new data support the validity of this translational approach and shed new light on the degenerative aspect of MS.

      Changes in the manuscript

      • Discussion, pag.12: It is important to stress that the aim of this work is not to propose a new animal model of MS, a disease that only affects humans, but rather to validate axonal damage detection (independently from the pathology that has induced it) through noninvasive MRI and apply the framework to characterize axonal pathology in MS.

      1.2 I appreciate that both rodent and patient studies are time intensive, major endeavors. Nevertheless, the number of subjects is very low in both rodent (n=9) and human (MS=10, control=6) studies. At the very least, this should be more openly acknowledged. But I'm concerned that this is a major weakness of the paper. Related to this, I find it hard to tell how carefully multiple comparison correction was performed throughout. It seems reasonably clear for the TBSS analyses, but then other analyses were performed in ROIs. Are these multiple comparisons corrected as well? Similarly, in Methods, I am confused by the statement that: "post hoc t tests corrected for multiple comparisons whenever a significant effect was detected". What does this mean?

      We thank the reviewer for this comment. We agree that a small sample size was a weakness of the previous version of the paper, and therefore, in the new version, we have substantially increased the n for both animal and human experiments (from n=9 to 19 in animals, from 16 to 21 in humans). We removed the ROI analysis in the new version, and thus the confusing statement, and clarified the strategy for multiple comparisons.

      Changes in the manuscript

      • Data analysis, pag. 18: Lesion masks were excluded from the statistical analysis, and multiple comparisons across clusters were controlled for by using threshold-free cluster enhancement.

      1.3 While I do not think the text is in any sense deliberately misleading, I think the authors would do well to either tone down their claims or consider more carefully the implications of the text in many places. Some that stuck out for me are:

      Throughout, language in the paper (e.g., "Paired t tests were used to assess differences in the axonal diameter") presumes that the AxCaliber estimates specifically reflect axon diameter. I think the jury is out over whether this is true, particularly for measurements conducted with limited hardware specs. At the very least, I would encourage the author to refer to these measurements throughout as "estimates" of axon diameter.

      Thank you for this clarification. We have indeed changed the notation, and now consistently refer to the estimates of axon diameter through MRI as the “MRI axonal diameter proxy”.

      1.4 The authors suggest that their results provide "new tools for patient stratification" based on differences in lesion type, but it isn't clear what new information these markers would confer given that the lesions are differentiated based on T1w hypo/hyperintensities. In other words, these lesions are by definition already differentiable from a much simpler MRI marker.

      Thank you for this insightful comment. The reviewer is right, and following the general reviewers’ assessment we have decided to not include the lesion analysis in the new version of the manuscript.

      1.5 The authors note in the Discussion that: "sensitive to early stages of axonal degeneration, even before alterations in the myelin sheet are detected". Whether intentional or not, the implication in the context of this study is that this would hold for MS (that these markers would detect axonal degeneration preceding demyelination). While there is some discussion of alterations to axonal diameter in MS, the authors do not discuss whether these are the same mechanisms thought to occur in the IBO intervention used here.

      Thank you for this comment. Indeed, the scope of the paper is not to assess whether axonal swelling precedes or not myelin alterations, so we agree with the reviewer that this sentence might be misleading and have removed it in the text. While we do not claim that ibotenic acid injections are able to replicate the complex clinical and pathological manifestation of MS (and now we made it clear in the revised manuscript, see comment 1), the electron microscopy paradigm indicates the presence of axonal swelling in the damaged fimbria, which is indeed the same pathological manifestation found in MS post-mortem data (see e.g. Fisher et al. Ann Neurol 2007).

      1.6 In the Discussion, the authors note the lack of evidence for a relationship with disability or disease duration, but nevertheless, go on to interpret the "trends" they do observe. I would advise strongly against this: the authors acknowledge that their numbers are low, so I would avoid the temptation to speculate here.

      The reviewer is 100% correct. We should have refrained from speculating. In the new version of the paper, however, thanks to the larger human cohort, we were able to find significant associations with disease duration in voxelwise analysis of the white matter skeleton in standard space and in the whole white matter in single subject space (new Figure 5).

      1.7 In the Discussion, the authors state that "the use of neurofilaments has also been well validated in MS". Well validated for what? MS is a complex disease with a broad range of pathology, so this statement could be read to mean "neurofilaments are known to be altered in MS". However, in the context of this paragraph, the implication would seem to be that neurofilaments are a well-established proxy for axonal diameter. Is that the implication, and if so what general evidence is there for this?

      We thank the reviewer for this insightful comment. Indeed, altered neurofilaments are not conclusive evidence of increased axonal diameter. In this context, the addition of electron microscopy data in the new manuscript version supports the claim.

      Reviewer #2 (Public Review):

      Diffusion MRI is sensitive to the brain microstructure, and it has been used to assess the integrity of white matter for nearly 3 decades. Its main limitation is the limited specificity, which makes it difficult to link changes in diffusion parameters to a given pathological substrate. Recently, methods based on diffusion MRI that enable the noninvasive estimation of axonal diameter have become available. This paper aims at validating one such method using an experimental model of neurodegeneration. The authors found a significant correlation between axonal diameter estimated by MRI and a histological marker of neurodegeneration. Although this is of great interest, as it demonstrates that this method is sensitive to neurodegeneration, a direct validation would require a measurement of axonal diameter using electron or confocal microscopy, rather than a correlation with a measure of axonal degeneration not directly related to axonal diameter. So, although these data are compelling, they do not prove that the increase in axonal diameter suggested by diffusion MRI corresponds to actual axonal swelling. The Authors also apply the same method to compare the white matter of patients with multiple sclerosis (MS) and healthy controls, showing widespread increases in axonal diameter in the patients. These data are compelling, but again, not conclusive. Other factors such as gliosis could bias the MRI measurement and lead to an apparent increase in axonal diameter.

      We would like to thank the reviewer for the positive assessment of our work and for the valuable suggestion. We are confident that the new version of the manuscript, by including an extensive validation based on electron microscopy, has addressed the reviewer´s criticisms.

      Reviewer #3 (Public Review):

      3.1 In this paper, Toschi et al. performed dMRI to in vivo estimate axon diameter in the brain and demonstrated that multi-compartmental modeling (AxCaliber) is sensitive to microstructural axonal damage in rats and axon caliber increase in demyelinating lesions in MS patients, suggesting that axon diameter mapping provides a potential biomarker to bridge the gap between medical imaging contrasts and biological microstructure. In particular, authors injected ibotenic acid (IBO) and saline in the left and right rat hippocampus, respectively, and compared in vivo estimated axon diameter and ex vivo neurofilament staining in left and right fimbria. The axon size estimation was larger in the fimbria of IBO injection side, where the neurofilament intensity is higher. Correlation of axon size estimation and neurofilament intensity was observed in both injection sides. Further, higher axon diameter estimation was observed in normal appearing white matter (NAWM) of MS patients, compared with the healthy subjects. The axon size estimation increased in hypointense lesions of T1 weighted contrast, but not in isointense lesions. Through the comparison of dMRI-estimated axon size and histology-based fluorescence intensity, authors indirectly validated the sensitivity of axon diameter mapping to the tissue microstructure in the rat brain, and further explored the axon size change in the brain of MS patients. However, the dMRI protocol and biophysical modeling in this study were not fully optimized to maximize the sensitivity to axon size estimation, and the dMRI-estimated axon size (4.4-5.4 micron) was much larger than values reported in previous histological studies (0.5-3 micron) [Barazany et al., Brain 2009]. Finally, although the modified AxCaliber model incorporated two fiber bundles in different directions, the fiber dispersion in each bundle was not considered (c.f. fiber dispersion ~20-30 degree in corpus callosum), potentially leading to overestimated axon diameter.

      We thank the reviewer for their appreciation of our work, which we believe is substantially improved in this revised version through the inclusion of an electron microscopy paradigm. Below, the point-by-point response to the specific points raised.

      3.2 The conclusions in this study are supported by experimental results. However, the dMRI protocol and biophysical model could be further optimized and validated: 1. To in vivo estimate the axon diameter ~1 micron using dMRI, strong diffusion weighting (b-value) should be applied to maximize the signal decay due to intra-axonal restricted diffusion and minimize the signal contribution of extra-cellular hindered diffusion. However, authors only applied maximal b-value = 4000 s/mm2, much smaller than values ~15,000-20,000 s/mm2 in previous studies [Assaf et al., MRM 2008; Huang et al., BSAF 2020, 225:1277]. The use of low diffusion weighting in this study leads to a lower bound ~4-6 micron for accurate diameter estimation, the so-called resolution limit in [Nilsson et al., NMR Biomed 2017, 30:e3711]. In other words, the estimated axon diameter is potentially overestimated and related with the imaging protocol and image quality, confounding the biological interpretation.

      We thank the reviewer for this insightful comment. Indeed, while the resolution limit is a concern, the chosen b-value has been a compromise between sensitivity to small structures and SNR, as indicated by recent animal (Crater et al., 2022) and human (Jensen et al., 2016; McKinnon et al., 2017; Moss et al., 2019) work, pointing at 3000-4000 s/mm2 as the b-value for which the intra-axonal water signal is dominant. In addition, a paper from the laboratory that first developed the AxCaliber method recently came out (Gast et al., 2023, DOI: 10.1007/s12021-023-09630-w) demonstrating that an MRI protocol with a maximum b-value between 3000 and 4000 s/mm2 (and even lower) is sufficient to capture, in vivo and in humans, various well-known aspects of axonal morphometry (e.g., the corpus callosum axon diameter variation) as well as other aspects that are less explored (e.g., axon diameter-based separation of the superior longitudinal fasciculus into segments). The same paper contains resources and further bibliography supporting the view that the contribution of intra-axonal water to restricted diffusion signals dominates other factors (see Online Resource 1, section A of the same paper). To challenge this recent evidence from a neurobiology perspective, we include in the supplementary material a subset of experiments in animals with lower maximum b-value (2500 s/mm2, Fig. S1), where we are able to detect the same effect of increased MRI axonal diameter proxy in the injected hemisphere compared to control.

      We would like to add that while extremely valuable and informative, simulation studies such as the excellent study by Veraart et al., 2020, are inevitably valid under certain assumptions. Among them, some critical ones are i) the need to neglect nonaxonal cells such as glia, ii) assuming that the bulk diffusivity of water in cerebral tissue would be the same as that of free water, and iii) impermeable barriers. All these assumptions are expected to play a role in the estimated resolution limit, a role difficult to quantify but likely substantial.

      For this reason, we believe that our approach, which is 100% focused on neurobiology and measurements performed in real tissue, can offer a different perspective and fuel the ongoing debate on axonal diameter measurement feasibility. We acknowledge the value of the reviewer comment and discuss the issue of b-value in the discussion (see also comment 1.8).

      Changes in the manuscript

      • Discussion, pag. 12: Despite some inevitable minor differences due to different brain sizes and magnet features, the human protocol was built to match the main characteristics of the preclinical diffusion sequence, such as the b-value and diffusion time range. The chosen b-value has been a compromise between sensitivity to small structures and the signal-to-noise ratio (SNR), as indicated by recent animal (Crater et al., 2022) and human (Gast et al., 2023; Jensen et al., 2016; McKinnon et al., 2017; Moss et al., 2019) work, pointing at 4000 s/mm2 as the b-value for which the intra-axonal water signal is dominant. However, following recent work supporting sensitivity of diffusion-weighted MRI to axonal diameter even at lower b-values (Gast et al., 2023), we tested a protocol with a lower b-value in a subset of animals, with the aim of facilitating future clinical AxCaliber studies. We found no qualitative differences in the outcome (MRI axonal diameter proxy was increased following fimbria damage). Further work and perhaps more realistic simulations, considering real cell composition and morphology, are needed to clarify this issue.

      3.3 In this study, the positive correlation of dMRI-estimated axon size and neurofilament fluorescence intensity is indeed an encouraging result, and yet this validation is indirect since it relies on the positive correlation between neurofilament intensity and axon diameter in histology.

      The reviewer correctly points out a severe limitation of the previous manuscript version, which is now addressed by including an extensive electron microscopy evaluation, recapitulated in new Fig. 3.

      3.4 Authors did not consider the fiber dispersion in the proposed dMRI model. This can lead to overestimated axon diameter, even in the highly aligned WM, such as corpus callosum with ~20-30 degree dispersion in histology [Ronen et al., BSAF 2014, 219:1773; Leergaard et al., PLoS One 2010, 5(1), e8595] and MRI [Dhital et al., NeuroImage 2019, 189, 543; Novikov et al., NeuroImage 2018, 174:518].

      The reviewer correctly points out an important characteristic of white matter microstructure, namely fibre dispersion. However, we would like to point out that the use of a second fiber population is expected to mitigate this effect by absorbing some axonal directional dispersion in areas containing a single fiber. To support this, we quantified dispersion as the angle between the two main fiber orientations captured by the AxCaliber fit, as shown in Author response image 1 for two representative subjects (one control, upper line, and one MS, lower line; the “dispersion” maps are masked by a white matter probability mask, and superimposed on a T2w image). Indeed, the angle between the two main fibres in the corpus callosum is around 20 degrees or lower, compatible with the literature cited by the reviewer, and higher in other white matter areas known to be characterized by fiber crossing and dispersion.

      Author response image 1.

      Angle in radians between the two main fiber orientations captured by the AxCaliber fit, shown for two representative subjects (one control, upper line, and one MS, lower line). The dispersion maps are masked by a white matter probability mask (P>=0.95), and superimposed on a T2-weighted image.
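
      As an illustration only, the angle between two fiber orientations can be computed as in the minimal sketch below. It is not the code used for the figure: the function name `axial_angle` and the example vectors are hypothetical, and the sketch assumes each orientation is given as a 3D direction vector whose sign is arbitrary (a fiber pointing along v is the same fiber as one pointing along -v), so the acute angle is returned.

```python
import math


def axial_angle(u, v):
    """Return the acute angle (radians) between two fiber orientations.

    Orientations are treated as axes, so v and -v are equivalent and the
    result always lies in [0, pi/2].
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("orientation vectors must be non-zero")
    # Absolute value makes the measure sign-invariant; min() guards
    # against rounding slightly above 1 before acos.
    cos_angle = min(1.0, abs(dot) / (norm_u * norm_v))
    return math.acos(cos_angle)


# Hypothetical voxel: two orientations 20 degrees apart in the x-y plane.
primary = (1.0, 0.0, 0.0)
secondary = (math.cos(math.radians(20.0)), math.sin(math.radians(20.0)), 0.0)

angle_rad = axial_angle(primary, secondary)
angle_deg = math.degrees(angle_rad)  # convert to degrees for comparison with histology
```

      Converting with `math.degrees` makes a map stored in radians directly comparable with dispersion values quoted in degrees.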

    1. Author Response

      Reviewer #1 (Public Review):

      This study examines the factors underlying the assembly of MreB, an actin family member involved in mediating longitudinal cell wall synthesis in rod-shaped bacteria. Required for maintaining rod shape and essential for growth in model bacteria, single molecule work indicates that MreB forms treadmilling polymers that guide the synthesis of new peptidoglycan along the longitudinal cell wall. MreB has proven difficult to work with and the field is littered with artifacts. In vitro analysis of MreB assembly dynamics has not fared much better as helpfully detailed in the introduction to this study. In contrast to its distant relative actin, MreB is difficult to purify and requires very specific conditions to polymerize that differ between groups of bacteria. Currently, in vitro analysis of MreB and related proteins has been mostly limited to MreBs from Gram-negative bacteria which have different properties and behaviors from related proteins in Gram-positive organisms.

      Here, Mao and colleagues use a range of techniques to purify MreB from the Gram-positive organism Geobacillus stearothermophilus, identify factors required for its assembly, and analyze the structure of MreB polymers. Notably, they identify two short hydrophobic sequences, located near one another on the 3-D structure, which are required to mediate membrane anchoring.

      With regard to assembly dynamics, the authors find that Geobacillus MreB assembly requires both interactions with membrane lipids and nucleotide binding. Nucleotide hydrolysis is required for interaction with the membrane and interaction with lipids triggers polymerization. These experiments appear to be conducted in a rigorous manner, although the salt concentration of the buffer (500mM KCl) is quite high relative to that used for in vitro analysis of MreBs from other organisms. The authors should elaborate on their decision to use such a high salt buffer, and ideally, provide insight into how it might impact their findings relative to previous work.

      Response 1.1. MreB proteins are notoriously difficult to maintain in a soluble form. Some labs deleted the N-terminal amphipathic or hydrophobic sequences to increase solubility, while other labs used full-length protein but high KCl concentration (300 mM KCl) (Harne et al, 2020; Pande et al., 2022; Popp et al, 2010; Szatmari et al, 2020). Early in the project, we tested many conditions and noticed that high KCl helped maintain slightly better solubility of full-length MreBGs, without the need to delete part of the protein. In addition, salt concentrations > 100 mM would better mimic the conditions met by the protein in vivo. While 50-100 mM KCl is traditionally used in actin polymerization assays, physiological salt concentrations are around 100-150 mM KCl in invertebrates and vertebrates (Schmidt-Nielsen, 1975), around 50-250 mM in fungal and plant cells (Rodriguez-Navarro, 2000) and 200-300 mM in the budding yeast (Arino et al, 2010). However, cytoplasmic K+ concentration varies greatly (up to 800 mM) depending on the osmolality of the medium in both E. coli (Cayley et al, 1991; Epstein & Schultz, 1965; Rhoads et al, 1976) and B. subtilis, in which the basal intracellular concentration of KCl was estimated to be ~ 350 mM (Eisenstadt, 1972; Whatmore et al, 1990). 500 mM KCl can therefore be considered as physiological as 100 mM KCl for bacterial cells. Since we observed plenty of pairs of protofilaments at 500 mM KCl and this condition helped to avoid aggregation, we kept this high concentration as a standard for most of our experiments. Nonetheless, we had also performed TEM polymerization assays at 100 mM, in line with most of the MreB and F-actin in vitro literature, and found no difference in the polymerization (or absence of polymerization) conditions. This was indicated in the initial submission (e.g. M&M section L540 and footnote of Table S2) but since two reviewers bring it up as a main point, it is evident we failed to communicate it clearly, for which we apologize. This has been clarified in the revised version of the manuscript. We have also almost systematically added the 100 mM KCl concentration, as requested by reviewer #2, to reconcile our salt conditions with those used for some in vitro analyses of MreBs from other organisms (see also response to reviewer #2 comments 1A and 1B = Responses 2.1A, 2.1B below). We then decided to refer to the 100 mM KCl concentration as our “standard condition” in the revised version of the manuscript, but we compile and compare the results obtained at 500 mM too, as both concentrations are within the physiological range in Bacillus.

      Additionally, this study, like many others on MreB, makes much of MreB's relationship to actin. This leads to confusion and the use of unhelpful comparisons. For example, MreB filaments are not actin-like (line 58) any more than any polymer is "actin-like." As evidenced by the very beautiful images in this manuscript, MreB forms straight protofilaments that assemble into parallel arrays, not the paired-twisted polymers that are characteristic of F-actin. Generally, I would argue that work on MreB has been hindered by rather than benefitted from its relationship to actin (e.g., early FP fusion data interpreted as evidence for an MreB endoskeleton supporting cell shape, or depletion experiments implicating MreB in chromosome segregation) and thus such comparisons should be avoided unless absolutely necessary.

      Response 1.2. We completely agree with reviewer #1 regarding unhelpful comparisons of actin and MreB, and that work on MreB has traditionally been hindered by its relationship to eukaryotic actin. MreB is nonetheless a structural homolog of actin, with a closely related fold and common properties (polymerization into pairs of protofilaments, ATPase activity…). It still makes sense to refer to a widely studied protein with common features and common ancestry, as long as we do not confine ourselves to a single conceptual framework. That said, actin and MreB diverged very early in evolution, which may account for differences in their biochemical properties and cellular functions. Current data on MreB filaments confirm that they display both F-actin-like and F-actin-unlike properties. We thank the reviewer for this insightful comment. We have revised the text to remove any inaccurate or unhelpful comparison to actin (in particular the ‘actin-like filaments’ statement, previously used once).

      Reviewer #2 (Public Review):

      The paper "Polymerization cycle of actin homolog MreB from a Gram-positive bacterium" by Mao et al. provides the second biochemical study of a Gram-positive MreB but, importantly, the first study to examine how Gram-positive MreB filaments bind to membranes. They also show the first crystal structure of a MreB from a Gram-positive bacterium, in two nucleotide-bound forms, finally solving structures that have been missing for too long. They also elucidate what residues in Geobacillus MreB are required for membrane associations. Also, the QCM-D approach to monitoring MreB membrane associations is a direct and elegant assay.

      While the above findings are novel and important, this paper also makes a series of conclusions that run counter to multiple in vitro studies of MreBs from different organisms and other polymers with the actin fold. Overall, they propose that Geobacillus MreB contains biochemical properties that are quite different than not only the other MreBs examined so far but also eukaryotic actin and every actin homolog that has been characterized in vitro. As the conclusions proposed here would place the biochemical properties of Geobacillus MreB as the sole exception to all other actin fold polymers, further supporting experiments are needed to bolster these contrasting conclusions and their overall model.

      Response 2.0. We are grateful to reviewer #2 for stressing the novelty and importance of our results. Most of our conclusions were in line with previous in vitro studies of MreBs (formation of pairs of straight filaments on a lipid layer, both ATP and GTP binding and hydrolysis, distortion of liposomes…), with the exception of the claimed requirement of NTP hydrolysis for membrane binding prior to polymerization, which was based on the absence of pairs of filaments in free solution or in the presence of AMP-PNP in our experimental conditions (and which we agree was not sufficient to make such a bold claim, see below). Thanks to the reviewer’s comments, we have performed many controls and additional experiments that led us to refine our results and largely reconcile them with the literature. Please see the answer to the global review comments: our conclusions have been revised on the basis of our new data.

      1. (Difference 1) - The predominant concern about the in vitro studies that makes it difficult to evaluate many of their results (much less compare them to other MreB/s and actin homologs) is the use of a highly unconventional polymerization buffer containing 500(!) mM KCl. As has been demonstrated with actin and other polymers, the high KCl concentration used here (500mM) is certain to affect the polymerization equilibria, as increasing salt increases the hydrophobic effect and inhibits salt bridges, and therefore will affect the affinity between monomers and filaments. For example, past work has shown that high salt greatly changes actin polymerization, causing: a decreased critical concentration, increased bundling, and a greatly increased filament stiffness (Kang et al., 2013, 2012). Similarly, with AlfA, increased salt concentrations have been shown to increase the critical concentration, decrease the polymerization kinetics, and inhibit the bundling of AlfA filaments (Polka et al., 2009).

      A more closely related example comes from the previous observation that increasing salt concentrations increasingly slow the polymerization kinetics of B. subtilis MreB (Mayer and Amann, 2009). Lastly, these high salt concentrations might also change the interactions of MreB(Gs) with the membrane by screening charges and/or increasing the hydrophobic effect. Given that 500mM KCl was used throughout this paper, many (if not all) of the key experiments should be repeated at a more standard salt concentration (~100mM), similar to those used in most previous in vitro studies of polymers.

      Response 2.1A. As requested by reviewer #2, we have repeated at 100 mM KCl most of the experiments (TEM, cryo-EM, QCM-D and ATPase assays) initially performed at 500 mM KCl only. The KCl concentration affects both membrane binding and filament stiffness, as anticipated by the reviewer, but the main conclusions are the same. The revised version of the manuscript compiles and compares the results obtained at both high and low [KCl], both concentrations being within the physiological range in Bacillus. Please see point 1 of the response to the global review comments and the first response to reviewer 1 (Response 1.1) for further elaboration.

      Please note that in Mayer & Amann, 2009 (B. subtilis MreB), light scattering in free solution was inversely proportional to the KCl concentration, with the highest light scattering signal at 0 mM KCl (!), a > 2-fold reduction below 30 mM KCl and no scatter at all at 250 mM, suggesting a “salting in” phenomenon (see also the “Other Points to address” answers 1A and 2, below) (Mayer & Amann, 2009). Since no effective polymer formation (e.g. polymers shown by EM) was demonstrated in these experiments, it cannot be excluded that KCl was simply preventing aggregation of B. subtilis MreB in solution, as we observe. For all their other light scattering experiments, the ‘standard polymerization condition’ used by Mayer & Amann was 0.2 mM ATP, 5 mM MgCl2, 1 mM EGTA and 10 mM imidazole pH 7.0, to which MreB (in 5 mM Tris pH 8.0) was added. No KCl was present in their ‘standard’ polymerization conditions.

      This would test if the many divergent properties of MreB(Gs) reported here arise from some difference in MreB(Gs) relative to other MreBs (and actin homologs), or if they arise from the 400mM difference in salt concentration between the studies. Critically, it would also allow direct comparisons to be made relative to previous studies of MreB (and other actin homologs) that used much lower salt, thereby allowing them to definitively demonstrate whether MreB(Gs) is indeed an outlier relative to other MreB and actin homologs. I would suggest using 100mM KCl, as historically, all polymerization assays of actin and numerous actin homologs have used 50-100mM KCl: 50mM KCl (for actin in F buffer) or 100mM KCl for multiple prokaryotic actin homologs and MreB (Deng et al., 2016; Ent et al., 2014; Esue et al., 2006, 2005; Garner et al., 2004; Polka et al., 2009; Rivera et al., 2011; Salje et al., 2011). Likewise, similar salt concentrations are standard for tubulin (80 mM K-Pipes) and FtsZ (100 mM KCl or 100mM KAc in HMK100 buffer).

      Response 2.1B. We appreciate the reviewer’s feedback on this point. Please note that, although actin polymerization assays are historically performed at 50-100 mM KCl and thus 100 mM KCl was used for other bacterial actin homologs (MamK, ParM and AlfA), MreB polymerization assays have previously been reported at 300 mM KCl too (Harne et al., 2020; Pande et al., 2022; Popp et al., 2010; Szatmari et al., 2020), which is closer to the physiological salt concentration in bacterial cells (see Response 1.1), but also in the absence of KCl (see above). As a matter of fact, we originally wanted to use a “standard polymerization condition” based on the literature on MreB, before realizing there was none: only half used KCl (the other half used NaCl, or no monovalent salt at all) and among these, KCl concentrations varied (out of 8 publications, 2 used 20 mM KCl, 2 used 50 mM KCl and 4 used 300 mM KCl).

      1. (Difference 2) - One of the most important differences claimed in this paper is that MreB(Gs) filaments are straight, a result that runs counter to the curved T. maritima and C. crescentus filaments detailed by the Löwe group (Ent et al., 2014; Salje et al., 2011). Importantly, this difference could also arise from the difference in salt concentrations used in each study (500mM here vs. 100mM in the Löwe studies), and thus one cannot currently draw any direct comparisons between the two studies.

      One example of how high salt could be causing differences in filament geometry: high salts are known to greatly increase the bending stiffness of actin filaments, making them more rigid (Kang et al., 2013). Likewise, increasing salt is known to change the rigidity of membranes. As the ability of filaments to A) bend the membrane or B) deform to the membrane depends on the stiffness of filaments relative to the stiffness of the membrane, the observed difference in the "straight vs. curved" conformation of MreB filaments might simply arise from different salt concentrations. Thus, in order to draw several direct comparisons between their findings and those of other MreB orthologs (as done here), the studies of MreB(Gs) conformations on lipids should be repeated at the same buffer conditions as used in the Löwe papers, then allowing them to be directly compared.

      Response 2.2. We fully agree with reviewer #2 that the salt concentration could be affecting the assay, and we performed cryo-EM experiments in the presence of 100 mM KCl as well, as requested. The results unambiguously showed countless liposomes curved at the contact areas with MreB (Fig. 2F-G and Fig. 2-S5), very similar to what was reported for Thermotoga and Caulobacter MreBs by the Löwe group. Our results therefore confirm the previous findings that MreBs can bend lipids, and suggest that, indeed, high salt may increase filament stiffness as has been shown for actin filaments. We are very grateful to reviewer #2 for this suggestion and for drawing our attention to the work of Kang et al, 2013. The different bending observed when varying the salt concentration raises relevant questions regarding the in vivo behavior of MreB, since the KCl concentration was shown to vary greatly depending on the medium composition. The manuscript has been updated accordingly in the Results (from L243) and Discussion sections (L585-595).

      1. (Difference 3) - The next important difference between MreB(Gs) and other MreBs is the claim that MreB polymers do not form in the absence of membranes.

      A) This is surprising relative to other MreBs, as MreBs from 1) T. maritima (multiple studies), E. coli (Nurse and Marians, 2013), and C. crescentus (Ent et al., 2014) have been shown to form polymers in solution (without lipids) with electron microscopy, light scattering, and time-resolved multi-angle light scattering. Notably, the Esue work was able to observe the first phase of polymer formation and a subsequent phase of polymer bundling (Esue et al., 2006) of MreB in solution. 2) Similarly, (Mayer and Amann, 2009) demonstrated B. subtilis MreB forms polymers in the absence of membranes using light scattering.

      Response 2.3A. The literature does convincingly show that Thermotoga MreB forms polymers in solution, without lipids (note that for Caulobacter MreB, filaments were only reported in the presence of lipids (van den Ent et al, 2014)). Assemblies reported in solution are bundles or sheets (including at the earliest time points in the time-resolved EM experiments reported by Esue et al. 2006 and mentioned by the reviewer: ‘2 minutes after adding ATP, EM revealed that MreB formed short filamentous bundles’) (Esue et al, 2006). However, as discussed above (Response 2.1A), the light scattering experiments in Mayer and Amann, 2009 do not conclusively demonstrate the presence of polymers of B. subtilis MreB in solution (Mayer & Amann, 2009). We performed many light scattering experiments of B. subtilis MreB in solution in the past (before finding out that filaments were only forming in the presence of lipids), and obtained similar scattering curves (see two examples of DLS experiments in Author response image 1) in conditions in which NO polymers could ever be observed by EM while plenty of aggregates were present.

      Author response image 1.

      We did not consider these results publishable in the absence of true polymers observed by TEM. As pointed out in the interesting study by Nurse et al. (on E. coli MreB) (Nurse & Marians, 2013), one cannot rely on light scattering alone because non-specific aggregates would show patterns similar to those of polymers. Over the last two decades, about 15 publications showed polymers of MreB from several Gram-negative species, while none (despite the efforts of many) showed a single convincing MreB polymer from a Gram-positive bacterium by EM. A simple hypothesis is that a critical parameter was missing, and we present convincing evidence that lipids are critical for Geobacillus MreB to form pairs of filaments in the conditions tested. However, in solution too we do occasionally see pairs of filaments (Fig 2-S2), and also sheet-like structures among aggregates when the concentration of MreB is increased (Fig. 2-S2 and Fig. 3-S2). Thus, we agree with the reviewer that it cannot be claimed that Geobacillus MreB is unable to polymerize in the absence of lipids, but rather that lipids strongly stimulate its polymerization, depending on the conditions.

      B) The results shown in figure 5A also go against this conclusion, as there is only a 2-fold increase in the phosphate release from MreB(Gs) in the presence of membranes relative to the absence of membranes. Thus, if their model is correct, and MreB(Gs) polymers form only on membranes, this would require the unpolymerized MreB monomers to hydrolyze ATP at 1/2 the rate of MreB in filaments. This high relative rate of hydrolysis of monomers compared to filaments is unprecedented. For all polymers examined so far, the rate of monomer hydrolysis is several orders of magnitude less than that of the filament. For example, actin monomers are known to hydrolyze ATP 430,000X slower than the monomers inside filaments (Blanchoin and Pollard, 2002; Rould et al., 2006).

      Response 2.3B. We agree with the reviewer. We have now found conditions where sheets of MreB form in solution (at high MreB concentration) in the presence of ADP and AMP-PNP. However, we have now added several controls that exclude efficient formation of polymers in solution in the presence of ATP at low concentrations of MreBGs (≤ 1.5 µM), the condition used for the malachite green assays. At these MreB concentrations, pairs of filaments are observed in the presence of lipids, but very infrequently in solution, and sheets are not observed in solution either (Fig. 2-S2A, B). Yet, albeit puzzling, in these conditions Pi release is reproducibly observed in solution, reduced only ~ 2 to 3-fold relative to Pi release in the presence of lipids (Fig. 5A and Fig. 5-S1). A reinforcing observation comes from ATPase assays performed at 100 mM KCl (Fig. 5A). In this condition MreB binding to lipids is increased relative to 500 mM KCl (Fig. 4-S4C), and the stimulation of the ATPase activity by the presence of lipids is also stronger than at 500 mM (Fig. 5-S1A). Further work is needed to characterize in detail the ATPase activity of MreB proteins, for which data in the literature are very scarce. We can’t exclude that MreB could nucleate in solution or form very unstable filaments that cannot be seen in our EM assay but consume ATP in the process. At the moment, the significance of the Pi released in solution is unknown and will require further investigation.

      C) Thus, there is a strong possibility that MreB(Gs) polymers are indeed forming in solution in addition to those on the membrane, and these "solution polymers" may not be captured by their electron microscopy assay. For example, high salt could be interfering with the adsorption of filaments to glow discharged grids lacking lipids.

      Response 2.3C. We appreciate the reviewer’s insight about this critical point. Polymers presented in the original Fig. 2A were obtained at 500 mM KCl but we had tested the polymerization of MreB at 100 mM KCl as well, without noticing differences. We have nonetheless redone this quantitatively and used these data for the revised Fig. 2A, as we are now using 100 mM KCl as our standard polymerization condition throughout the revised manuscript. We also followed the other suggestion of the reviewer and tested glow discharged grids (a more classic preparation for soluble proteins) vs non-glow discharged EM grids, as well as a higher concentration of MreB. Grids are generally glow-discharged to make them hydrophilic in order to adsorb soluble proteins, but the properties of MreB (soluble but obviously presenting hydrophobic domains) made it difficult to predict which support putative soluble polymers would preferentially interact with. Septins, for example, bind much better to hydrophobic grids despite their soluble properties (I. Adriaans, personal communication). Virtually no double filaments were observed in solution at either low or high [MreB]. The fact that in some conditions (high [MreB], other nucleotides) we were able to detect sheet-like structures excludes a technical issue that would prevent the detection of existing but “invisible” polymers here. We have added these new data in Fig. 2-S2.

      As indicated above, the reviewer’s comments made us realize that we could not state or imply that MreB cannot polymerize in the absence of lipids. As a matter of fact, we always saw some random filaments in the EM fields, both in solution and in the presence of non-hydrolysable analogues, at very low frequency (Fig. 2A). We also now see sheets at high MreB concentration (Fig. 2-S2B). We could simply be missing the optimal conditions for polymerisation in solution, while our phrasing gave the impression that no polymers could ever form in the absence of ATP or lipids. Therefore, we have:

      1) analyzed all TEM data to present it as semi-quantitative TEM, using our methodology originally implemented for the analysis of the mutants

      2) reworked the text to remove any misleading statements and to indicate that MreBGs was only found to bind to a lipid monolayer as a double protofilament in the presence of ATP/GTP, but that this does not exclude that filaments may also form in other conditions.

      In order to definitively prove that MreB(Gs) does not have polymers in solution, the authors should:

      i) conduct orthogonal experiments to test for polymers in solution. The simplest test of polymerization might be conducting pelleting assays of MreB(Gs) with and without lipids, sweeping through the concentration range as done in 2B and 5a.

      Response 2.3Ci. Following reviewer #2's suggestion, we conducted a series of sedimentation assays in the presence and in the absence of lipids, at low (100 mM) and high (500 mM) salt, for both the wild-type protein and the three membrane-anchoring mutants (all at 1.3 µM). Sedimentation experiments in salt conditions preventing aggregation in solution (500 mM KCl) fitted with our TEM results: MreB wild-type pelleting increased in the presence of both ATP and lipids (Fig. R1). The sedimentation was further increased at 100 mM KCl, which would fit our other results indicating an increased interaction of MreB with the membrane. However, in addition to being poorly reproducible (in our hands), the approach does not discriminate between polymers and aggregates (or monomers bound to liposomes), and since MreB has a strong tendency to aggregate, we believe that the technique is ill-suited to reliably address MreB polymerization and prefer not to include sedimentation data in our manuscript. The recent work from Pande et al. (2022) illustrates this issue well, since no sedimentation of MreB (at 2 µM) was observed in solution in conditions supporting polymerization (at 300 mM KCl): ‘the protein does not pellet on its own in the absence of liposome, irrespective of its polymerization state’, implying that sedimentation does not allow detection of MreB5 filaments in solution (Pande et al., 2022).

      ii) They also could examine if they see MreB filaments in the absence of lipids at 100mM salt (as was seen in both Löwe studies), as the high salt used here might block the charges on glow discharged grids, making it difficult for the polymer to adhere.

      See above, Response 2.3C

      iii) Likewise, the claim that MreB lacking the amino-terminus and the α2β7 hydrophobic loop "is required for polymerization" is questionable: if deleting these residues blocks membrane binding, the lack of polymers on the membrane on the grid is not unexpected, as filaments that cannot bind the membrane would not be observable. Given these mutants cannot bind the membrane, mutant polymers could still indeed exist in solution, and thus pelleting assays should be used to test if non-membrane associated filaments composed of these mutants do or do not exist.

      Response 2.3Ciii. This is a fair point, we thank the reviewer for this remark. We did not mean to state or imply that the hydrophobic loop was required for polymerization per se, but that polymerization into double filaments only efficiently occurs upon membrane binding, which is mediated by the two hydrophobic sequences. We tested all three mutants by sedimentation as suggested by reviewer #2. In the salt condition that limits aggregation (500 mM KCl) the mutants did not pellet while the wild-type protein did (in the presence of lipids) (Fig. R2 below), in agreement with our EM data. We tested the absence of lipids on the mutant bearing the 2 deletions and observed that the (partial) sedimentation observed at low KCl concentration was ATP and lipid dependent (Fig. R3).

      Given our concerns about MreB sedimentation assays (see above, Response 2.3Ci), we prefer not to include these sedimentation data in our manuscript. Instead, we tested by TEM the possible polymerization of the mutants in solution (we only tested them in the presence of lipids in the initial submission). No filaments were detected in solution for any of the mutants (Fig. 4-S3A).

      A final note, the results shown in "Figure 1 - figure supplement 2, panel C" appear to directly refute the claim that MreB(Gs) requires lipids to polymerize. As currently written, it appears they can observe MreB(Gs) filaments on EM grids without lipids. If these experiments were done in the presence of lipids, the figure legend should be updated to indicate that. If these experiments were done in the absence of lipids, the claim that membrane association is required for MreB polymerizations should be revised.

      The TEM experiments shown were indeed performed in the presence of lipids. We apologize that this was not clearly stated in the legend. To prevent any confusion, we have nevertheless removed these images from this figure, since the polymerization conditions and lipid requirement are not yet presented when this figure is referred to in the text. We have instead added a panel with the calibration curve for the size exclusion profiles, as requested by reviewer #3. The main point of this figure is to show the tendency of MreBGs to aggregate: analytical size-exclusion chromatography shows a single peak corresponding to the monomeric MreBGs, molecular weight ~ 37 kDa, in our purification conditions, but it can readily shift to a peak corresponding to high MW aggregates, depending on the protein concentration and/or storage conditions.

      1. (Difference 4) - The next difference between this study and previous studies of MreB and actin homologs is the conclusion that MreB(Gs) must hydrolyze ATP in order to polymerize. This conclusion is surprising, given the fact that both T. maritima (Salje 2011, Bean 2008) and B. subtilis MreB (Mayer 2009) have been shown to polymerize in the presence of ATP as well as AMP-PNP.

      Likewise, MreB polymerization has been shown to lag ATP hydrolysis in not only T. maritima MreB (Esue 2005), eukaryotic actin, and all other prokaryotic actin homologs whose polymerization and phosphate release have been directly compared: MamK (Deng et al., 2016), AlfA (Polka et al., 2009), and two divergent ParM homologs (Garner et al., 2004; Rivera et al., 2011). Currently, the only piece of evidence supporting the idea that MreB(Gs) must hydrolyze ATP in order to polymerize comes from 2 observations: 1) using electron microscopy, they cannot see filaments of MreB(Gs) on membranes in the presence of AMP-PNP or ApCpp, and 2) no appreciable signal increase appears testing AMPPNP-MreB(Gs) using QCM-D. This evidence is by no means conclusive enough to support this bold claim: While their competition experiment does indicate AMPPNP binds to MreB(Gs), it is possible that MreB(Gs) cannot polymerize when bound to AMPPNP.

      For example, it has been shown that different actin homologs respond differently to different non-hydrolysable analogs: Some, like actin, can hydrolyze one ATP analog but not the other, while others are able to bind to many different ATP analogs but only polymerize with some of them.

      Response 2.4. We agree with the reviewer. It is uncertain which analogs bind, because they differ substantially from ATP and some proteins do not accommodate them; they can also alter conditions such that filaments stop forming, and thus be (theoretically) misleading. This is why we tested ApCpp in addition to AMP-PNP as a non-hydrolysable analog (Fig. 3A). As indicated above, our new complementary experiments (Fig. 3-S1B-D) now show that some rare (i.e. infrequent and in limited amount) dual polymers are detected in the presence of ApCpp (Fig. 3A) and, at high MreB concentration only, in the presence of AMP-PNP (Fig. 3-S1B-D), suggesting different critical concentrations in the presence of alternative nucleotides. We have tempered our conclusions in the light of our new data and modified the discussion accordingly.

      Thus, to further verify their "hydrolysis is needed for polymerization" conclusion, they should:

      A. Test if a hydrolysis deficient MreB(Gs) mutant (such as D158A) is also unable to polymerize by EM.

      Response 2.4A. We thank the reviewer for this suggestion. As this conclusion has been revised on the basis of our new data (see previous response), testing putative ATPase-deficient mutants is no longer required here. The study of ATPase mutants is planned for future work (see Response 3.10 to reviewer #3).

      B. They also should conduct an orthogonal assay of MreB polymerization aside from EM (pelleting assays might be the easiest). They should test if polymers of ATP, AMP-PNP, and MreB(Gs)(D158A) form in solution (without membranes) by conducting pelleting assays. These could also be conducted with and without lipids, thereby also addressing the points noted above in point 3.

      Response 2.4B. Please see Response 2.3Ci above.

      C. Polymers may indeed form with ATP-gamma-S, and this non-hydrolysable ATP analog should be tested.

      Response 2.4C. It is quite possible that ATP-γ-S supports polymerization, since it is known to be partially hydrolysable by actin, giving a mild phenotype (Mannherz et al, 1975). This molecule can even be a bona fide substrate for some ATPases (e.g. Peck & Herschlag, 2003). Thus, we decided to exclude this “non-hydrolysable” analog and tested AMP-PNP and ApCpp instead. We know that ATP-γ-S has been, and still is, frequently used, but we preferred to avoid it for the moment for the reasons indicated above. We chose AMPPNP and AMPPCP instead because (1) they were shown to be completely non-hydrolysable by actin, in contrast to ATP-γ-S; (2) they are widely used (the most commonly used for structural studies; Lacabanne et al, 2020); and (3) AMPPNP was previously used in several publications on MreB (Bean & Amann, 2008; Nurse & Marians, 2013; Pande et al., 2022; Popp et al., 2010; Salje et al, 2011; van den Ent et al., 2014) and thus would allow direct comparison. AMPPCP was added to confirm the finding with AMP-PNP. There are many other analogs that we are planning to explore in future studies (see next Response, 2.4D).

      D. They could also test how the ADP-Phosphate bound MreB(Gs) polymerizes in bulk and on membranes, using beryllium phosphate to trap MreB in the ADP-Pi state. This might allow them to further refine their model.

      Response 2.4D. We plan to address the question of the transition state in depth in follow-up work, using a series of analogs and mutants presumably affected in ATPase activity, both predicted and identified in a genetic screen. As indicated above, it is uncertain which analogs bind, because they differ substantially from ATP and some may bind but prevent filament formation. We therefore anticipate that trying just one analog may not be sufficient: analogs can change conditions and be (theoretically) misleading, so a thorough analysis is needed to address this question. Since our model and conclusions have been revised on the basis of our new data, we believe that these experiments are beyond the scope of the current manuscript.

      E. Importantly, the Mayer study of B. subtilis MreB found the same results in regard to nucleotides, "In polymerization buffer, MreB produced phosphate in the presence of ATP and GTP, but not in ADP, AMP, GDP or AMP-PNP, or without the readdition of any nucleotide". Thus this paper should be referenced and discussed.

      Response 2.4E. We agree that Pi release was detected previously. We have added the reference (L121).

      1. (Difference 5) - The introduction states (lines 128-130) "However, the need for nucleotide binding and hydrolysis in polymerization remains unclear due to conflicting results, in vivo and in vitro, including the ability of MreB to polymerize or not in the presence of ADP or the non-hydrolysable ATP analog AMP-PNP."

      A) While this is a great way to introduce the problem, the statement is a bit vague and should be clarified, detailing the conflicting results and appropriate references. For example, what conflicting in vivo results are they referring to? Regarding "MreB polymerization in AMP-PNP", multiple groups have shown the polymerization of MreB(Tm) in the presence of AMP-PNP, but it is not clear what papers found opposing results.

      Response 2.5A. Thanks for the comment. We originally did not detail these ‘conflicting results’ in the Introduction because we were doing it later in the text, with the appropriate references, in particular in the Discussion (former L433-442). We have now removed this from the Discussion section and added a sentence in the introduction too (L123-130) quickly detailing the discrepancies and giving the references.

      • For more clarity, we have removed the “in vivo” (which referred to the distinct results reported for the presumed ATPase mutants by the Garner and Graumann groups) and focus on the in vitro discrepancies only.

      • These discrepancies are the following: while some studies showed indeed polymerization (as assessed by EM) of MreBTm in the presence of AMPPNP, the studies from Popp et al and Esue et al on T. maritima MreB, and of Nurse et al on E. coli MreB reported aggregation in the presence of AMP-PNP (Esue et al., 2006; Popp et al., 2010) or ADP (Nurse & Marians, 2013), or no assembly in the presence of ADP (Esue et al., 2006). As for the studies reporting polymerization in the presence of AMP-PNP by light scattering only (Bean & Amann, 2008; Gaballah et al, 2011; Mayer & Amann, 2009; Nurse & Marians, 2013), they could not differentiate between aggregates or true polymers and thus cannot be considered conclusive.

      B) The statement "However, the need for nucleotide binding and hydrolysis in polymerization remains unclear due to conflicting results, in vivo and in vitro, including the ability of MreB to polymerize or not in the presence of ADP or the non-hydrolyzable ATP analog AMP-PNP" is technically incorrect and should be rephrased or further tested.

      i. For all actin (or tubulin) family proteins, it is not that a given filament "cannot polymerize" in the presence of ADP but rather that the ADP-bound form has a higher critical concentration for polymer formation relative to the ATP-bound form. This means that the ADP polymers can indeed polymerize, but only when the total protein exceeds the ADP critical concentration. For example, many actin-family proteins do indeed polymerize in ADP: ADP actin has a 10-fold higher critical concentration than ATP actin (Pollard, 1984), and the ADP critical concentrations of AlfA and ParM are 5-fold and 50-fold higher (respectively) than those of their ATP-bound forms (Garner et al., 2004; Polka et al., 2009).

      Response 2.5Bi. Absolutely correct. We apologize for the lack of accuracy of our phrasing and have corrected it (L123).

      ii. Likewise, (Mayer and Amann, 2009) have already demonstrated that B. subtilis MreB can polymerize in the presence of ADP, with a slightly higher critical concentration relative to the ATP-bound form.

      Response 2.5Bii. In Mayer and Amann, 2009, the same light scattering signal (interpreted as polymerization) occurred regardless of the nucleotide, and also in the absence of nucleotide (their Fig. 10), and ATP-, ADP- and AMP-PNP-MreB ‘displayed nearly indistinguishable critical concentrations’. They concluded that MreB polymerization is nucleotide-independent. Please see below (responses to ‘Other points to address’) our extensive answer to the recurring point of reviewer #2 about Mayer & Amann.

      Thus, to prove that MreB(Gs) polymers do not form in the presence of ADP would require one to test a large concentration range of ADP-bound MreB(Gs). They should test if ADP-MreB(Gs) polymerizes at the highest MreB(Gs) concentrations that can be assayed. Even if this fails, it may be that MreB(Gs)-ADP polymerizes at higher concentrations than is possible with their protein preps (13 µM). An even simpler fix would be to state that MreB(Gs)-ADP filaments do not form beneath a given MreB(Gs) concentration.

      We agree with the reviewer. Our wording was overstating our conclusions. Based on our new quantifications (Fig. 3-S1B, D), we have rephrased the results section and now indicate that pairs of filaments are occasionally observed in the presence of ADP in our conditions across the range of MreB concentration that could be tested, suggesting a higher critical concentration for MreB-ADP (L310-312). Only at the highest MreB concentration, sheet- and ribbon-like structures were observed in the presence of ADP (Fig. 3-S2B).

      Other Points to address:

      1) There are several points in this paper where the work by Mayer and Amann is ignored, not cited, or readily dismissed as "hampered by aggregation" without any explanation or supporting evidence of that fact.

      We have cited the Mayer study where appropriate. However, we cannot cite it as proof of polymerization in such or such condition since their approach does not show that polymers were obtained in their conditions. Again, they based all their conclusions solely on light scattering experiments, which cannot differentiate between polymers and aggregates.

      A) Lines 100-101 - While the irregular 3-D formations seen formed by MreB in the Dersch 2020 paper could be interpreted as aggregates, stating that the results from specifically the Gaballah and Mayer papers (and not others) were "hampered by aggregation" is currently an arbitrary statement, with no evidence or backing provided. Overall, these lines (and others in the paper) dismiss these two works without giving any evidence to that point. Thus, they should provide evidence for why they believe all these papers are aggregation, or remove these (and other) dismissive statements.

      We apologize if our statements about these reports seemed dismissive or disrespectful; this was definitely not our intention. Light scattering shows an increase in particle size over time, but there is no way to tell whether the scattering is due to organized (polymers) or disorganized (aggregation) assemblies. Thus, it cannot be considered conclusive evidence of polymerization without proof that true filaments are formed by the protein in the conditions tested, as confirmed by EM for example. MreB is known to aggregate easily (see our size exclusion chromatography profiles and those from Dersch et al, 2020, and note that no chromatography profiles were shown in the Mayer report) and, as indicated above, we had similar light scattering results for MreB for years, while only aggregates could be observed by TEM (see above Response 2.3A). Several observations also suggest that aggregation rather than polymerization might be at play in the Mayer study, for example ‘polymerization’ occurring in salt-less buffer but being ‘inhibited’ by as little as 100 mM KCl, which is more consistent with “salting in” (see below). We did not intend to be dismissive, but it seemed wrong to report their conclusions as conclusive evidence. We thought that we had cited these papers where appropriate and then explained why they show no conclusive proof of polymerization, but it is evident that we failed to communicate this clearly. We have reworked the text to remove any problematic or arbitrary statement about our concerns regarding these reports (e.g. L93 & L126).

      One important note - There are 2 points indicating that dismissing the Mayer and Amann work as aggregation is incorrect:

      1) The Mayer work on B. subtilis MreB shows both an ATP and a slightly higher ADP critical concentration. As the emergence of a critical concentration is a steady-state phenomenon arising from the association/dissociation of monomers (and a kinetically limiting nucleation barrier), an emergent critical concentration cannot arise from protein aggregation; critical concentrations only arise from a dynamic equilibrium between monomer and polymer.

      • Critical concentrations for ATP, ADP or AMPPNP were described in Mayer & Amann (Mayer & Amann, 2009) as “nearly indistinguishable” (see Response 2.5Bii).
      • Protein aggregation depends on the solution (pH and ions), protein concentration and temperature. Above a certain concentration, proteins can become unstable; thus a critical concentration for aggregation can emerge.

      2) Furthermore, Mayer observed that increased salt slowed and reduced B. subtilis MreB light scattering, the opposite of what one would expect if their "polymerization signal" was only protein aggregation, as higher salts should increase the rate of aggregation by increasing the hydrophobic effect.

      It is true that at high salt concentration proteins can precipitate, a phenomenon described as “salting out”. However, it is also true that salts help to solubilize proteins (“salting in”), and that proteins tend to precipitate in the absence of salt. Considering that the starting point of the Mayer and Amann experiment (Mayer & Amann, 2009) is the absence of salt (where they observed the highest scattering) and that they gradually reduce this scattering by increasing KCl (the scattering is almost abolished by only 100 mM!), it is plausible that a salting-in phenomenon might be at play, due to increased solubility of MreB by salt. In any case, this cannot be taken as proof that polymerization rather than aggregation occurred.

      B) Lines 113-137 - The authors reference many different studies of MreB, including both MreB on membranes and MreB polymerized in solution (which formed bundles). However, they again neglect to mention or reference the findings of Mayer and Amann (Mayer and Amann, 2009), as it was dismissed as "aggregation". As B. subtilis is also a gram-positive organism, the Mayer results should be discussed.

      We did cite the Mayer and Amann paper but, as explained above, we cannot cite this study as an example of proven polymerization. We avoided polemics in the text as much as possible and cited this paper when possible. Again, we have reworked the text to avoid any problematic or dismissive statement. Also, we had omitted to mention this study at L121 as an example of reported ATPase activity, and this has now been corrected.

      2) Lines 387-391 state the rates of phosphate release relative to past MreB findings: "These rates of Pi release upon ATP hydrolysis (~ 1 Pi/MreB in 6 min at 53°C) are comparable to those observed for MreBTm and MreB(Ec) in vitro". While Pi release AND ATP hydrolysis have indeed been measured for actin, this statement does not apply to MreB and should be corrected: All MreB papers thus far have only measured Pi release alone, not ATP hydrolysis at the same time. Thus, it is inaccurate to state "rates of Pi release upon ATP hydrolysis" for any MreB study, as to accurately determine the rate of Pi release, one must measure: 1) the rate of polymer formation over time, 2) the rate of ATP hydrolysis, and 3) the rate of phosphate release. For MreB, no one has, so far, even measured the rates of ATP hydrolysis and phosphate release with the same sample.

      We completely agree with the reviewer and apologize for our inaccurate formulation. We have corrected the sentence (L479). Thank you for pointing out this mistake.

      3) The interpretation of the interactions between monomers in the MreB crystal should be more carefully stated to avoid confusion. While likely not their intention, the discussions of the crystal packing contacts of MreB can appear to assume that the monomer-monomer contacts they see in crystals represent the contacts within actual protofilaments. One cannot automatically assume the observations of monomer-monomer contacts within a crystal reflect those that arise in the actual filament (or protofilament).

      We agree, we thank the reviewer for his comments. We have revamped the corresponding paragraph.

      A) They state, "the apo form of MreBGs forms less stable protofilaments than its G- homologs." Given that filaments of the apo form of MreB(Gs) or B. subtilis MreB have never been observed in solution, this statement is not accurate: while the contacts in the crystal may change with and without nucleotide, if the protein does not form polymers in solution in the apo state, then there are no "real" apo protofilaments, and any statements about their stability become moot. Thus this statement should be rephrased or appropriately qualified.

      See above.

      B) Another example: in the apo MreB crystal, they see that the loop of domain IB makes a single salt bridge with IIA and none with IIB. This contrasts with every actin, MreB, and actin homolog studied so far, where domain IB interacts with IIB. This might reflect the real contacts of MreB(Gs) in solution, or it may simply be a crystal-packing artifact. Thus, the authors should be careful in their claims, making it clear to the reader that the contacts in the crystal may not necessarily be present in polymerized filaments.

      Again, we agree with the reviewer, we cannot draw general conclusions about the interactions between monomers from the apo form. We have rephrased this paragraph.

      4) lines 201-202 - "Polymers were only observed at a concentration of MreB above 0.55 μM (0.02 mg/mL)". Given this concentration dependence of filament formation, which appears the same throughout the paper, the authors could state that 0.55 μM is the critical concentration of MreB on membranes under their buffer conditions. Given the lack of critical concentration measurement in most of the MreB literature, this could be an important point to make in the field.

      Following reviewer #2’s suggestion, we have now estimated the critical concentration (Cc=0.4485 µM) and reported it in the text (L218).
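
      For readers unfamiliar with how a critical concentration can be read from concentration-dependence data, the following is a minimal sketch of one common approach: fit a straight line to the polymer signal measured above the threshold and take its x-intercept. This is not necessarily the procedure used for the estimate reported above; the function name and all numbers are hypothetical, and the data are synthetic values generated from an assumed Cc of 0.45 µM purely to show the arithmetic.

```python
def x_intercept_of_linear_fit(xs, ys):
    """Least-squares line through (xs, ys); return where it crosses y = 0."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or sxy == 0:
        raise ValueError("cannot extrapolate from a flat or vertical fit")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -intercept / slope


# Synthetic, illustrative data only (NOT measurements from the manuscript):
# polymer signal rises linearly once total protein exceeds an assumed Cc.
ASSUMED_CC_UM = 0.45
total_um = [0.6, 0.8, 1.0, 1.3]
polymer_signal = [2.0 * (c - ASSUMED_CC_UM) for c in total_um]

estimated_cc = x_intercept_of_linear_fit(total_um, polymer_signal)
print(f"estimated Cc = {estimated_cc:.3f} µM")
```

      With real data, only points clearly above the threshold would enter the fit, and the uncertainty of the intercept would need to be reported alongside the estimate.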

      5) Both mg/ml and µM are used in the text and figures to refer to protein concentration. They should stick to one convention, preferably µM, as is standard in the polymer field.

      Sorry for the confusion. We have homogenized MreB concentrations to µM throughout the text and figures.
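
      As a convenience for cross-checking the two conventions, the conversion can be sketched as below. The molecular weight of 37 kDa is an assumption taken from the approximate monomer size quoted for the size-exclusion peak earlier in this response; the micromolar values quoted elsewhere in this letter may rest on a slightly different mass, so small discrepancies are expected. The function name is hypothetical.

```python
def mg_per_ml_to_micromolar(conc_mg_per_ml: float, mw_kda: float) -> float:
    """Convert a mass concentration (mg/mL) to a molar concentration (µM)."""
    if mw_kda <= 0:
        raise ValueError("molecular weight must be positive")
    # mg/mL equals g/L; dividing by g/mol gives mol/L, then scale to µmol/L.
    return conc_mg_per_ml / (mw_kda * 1000.0) * 1e6


ASSUMED_MW_KDA = 37.0  # assumption: approximate monomer mass, not an exact value

for conc in (0.02, 0.05, 0.25):
    micromolar = mg_per_ml_to_micromolar(conc, ASSUMED_MW_KDA)
    print(f"{conc} mg/mL is about {micromolar:.2f} µM")
```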

      6) Lines 77-78 - (Teeffelen et al., 2011) should be referenced as well in regard to cell wall synthesis driving MreB motion.

      This has been corrected, sorry for omitting this reference.

      7) Line 90 - "Do they exhibit turnover (treadmill) like actin filaments?". This phrase should be modified, as turnover and treadmilling are two very different things. Turnover is the lifetime of monomers in filaments, while treadmilling entails monomer addition at one end and loss at the other. While treadmilling filaments cause turnover, there are also numerous examples of non-treadmilling filaments undergoing turnover: microtubules, intermediate filaments, and ParM. Likewise, an antiparallel filament cannot directionally treadmill, as there is no difference between the two filament ends to confer directional polarity.

      This is absolutely true, we apologize for our mistake. The sentence has been corrected (L82).

      8) Throughout the paper, the term aggregation is used occasionally to describe the polymerization shown in many previous MreB studies, almost all of which very clearly showed "bundled" filaments, very distinct entities from aggregates, as a bundle of polymers cannot form without the filaments first polymerizing on their own. Evidence to this point, polymerization has been shown to precede the bundling of MreB(Tm) by (Esue et al., 2005).

      We agree with reviewer #2 about polymers preceding bundles and “sheets”. However, we respectfully disagree that we used the word aggregation “throughout the paper” to describe structures that clearly showed polymers or sheets of filaments. A search (Ctrl-F: “aggreg”) reveals only 6 matches, 3 describing our own observations (L152, 163/5, and 1023/28), one referring to (Salje et al., 2011) (L107) but citing her claim that they observed aggregation (due to the N-terminus), and the last two (L100, L440) refer (again) to the Gaballah/Mayer/Dersch publications to say that aggregation could not be excluded in these reports as discussed above (Dersch et al., 2020; Gaballah et al., 2011; Mayer & Amann, 2009).

      9) lines 106-108 mention that "The N-terminal amphipathic helix of E. coli MreB (MreBEc) was found to be necessary for membrane binding. " This is not accurate, as Salje observed that one single helix could not cause MreB to bind to the membrane, but rather, multiple amphipathic helices were required for membrane association (Salje et al., 2011).

      Salje et al showed that in vivo the deletion of the helix abolishes the association of MreB with the membrane. This publication also shows that in vitro, addition of the helix to GFP (not to MreB) prompts binding to lipid vesicles, and that this binding is increased if there are 2 copies of the helix, but they could not test this directly in vitro with MreB (which is insoluble when expressed with its N-terminus). This prompted them to speculate that multiple MreBs could bind better to the membrane than monomers. However, this remained to be demonstrated. Additional hydrophobic regions in MreB, such as the hydrophobic loop, could participate in membrane anchoring but are absent in their in vitro assays with GFP.

      The Salje results imply that dimers (or further assemblies) of MreB drive membrane association, a point that should be discussed in regard to the question "What prompts the assembly of MreB on the inner leaflet of the cytoplasmic membrane?" posed on lines 86-87.

      We agree that this is an interesting point. As it is consistent with our results, we have incorporated it into our model (Fig. 6) and we address it in the discussion (L573-575).

      10) On lines 414-415, it is stated, "The requirement of the membrane for polymerization is consistent with the observation that MreB polymeric assemblies in vivo are membrane-associated only." While I agree with this hypothesis, it must be noted that the presence or absence of MreB polymers in the cytoplasm has not been directly tested, as short filaments in the cytoplasm would diffuse very quickly, requiring very short exposures (<5ms) to resolve them relative to their rate of diffusion. Thus, cytoplasmic polymers might still exist but have not been tested.

      This is also an interesting point. Indeed if a nucleated form, or very short (unbundled) polymers exist in the cytoplasm, they have not been tested by fluorescence microscopy. However, the polymers that localize at the membrane (~ 200 nm), if soluble, would have been detected in the cytoplasm by the work of reviewer #2, us or others.

      11) lines 429-431 state, "but polymerization in the presence of ADP was in most cases concluded from light scattering experiments alone, so the possibility that aggregation rather than ordered polymerization occurred in the process cannot be excluded."

      A) If an increased light scattering signal is initiated by the addition of ADP (or any nucleotide), that signal must come from polymerization or multimerization. What the authors imply is that there must be some ADP-dependent "aggregation" of MreB, which has not been seen thus far for any polymer. Furthermore, why would the addition of ADP initiate aggregation?

      We did not mean that ADP itself would prompt aggregation, but that the protein would aggregate in the buffer regardless of the presence of ADP or other nucleotides. The Mayer & Amann study claims that MreB “polymerization” is nucleotide-independent, as they got identical curves with ATP, ADP, AMPPNP and even with no nucleotides at all (Fig. 10 in their paper, pasted here) (Mayer & Amann, 2009).

      Their experiments with KCl are also remarkable: as they lowered the salt, they got faster and faster “polymerization”, with the strongest light scattering signal in the absence of any salt. The KCl concentration at which they got almost no more “polymers” was 75 mM, and ‘polymerization was almost entirely inhibited at 100 mM’ (Fig. 7, pasted below). Yet the intracellular level of KCl in bacteria is estimated to be ~300 mM (see Response 1.1).

      B) Likewise, the statement "Differences in the purity of the nucleotide stocks used in these studies could also explain some of the discrepancies" is unexplained and confusing. How could an impurity in a nucleotide stock affect the past MreB results, and what is the precedent for this claim?

      We meant that the presence of ATP in the ADP stocks might have affected the outcome of some assays, generating the conflicting results existing in the literature. We agree this sentence was confusing, we have removed it.

      12) lines 467-469 state, "Thus, for both MreB and actin, despite hydrolyzing ATP before and after polymerization, respectively, the ADP-Pi-MreB intermediate would be the long-lived intermediate state within the filaments."

      A) For MreB, this statement is extremely speculative and unsubstantiated, as no one has measured 1) polymerization, 2) ATP hydrolysis, and 3) phosphate release. For example, it could be that ATP hydrolysis is slow, while phosphate release is fast, as is seen in the actin from Saccharomyces cerevisiae.

      We agree that this was too speculative. This has been removed from the (extensively) modified Discussion section. Thanks for the comment.

      B) For actin, the statement of hydrolysis of ATP of monomer occurring "before polymerization" is functionally irrelevant, as the rate of ATP hydrolysis of actin monomers is 430,000 times slower than that of actin monomers inside filaments (Blanchoin and Pollard, 2002; Rould et al., 2006).

      We agree that the difference of hydrolysis rate between G-actin and F-actin implies that ATP hydrolysis occurs after polymerization. We are afraid that we do not follow the reviewer’s point here, we did not say or imply that ATP hydrolysis by actin monomers was functionally relevant.

      13) Lines 442-444. "On the basis of our data and the existing literature, we propose that the requirement for ATP (or GTP) hydrolysis for polymerization may be conserved for most MreBs." Again, this statement both here (and in the prior text) is an extremely bold claim, one that runs contrary to a large amount of past work on not just MreB, but also eukaryotic actin and every actin homolog studied so far. They come to this model based on 1) one piece of suggestive data (the behavior of MreB(Gs) bound to 2 non-hydrolysable ATP analogs in 500 mM KCl), and 2) the dismissal (throughout the paper) of many peer-reviewed MreB papers that run counter to their model as "aggregation" or "contaminated ATP stocks." If they want to make this bold claim that their finding invalidates the work of many labs, they must back it up with further validating experiments.

      We respectfully disagree that our model was based on “one piece of suggestive data” and backed up by dismissing most past work in the field. We only wanted to raise awareness of the conflicting data between some reports (listed in Response 2.5A), and of the fact that the claims made by some publications are to be taken with caution because they rely only on light scattering or, when TEM was performed, showed only disorganized structures.

      That said, we clearly failed in proposing our model and we are sorry to see that we really annoyed the reviewer with our suspicion that the work by Mayer & Amann reports aggregation. As indicated above, we have amended our manuscript on this point. We also agree that our suggestion to generalize our findings to most MreBs was unsupported and overstated, considering how confusing some results from the literature are. We have refined our model and reworked the text to take on board the reviewer’s remarks as well as the new data generated during the revision process.

      We would like to thank reviewer #2 for his in-depth review of our manuscript.  

      Reviewer #3 (Public Review):

      The major claim from the paper is the dependence of two factors that determine the polymerization of MreB from a Gram-positive, thermophilic bacterium: 1) The role of nucleotide hydrolysis in driving the polymerization. 2) Lipid bilayer as a facilitator/scaffold that is required for hydrolysis-dependent polymerization. These two conclusions are contrasting with what has been known until now for the MreB proteins that have been characterized in vitro. The experiments performed in the paper do not completely justify these claims as elaborated below.

      We understand the reviewer’s concerns in view of the existing literature on actin and Gram-negative MreBs. We may just be missing the optimal conditions for polymerisation in solution, while our phrasing gave the impression that polymers could never form in the absence of ATP or lipids. Our new data actually show that MreBGs at higher concentration can assemble into bundle- and sheet-like structures in solution and in the presence of ADP/AMP-PNP. Pairs of filaments are, however, only observed in the presence of lipids for all conditions tested. As indicated in the answers to the global review comments, we have included our new data in the manuscript, revised our conclusions and claims about the lipid requirement and expanded on these points in the Discussion.

      Major comments:

      1) The absence of filaments without a lipid monolayer could also be accounted for by a higher critical concentration of polymerization for MreBGs in that condition. All the negative staining without a lipid monolayer has been performed at a concentration of 0.05 mg/mL. It is important to check for polymerization of MreBGs at higher concentration ranges as well, in order to conclusively state the requirement of lipids for polymerization.

      Response 3.1. 0.05 mg/ml (1.3 µM) is our standard condition, and our leeway was limited by the rapid aggregation observed at higher MreB concentrations, as indicated in the text. We have now also tested 0.25 mg/ml (6.5 µM, the maximum concentration possible before major aggregation occurs in our experimental conditions). At this higher concentration, we see some sheet-like structures in solution, confirming a requirement of a higher concentration of MreB for polymerization in these conditions (see the answers to the global review comments for more details).

      We thank the reviewer for pushing us to address this point. We have revised our conclusions accordingly.

      2) The absence of filaments for the non-hydrolysable conditions in the lipid layer could also be because the filaments that might have formed are not binding to the planar lipid layer, and not necessarily because of their inability to polymerize.

      Response 3.2. This is a fair point. To test the possibility that polymers would form but would not bind to the lipid layer we have now added additional semi-quantitative EM controls (for both the non-hydrolysable ATP analogs and the three ‘membrane binding’ deletion mutants) testing polymerization in solution (without lipids) and also using plasma-treated grids. These showed that in our standard polymerization conditions, virtually no polymers form in solution (Fig. 3-S1B and Fig. 4-S4A). Albeit at very low frequency, some dual protofilaments were however detected in the presence of ADP or AMP-PNP at the high MreB concentration (Fig. 3-S1D). At this high MreB concentration, the sheet-like structures occasionally observed in solution in the presence of ATP were frequent in the presence of ADP and very frequent in the presence of AMP-PNP (Fig. 3-S2B). We have revised our conclusions on the basis of these new data: MreBGs can form polymeric assemblies in solution and in the absence of ATP hydrolysis at a higher critical concentration than in the presence of ATP and lipids.

      See the answers to the global review comments (point 2) and Response 2.3C to reviewer #2 for more details.

      3) Given the ATPase activity measurements, it is not very convincing that ATP rather than ADP will be present in the structure. The ATP should have been hydrolysed to ADP within the structure. The structure is now suggestive that MreB is not capable of hydrolysis, which is contradictory to the ATP hydrolysis data.

      Response 3.3. We thank the reviewer for her insightful remarks about the MreB-ATP crystal structure. The electron density map clearly demonstrates the presence of 3 phosphates. However, as suggested by the reviewer, the density that was attributed to a Mg2+ ion should instead be interpreted as a water molecule. The absence of Mg2+ in the crystal could thus explain why the ATP had not been hydrolyzed.

      References

      Arino J, Ramos J, Sychrova H (2010) Alkali metal cation transport and homeostasis in yeasts. Microbiology and molecular biology reviews 74: 95-120

      Bean GJ, Amann KJ (2008) Polymerization properties of the Thermotoga maritima actin MreB: roles of temperature, nucleotides, and ions. Biochemistry 47: 826-835

      Cayley S, Lewis BA, Guttman HJ, Record MT, Jr. (1991) Characterization of the cytoplasm of Escherichia coli K-12 as a function of external osmolarity. Implications for protein-DNA interactions in vivo. Journal of molecular biology 222: 281-300

      Dersch S, Reimold C, Stoll J, Breddermann H, Heimerl T, Defeu Soufo HJ, Graumann PL (2020) Polymerization of Bacillus subtilis MreB on a lipid membrane reveals lateral co-polymerization of MreB paralogs and strong effects of cations on filament formation. BMC Mol Cell Biol 21: 76

      Eisenstadt E (1972) Potassium content during growth and sporulation in Bacillus subtilis. Journal of bacteriology 112: 264-267

      Epstein W, Schultz SG (1965) Cation Transport in Escherichia coli: V. Regulation of cation content. J Gen Physiol 49: 221-234

      Esue O, Wirtz D, Tseng Y (2006) GTPase activity, structure, and mechanical properties of filaments assembled from bacterial cytoskeleton protein MreB. Journal of bacteriology 188: 968-976

      Gaballah A, Kloeckner A, Otten C, Sahl HG, Henrichfreise B (2011) Functional analysis of the cytoskeleton protein MreB from Chlamydophila pneumoniae. PloS one 6: e25129

      Harne S, Duret S, Pande V, Bapat M, Beven L, Gayathri P (2020) MreB5 Is a Determinant of Rod-to-Helical Transition in the Cell-Wall-less Bacterium Spiroplasma. Curr Biol 30: 4753-4762 e4757

      Kang H, Bradley MJ, McCullough BR, Pierre A, Grintsevich EE, Reisler E, De La Cruz EM (2012) Identification of cation-binding sites on actin that drive polymerization and modulate bending stiffness. Proceedings of the National Academy of Sciences of the United States of America 109: 16923-16927

      Lacabanne D, Wiegand T, Wili N, Kozlova MI, Cadalbert R, Klose D, Mulkidjanian AY, Meier BH, Bockmann A (2020) ATP Analogues for Structural Investigations: Case Studies of a DnaB Helicase and an ABC Transporter. Molecules 25

      Mannherz HG, Brehme H, Lamp U (1975) Depolymerisation of F-actin to G-actin and its repolymerisation in the presence of analogs of adenosine triphosphate. Eur J Biochem 60: 109-116

      Mayer JA, Amann KJ (2009) Assembly properties of the Bacillus subtilis actin, MreB. Cell motility and the cytoskeleton 66: 109-118

      Nurse P, Marians KJ (2013) Purification and characterization of Escherichia coli MreB protein. The Journal of biological chemistry 288: 3469-3475

      Pande V, Mitra N, Bagde SR, Srinivasan R, Gayathri P (2022) Filament organization of the bacterial actin MreB is dependent on the nucleotide state. The Journal of cell biology 221

      Peck ML, Herschlag D (2003) Adenosine 5 '-O-(3-thio)triphosphate (ATP-gamma S) is a substrate for the nucleotide hydrolysis and RNA unwinding activities of eukaryotic translation initiation factor eIF4A. Rna 9: 1180-1187

      Popp D, Narita A, Maeda K, Fujisawa T, Ghoshdastider U, Iwasa M, Maeda Y, Robinson RC (2010) Filament structure, organization, and dynamics in MreB sheets. The Journal of biological chemistry 285: 15858-15865

      Rhoads DB, Waters FB, Epstein W (1976) Cation transport in Escherichia coli. VIII. Potassium transport mutants. J Gen Physiol 67: 325-341

      Rodriguez-Navarro A (2000) Potassium transport in fungi and plants. Biochimica et biophysica acta 1469: 1-30

      Salje J, van den Ent F, de Boer P, Lowe J (2011) Direct membrane binding by bacterial actin MreB. Molecular cell 43: 478-487

      Schmidt-Nielsen B (1975) Comparative physiology of cellular ion and volume regulation. J Exp Zool 194: 207-219

      Szatmari D, Sarkany P, Kocsis B, Nagy T, Miseta A, Barko S, Longauer B, Robinson RC, Nyitrai M (2020) Intracellular ion concentrations and cation-dependent remodelling of bacterial MreB assemblies. Sci Rep-Uk 10

      van den Ent F, Izore T, Bharat TA, Johnson CM, Lowe J (2014) Bacterial actin MreB forms antiparallel double filaments. eLife 3: e02634

      Whatmore AM, Chudek JA, Reed RH (1990) The Effects of Osmotic Upshock on the Intracellular Solute Pools of Bacillus subtilis. Journal of general microbiology 136: 2527-2535

    1. Author Response

      Reviewer #2 (Public Review):

      The authors present findings on a designed peptide, PITCR, and its role in inhibiting TCR activation through an extensive series of experiments. These include the measurement of phosphorylation in the TCR zeta chain and a number of associated signaling proteins such as Zap70, LAT, PLCg1, and SLP76. In addition, the authors measure the impact of PITCR on the TCR intracellular calcium response and examine the peptide-induced inhibition of TCR activation by antigen-presenting cells. They also present data indicating that the fluorescently labeled PITCR co-localizes with TCR in Jurkat cells and with ligand-bound TCR in primary murine cells. Overall the experiments provide useful insights into the mechanism of T cell activation and generally support an allosteric model of activation, while not necessarily excluding alternative models.

      However, some aspects of the study do need clarification.

      1) The authors do not provide a clear structural basis for their peptide design, which makes it difficult to understand the rationale for choosing this particular peptide. The use of a structural model based on the TCR zeta domain, for example, and how it becomes modified to generate PITCR would provide some clarity on what types of putative interactions are being engineered.

      We thank the reviewer for giving us a chance to elaborate. We have expanded the results section to provide more information on the peptide design, where we now point out that the acidic residues in the TCR TM allow peptide design. We have also applied the artificial intelligence program AlphaFold-Multimer (AlFoM) to generate a structural model of the docking site of PITCR in the TCR (Figure 9), which provides new mechanistic insights, as we describe in the updated results section and discuss below.

      2) The inhibitory effects of PITCR are not large. Measurement of dose dependence might improve confidence in the results.

      As the reviewer points out, we have performed an extensive set of experiments to assess the inhibitory effect of PITCR. We have demonstrated that PITCR inhibits TCR phosphorylation. We have also tested all proximal signaling proteins: Zap70, LAT, SLP76, and PLC gamma. Critically, in all cases a statistically significant inhibition is observed. Furthermore, inhibition was additionally seen when TCR was activated by peptide presentation in antigen-presenting cells. Interaction between PITCR and the receptor is supported by co-localization, co-IP and the new AlphaFold-Multimer prediction. We are therefore confident in the results presented and that the inhibitory effect indeed exists. As we responded to reviewer 1 above, we discuss that inconsistent results were obtained with lower PITCR concentrations, suggesting that the use of a high peptide concentration is required for robust inhibition.

      3) Use of control peptides is not uniform. Control peptides similar to PITCR in Figure 1 and Figure 2 studies, for example, could strengthen the authors' arguments.

      The original version of the manuscript contained two negative control peptides: the G41P mutant of PITCR, and pHLIP, another pH-responsive peptide that behaves as a conditional transmembrane peptide. For feasibility reasons, however, we did not use all the negative controls in every experiment, as we were satisfied when a negative control peptide acted as such in an experiment. Because we agree that increased use of negative control peptides will strengthen the manuscript, we have expanded their use. Specifically, the updated version of the manuscript contains a new section where AlFoM is used to predict the binding pose of PITCR and the structural consequences of interaction (see Figure 9 and the four new supplementary figures). AlFoM showed that PITCR binds with a large interaction interface, and peptide binding causes a large rearrangement of the two zeta chains in TCR. Importantly, neither of the two original negative control peptides (PITCRG41P or pHLIP) impacts the zeta chains. When we used a new negative control, the conditional transmembrane peptide TYPE7 developed by us, AlFoM did not predict it to bind to TCR, as expected, strengthening our argument.

      Reviewer #3 (Public Review):

      The use of pH-responsive TM-targeting peptides, which the authors previously developed, is a novel aspect of this study. Those peptides can be quite powerful for understanding molecular mechanisms of receptor signaling, such as the allosteric activation model as tested in this study. The manuscript contains several interesting approaches and observations, but there are concerns about the experimental design and interpretation of the results. More importantly, the authors' primary conclusion that the allosteric changes in the TM bundles determine TCR activation is not fully supported by the data presented. For example:

      1) The authors provided confocal fluorescence images showing the colocalization of fluorescently labeled peptides and TCR subunits. Based on the data, they concluded that "PITCR is able to bind to TCR". This is misleading, because given the spatial resolution of the imaging technique, "colocalization" does not indicate binding or interaction between molecules. Because the peptide binding to the TM region is the pillar of the primary finding of this study, direct evidence supporting the peptide-TM binding or interaction is essential.

      We have to disagree that our statement is misleading: the section of the manuscript that the reviewer referred to said “suggesting that PITCR is able to bind to TCR before it is activated by OKT3”. We were therefore not drawing a conclusion, only making a suggestion, which we consider justified, particularly as it is supported by data presented later. Nevertheless, we certainly agree with the reviewer that co-localization experiments fundamentally cannot indicate binding. We have modified the results (page 11) to follow the suggestion of the reviewer and indicate that co-localization data are not proof of interaction. In addition, we provide new AlphaFold-Multimer data, which support that transmembrane binding indeed occurs.

      2) In calcium response experiments, the authors compared calcium influx (indicated by Indo-1 ratio) under different cell activation conditions (Figure 2). There are some concerns about how the authors interpreted the data: (1) The calcium plots from OKT3 activation in A-C panels are inconsistent. The plot in (A) showed a calcium peak after activation, which is not present in the plots shown in (B) and (C). There is no explanation or discussion on this inconsistency. (2) What is more concerning is that this prominent calcium peak in (A) was used to draw the conclusion that the designer peptide inhibitor effectively reduces calcium response. However, inconsistent with that conclusion, the calcium plots are indistinguishable for the three conditions: with PITCR (peptide inhibitor), with PITCRG41P (negative control that should not affect TCR activation), or no peptide. All three plots have similar magnitude and fluctuations. This does not support the authors' conclusion that the PITCR (peptide inhibitor) reduces calcium response in T cells.

      We thank the reviewer for this comment. We have updated figure 3, which now contains a different replicate of the calcium assay that we think is more straightforward to analyze and that more clearly shows the calcium inhibition, as quantified in panel D of the figure.

      3) Different types of T cells were used for separate measurements: E6-1 Jurkat T cells were used for calcium influx experiments, J. OT.hCD8+ Jurkat cells were used for CD69 measurements, and primary murine CD4+ T cells were used for colocalization imaging experiments. Rationales for the choices of cells in different measurements are also unclear. This is different from the common practice where different cell types are used in repeated experiments to test the generality of a finding. Here, they were used for different experiments, and findings were lumped together as "T cells", without further evidence/discussion on how translatable the findings from different cell types are.

      As the reviewer suggests, we have updated the manuscript to include a discussion of the particularities of using the different T cells on pages 18 and 19. We envisioned this work as a proof of principle for the design of a peptide that can eventually be modified to be used for pre-clinical applications, and this paper is a first step. With this idea in mind, we wanted to test if this peptide can work in different types of TCR since: (1) TCR populations are diverse; and (2) our design is based on the transmembrane domain of CD3zeta chain, which is largely conserved among species. Using different types of T cells met this goal since they have different types of TCR, but the transmembrane domain of CD3zeta is conserved. In our paper, we used human Jurkat-TCR, OT1-TCR coupled with hCD8, and murine CD4-TCR. In addition, we used not one but three activation markers to test the peptide’s inhibitory effect: phosphorylation, calcium influx, and CD69 activation. For the co-localization experiment, we used not only murine CD4 T cells but also Jurkat T cells with and without OKT3 stimulation.

      We selected these T cells because they were particularly suited for the breadth of different measurements that this manuscript contains, based on published reports. In our opinion this approach broadens the relevance of the work.

      4) The authors set out to test the model that TCR activation by pMHC occurs through allosteric changes in the TM region, but in most experiments, they activated Jurkat T cells by anti-CD3 antibody, not by antigen peptides. The anti-CD3 antibody activates TCR signaling through clustering. It is unclear whether TCR activation by anti-CD3 leads to the same allosteric changes in the TM region as activation by pMHC. As such, the main claim of the paper, namely that the designer peptide affects TCR signaling by disrupting the allosteric changes in the TM region, remains insufficiently supported by the data presented.

      Figure 8 shows that the levels of co-IP in the presence of detergent are altered by OKT3 activation of TCR. It has recently been established (PMID: 34260912) that this assay allows the investigation of allosteric changes that contribute to activation of TCR. This evidence is supportive of allosterism in TCR activation. Additionally, the TCR proximal signaling is conserved between the Jurkat T cells activated by OKT3 and TCR activated by pMHC. We can reasonably argue that the peptide acts similarly in both conditions, since the peptide also exerts an inhibitory effect in T cells activated by antigen-presenting cells (Figure 4). The newly presented AlFoM model (Figure 9) predicts that PITCR binding displaces a zeta chain in TCR. This new result provides a plausible molecular rationale for the results in Figure 8, where we observe that PITCR changes transmembrane compactness, which has been linked to allosteric activation (Lanz et al., 2021; Prakaash et al., 2021).

    1. Author Response

      eLife assessment

      This useful paper examines changes (or lack thereof) in birds' fear response to humans as a result of COVID-19 lockdowns. The evidence supporting the primary conclusion is currently inadequate, because the model used does not properly account for many potentially confounding factors that could influence the study's outcomes. If the analytic approach were improved, the findings would be of interest to urban ecologists, behavioral biologists and ecologists, and researchers interested in understanding the effects of COVID-19 lockdowns on animals.

      Many thanks for these supportive words. We did our best to improve our manuscript according to the reviewers' and editor's comments. Importantly, we regret being unclear in the Methods, as our models already controlled for most of the confounds (see below) discussed by the reviewers.

      For example, given that a single observer collected the data at most sites, site as a random intercept in the models controls also for the observer effects (which is one of the reasons why site is in the model). We added details to Methods (L352-356, see also “Statistical analyses” in the main text).
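
      The argument above rests on observer being nested within site, so that a per-site random intercept also absorbs observer differences. A minimal sketch of how that nesting could be checked is given below. The records, site names and observer labels are invented for illustration, and the helper functions are hypothetical, not part of the authors' analysis.

```python
from collections import defaultdict

# Hypothetical (site, observer) records; not the study's data.
records = [
    ("park_A", "obs_1"),
    ("park_A", "obs_1"),
    ("park_B", "obs_2"),
    ("park_C", "obs_1"),
    ("park_C", "obs_3"),
]


def observers_per_site(rows):
    """Map each site to the set of observers who collected data there."""
    seen = defaultdict(set)
    for site, observer in rows:
        seen[site].add(observer)
    return dict(seen)


def sites_with_single_observer(rows):
    """Sites where observer is fully nested in site (exactly one observer)."""
    return sorted(
        site for site, obs in observers_per_site(rows).items() if len(obs) == 1
    )


single = sites_with_single_observer(records)
share = len(single) / len(observers_per_site(records))
print(f"single-observer sites: {single} ({share:.0%} of sites)")
```

      For sites that appear in this list, a site-level intercept cannot be separated from an observer effect. Sites with several observers (park_C in this toy example) would still carry between-observer variation that the site term does not capture.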

      The first reviewer asked us to use “some measure of urbanity (e.g. Human Footprint Index) that varies across the cities included here”. Our main results are now based on country-specific models and hence, the use of a single value predictor for each city is not appropriate. Please, see also below.

      The second reviewer is concerned about multicollinearity in our models because of the 0.95 correlation between Period and Stringency Index. However, these are key predictor variables of interest that have never been used within the same model as predictors. We now clearly explain this in the Methods (L458-538, 548-550) and within the legend of Figure S2.

      The third reviewer suggested that our models would benefit from controlling for day in the species-specific breeding cycle. Although we don’t have precise city-specific information on the timing of breeding stages in the sampled populations of birds, we partly control for these effects by including a random intercept of day within each year and species. This random factor explained most of the variance (see Table S1-S2) – something that could have been expected. In other words, we do control for what the third reviewer asked for. Similarly, we account for habitat features that may influence escape distance by including site in the models. Site usually refers to a specific park (we assume that within-park heterogeneity is lower than between park variation) and hence partly addresses the reviewer’s concern. Again, we highlight this within the Methods (L466-476).

      Reviewer #1 (Public Review):

      This paper uses a series of flight initiation "challenges" conducted both prior to and during COVID-19-related restrictions on human movement to estimate the degree to which avian escape responses to humans changed during the "anthropause". This technique is suitable for understanding avian behavioral responses with a high degree of repeatability. The study collects an impressive dataset over multiple years across five cities on two continents. Overall the study finds no effect of lockdown on avian escape distance (the distance at which the "target" individual flees the approaching observer). The study considers the variable of interest as both binary (during lockdown or prior to lockdown) and continuous, using the Oxford Stringency Index (with neither apparently affecting escape distance). Overall this paper presents interesting results which may suggest that behavioral responses to humans are rather inflexible over "short" (~2 year) timespans. The anthropause represents a unique opportunity to disentangle the mechanistic drivers of myriad hypothesized impacts humans have on the behavior, distribution, and abundance of animals. Indeed, this finding would provide important context to the larger body of literature aimed at these ends.

      Thank you very much for your positive feedback.

      However, the paper could do more to carefully fit this finding into the broader literature and, in so doing, be a bit more careful about the conclusions they are able to draw given the study design and the measures used. Taking some of these points (in no particular order):

      Thank you. We did our best in addressing your comments (see below and updated Methods, Results and Discussion sections).

      1) Oxford Stringency Index is a useful measure of governmental responses to the pandemic and it's true that in some scenarios (including the (Geng et al. 2021) study cited by this paper) it can correlate with human mobility. However, it is far from a direct measure of human mobility (even in the Geng study, to my reading, the index only explained a minority of the variation). Moreover, particular sub-components of the index are wholly unrelated to human mobility (e.g. would changes to a country's public information campaign lead to concomitant changes in urban human mobility?). Finally, compliance with government restrictions can vary geographically and over time (i.e. we might expect lower compliance in 2021 than in 2020) and the index is calculated at the scale of entire countries and may not be very reflective of local conditions. Overall this paper could do more to address the potential shortcomings of the Oxford Stringency Index as a measure of human mobility including attempting to validate the effect on human mobility using other datasets (e.g. the google dataset and/or those discussed in (Noi et al. 2022). This is of critical importance since the fundamental logic of the experimental design relies on the assumption that stringency ~ mobility.

      Thank you for this comment. First, the Oxford Stringency Index seemed to us the best available index for our purposes, i.e., to estimate people's mobility during the shutdown, because restrictions surely influenced the possibility that people would be outside, and because the index is a country-specific estimate. In addition, we have now checked all indices mentioned in Noi et al. 2022 and found only the Google Mobility Reports useful. We now use them because they (a) are publicly available, (b) are also available for territories outside the US, and (c) provide data for each city included in our dataset as well as for urban parks, where most of our data were collected. Note that some platforms are no longer providing their mobility data (e.g. Apple).

      However, Google Mobility provides day-to-day variation in human mobility, whereas we are interested in overall increase/decrease in human mobility. Nevertheless, we correlated the Google mobility index with the Stringency index and found that human mobility generally decreases with the strength of the anti-pandemic measures adopted in sampled countries (albeit the effect for some countries, e.g. Poland, is small; Fig. 5).
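
      The response does not state which correlation statistic or model was used for this comparison. As a minimal sketch, assuming a plain Pearson correlation and using made-up values (not the study's data), the check could look like this:

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Invented illustration values: stringency (0-100 scale) and percentage change
# in park visits relative to a baseline. These are NOT taken from the study.
stringency = [20.0, 35.0, 50.0, 65.0, 80.0]
mobility_change = [5.0, -3.0, -12.0, -20.0, -31.0]

r = pearson_r(stringency, mobility_change)
print(f"r = {r:.2f}")  # negative: mobility falls as stringency rises
```

      A negative coefficient corresponds to the pattern described above, where mobility decreases as restrictions tighten. With real data, the comparison would presumably be made separately per country, since the response notes that the effect is small for some countries.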

      Moreover, we added an analysis using the number of humans recorded directly in the field during escape trials (e.g. Fig. 6 and S6) and found that the link between the number of humans and the Stringency Index or Google Mobility was weak and noisy, with 95% CIs widely crossing zero (Fig. 6).

      Importantly, if we use Google Mobility and # of humans, respectively, as predictors of escape distance, the results are qualitatively very similar to results based on Oxford Stringency Index (Fig. S6), or Period, with tiny effect sizes for both (95%CIs for Google Mobility -0.3 – 0.06, Table S5, for # of humans -0.12 – 0.02, Table S6) supporting our previous conclusions.
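
      The two intervals quoted above can be checked directly for whether they include zero, which is what "tiny effect sizes" with intervals spanning zero amounts to here. The snippet below is a small illustration using only the interval bounds reported in the response; the helper name is hypothetical.

```python
def interval_contains(lower: float, upper: float, value: float = 0.0) -> bool:
    """Return True if `value` lies within the closed interval [lower, upper]."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return lower <= value <= upper


# 95% CI bounds as quoted in the response (Tables S5 and S6).
reported_intervals = {
    "Google Mobility": (-0.3, 0.06),
    "Number of humans": (-0.12, 0.02),
}

for predictor, (low, high) in reported_intervals.items():
    spans_zero = interval_contains(low, high)
    print(f"{predictor}: [{low}, {high}] spans zero -> {spans_zero}")
```

      Both intervals include zero, so neither predictor shows a clear association with escape distance at the 95% level.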

      Note that Google Mobility and the number of humans have their limitations (see our comment to the editor and the Methods section in the main manuscript, e.g. L418-433). The lack of Google Mobility data for years before the COVID-19 pandemic does not allow us to fully explore whether overall human activity decreased during COVID-19 or not (our test for the periods prior to and during COVID-19). If the year 2022 reflects a return to “normal” (which is open to dispute because of the COVID-19-driven rise in home office use), then 2020 and 2021 had on average lower levels of human activity (Fig. 4). Whether such a difference is biologically meaningful to birds is unclear given the immense day-to-day change in human mobility and presence (Fig. 4). Moreover, the number of humans captures within- and between-day variation rather than long-term changes in human presence.

      We added details on the new analysis to the Methods and Results sections (e.g. Fig. 4-6; L142-165, 418-438, 495-535) and Supplementary Information (Figs. S5-S9 and associated Tables) and discuss the issue accordingly. Moreover, to enhance clarity about country-specific effects (or their lack), we also added country-specific estimates to the Results (Fig. 1 and Fig. S6 and respective Tables). Finally, our statistical design and the random structure of the model allowed us to control for spatial and temporal variation in compliance with government restrictions.

      2) The interpretation of the primary finding (that behavioral responses to humans are inflexible) could use a bit more contextualization within the literature. Specifically, the study offers three potential explanations for the observed invariance in escape response: 1) these behaviors are consistent within individuals and this study provides evidence that there was no population turnover as a result of lockdowns; 2) escape response is linked to other urban adaptations such that to be an urban-dwelling species dictates escape response; and/or 3) these populations already exhibit maximum habituation and the reduction in human mobility would only have increased that habituation but that trait is already at a boundary condition. Some comments on each of these respectively:

      Thank you for these comments. We incorporated them in the main text (L293-329). Your point 1) corresponds to our point (i): “Most urban bird species in our sample may be relatively inflexible in their escape responses because the species may be already adapted to human presence” (L293-306); your point 2) to our point (ii): “Urban environment might filter for bold individuals (Carrete and Tella, 2013, 2010; Sprau and Dingemanse, 2017). Thus, the lack of consistent change in escape behaviour of urban birds during the COVID-19 shutdowns may indicate an absence (or low influx) of generally shy, less tolerant individuals and species from rural or less disturbed areas into the cities…” (L307-314); your point 3) to our point (iii): “Urban birds might have been already habituated to or tolerant of variation in human presence, irrespective of the potential changes in human activity patterns” (L315-329). To distinguish between (ii) and (iii) or the two from (i), individually-marked birds and comprehensive genetic analyses are needed, which we now note in the Discussion (L330-348). Importantly, we also discuss that the lack of response might be due to relatively small changes in human activity (L253-292), which we unfortunately could not fully quantify.

      a) Even had these populations turned over as a result of a massive rural-to-urban dispersal event, it's not clear that the escape distance in those individuals would be different because this paper does not establish that these hypothetical rural birds have a different behavioral response which would be constant following dispersal. Thus the evidence gathered here is insufficient to tell us about possible relocations of the focal species.

      Thank you for this point. We address this point in the Introduction and Discussion (L92-101, 307-314). Rural bird populations/individuals are on average less tolerant of humans than urban birds (e.g. Díaz et al. 2013, PloS One 8:e64634; Tryjanowski et al. 2020, J Tropic Ecol 36:1-5; Mikula et al. 2023, Nat Commun 14:2146) and at the same time, bird individuals seem consistent in their escape responses (Carrete & Tella 2010, Biol Lett 23:167–170; Carrete & Tella 2013, Sci Rep 3:1–7).

      Additionally, the paper cites several papers that found no changes in abundance or movements of animals in response to lockdowns but ignore others that do. For example: (Wilmers et al. 2021), (Warrington et al. 2022) (though this may have been published after this was submitted...), and (Schrimpf et al. 2021).

      We added the papers (L89-91). Thank you!

      There is a missed opportunity to consider the drivers of some of these results - the findings in this paper are interesting in light of studies that did observe changes in space use or abundance - i.e. changes in space use could arise precisely because responses to humans are non-plastic but the distribution and activities of humans changed.

      Thank you. Indeed, we now address this in the Discussion (L303-306): “However, some studies reported changes in the space use by wildlife (Schrimpf et al., 2021; Warrington et al., 2022; Wilmers et al., 2021), and these could arise, as our results indicate, from fixed and non-plastic animal responses to humans who changed their activities”.

      To wit, the primary finding here would imply that the reaction norm to human presence is apparently fixed over such timescales - however, and critically, the putative reduction in human activity/mobility combined with fixed responses at the individual level might then imply changes in avian abundance/movement/etc.

      Unfortunately, we have not measured changes in avian abundance or movements. But, please, note that the change in human mobility in sampled cities might be not as dramatic as initially thought and we consider this scenario to be most plausible in explaining no significant differences in avian escape responses before and during the COVID-19 shutdowns (see Fig. 4). Nevertheless, we add your point into the Discussion: If our findings imply that in birds the reaction norm to human presence is fixed over the studied temporal scale, the putative changes in human presence might then imply changes in avian abundance or movement (L293 and text below it).

      b) If this were the case, wouldn't this be then measurable as a function of some measure of urbanity (e.g. Human Footprint Index) that varies across the cities included here? Site accounted for ~15% of the total variation in escape distance but was treated as a random effect - perhaps controlling for the nature of the urban environment using some e.g. remotely sensed variable would provide additional context here.

      Urbanity mirrors the long-term level of human presence in cities, whereas we were interested mainly in the rather short-term effects of potential changes in human presence on bird behaviour. Thus, we are not sure how adding such a variable would help elucidate the current results. Please also note that we added the country-specific analysis. Site indeed accounted for a considerable amount of the total variance in escape distance, and that is why it was included as a random intercept, which controls for the non-independence of data points from each site. This could partly help us to control for differences in habitat type (e.g. urbanization level) within cities.

      c) Because it's not clear the extent to which the populations tested had turned over between years, the paper could do with a bit more caution in interpreting these results as behavioral. This study spans several years so any response (or non-response) is not necessarily a measure of behavioral change because the sample at each time point could (likely does) represent different individuals. In fact, there may be an opportunity here to leverage the one site where pre-pandemic measures were taken several years prior to the pandemic. How much variance in the change in escape distance is observed when the gap between time points far exceeds the lifetime of the focal taxa versus measures taken close in time?

      We believe the initial Fig. S4, now Figure 2, addresses this point. The between-year temporal variation in FIDs exceeds the variation due to lockdowns. This is true both for measures taken in consecutive years and for measures taken far apart.

      d) Finally, I think there are a few other potential explanations not sufficiently accounted for here:

      i) These behaviors might indeed be plastic, but not over the timescales observed here.

      We agree and have added this point (L301-303). Thank you.

      ii) Time of year - this study took place during the breeding season. The focal behavior here varies with the time of year, for example, escape distance for many of these species could be tied up in nest defense behaviors, tradeoffs between self-preservation and e.g. nest provisioning, etc.

      Please, note that we controlled for the date in our analyses. Date was used as a proxy for the progress in the breeding season (L463-464 and Fig. 1 caption). Note that we collected data only from foraging or resting individuals, and data were neither collected near the nest sites nor from individuals showing warning behaviours, which we now note (L400-401).

      iii) Escape behaviors from humans are adaptively evolved, strongly heritable, and not context dependent - thus we would only expect these behaviors to change on evolutionary timescales.

      We discussed this at L307-308 and 381-383. Escape behaviors from humans are highly consistent for individuals, populations, and species (Carrete & Tella 2010, Biol Lett 23:167–170; Díaz et al. 2013, PloS One 8:e64634; Mikula et al. 2023, Nat Commun 14:2146). Whether such behavior is consistent across contexts is less clear (e.g. Diamant et al. 2023, Proc Royal Soc B, in press; but see, e.g. Radkovic et al. 2019, J Ecotourism 18:100-106; Gnanapragasam et al. 2021, Am Nat 198:653-659). Escape distance is often not measured simultaneously with, for example, human presence. In other words, whereas the general level of human presence may have no effect on escape distance, the day-to-day or hour-to-hour variations might. We need studies on fine temporal scales (day-to-day or hour-to-hour) using marked individuals to elucidate this phenomenon.

      iv) See point one above - it's possible that the lockdown didn't modify human activity sufficiently to trigger a behavioral response or that the reaction norm to human behavior is non-linear (e.g. a threshold effect).

      We agree. We now also use Google Mobility Reports and data on the number of humans to elucidate this phenomenon, and we have added such interpretations to L253-292 and, e.g., Fig. 4.

      LITERATURE CITED

      Geng DC, Innes J, Wu W, Wang G. 2021. Impacts of COVID-19 pandemic on urban park visitation: a global analysis. J For Res 32:553-567. doi:10.1007/s11676-020-01249-w

      Noi E, Rudolph A, Dodge S. 2022. Assessing COVID-induced changes in spatiotemporal structure of mobility in the United States in 2020: a multi-source analytical framework. Int J Geogr Inf Sci.

      Schrimpf MB, Des Brisay PG, Johnston A, Smith AC, Sánchez-Jasso J, Robinson BG, Warrington MH, Mahony NA, Horn AG, Strimas-Mackey M, Fahrig L, Koper N. 2021. Reduced human activity during COVID-19 alters avian land use across North America. Sci Adv 7:eabf5073. doi:10.1126/sciadv.abf5073

      Warrington MH, Schrimpf MB, Des Brisay P, Taylor ME, Koper N. 2022. Avian behaviour changes in response to human activity during the COVID-19 lockdown in the United Kingdom. Proc Biol Sci 289:20212740. doi:10.1098/rspb.2021.2740

      Wilmers CC, Nisi AC, Ranc N. 2021. COVID-19 suppression of human mobility releases mountain lions from a landscape of fear. Curr Biol 31:3952-3955.e3. doi:10.1016/j.cub.2021.06.050

      Reviewer #2 (Public Review):

      Mikula et al. have a large experience studying the escape distances of birds as a proxy of behavioral adaptation to urban environments. They profited from the exceptional conditions of social distance and reduced mobility during the covid-19 pandemic to continue sampling urban populations of birds under exceptional circumstances of low human disturbance. Their aim was to compare these new data with data from previous "normal" years and check whether bird behavior shifted or not as a consequence of people's lockdown. Therefore, this study would add to the growing body of literature assessing the effect of the covid-19 shutdown on animals. In this sense, this is not a novel study. However, the authors provide an interesting conclusion: birds have not changed their behavior during the pandemic shutdown. This lack of effects disagrees with most of the previously published studies on the topic. I think that the authors cannot claim that urban birds were unaffected by the covid-19 shutdown. I think that the authors should claim that they did not find evidence of covid-19-shutdown effects. This point of view is based on some concerns about data collection and analyses, as well as on evolutionary and ecological rationale used by the authors both in their hypotheses and results interpretation. I will explain my criticisms point by point:

      We are grateful for your positive appraisal of our manuscript, as well as for your helpful critical comments. We toned down the discussion to claim, as suggested by you, that we did not find evidence for effects of covid-19-shutdowns on escape behaviour of birds in urban settings (see Results and Discussion sections). In general, we attempted to provide a more nuanced discussion and reporting of our findings. We also changed the manuscript title to “Urban birds' tolerance towards humans was largely unaffected by the COVID-19 shutdowns” and added validation using Google Mobility Reports (Fig. 5 & S6, Table S3a and S5) and the actual number of humans (Fig. 6 and S6; Table S3b-e and S6). Note however that there is only a single robust study on the topic of shutdown and animal escape distances (Diamant et al. 2023, Proc Royal Soc B, in press), i.e. the topic is largely unexplored (e.g. L99-101), whereas we discuss our finding in light of shutdown influences on other behaviours (L293-329).

      1) The authors used ambivalent, sometimes contradictory, reasoning in their predictions and results interpretation. Some examples:

      We tried to clarify our reasoning and increased consistency in our claims in the Introduction. Please, note that we simplified the Introduction and now provide one main expectation: FIDs of urban birds should increase with decreased human presence. This pattern is robustly empirically documented, regardless of the mechanism involved (e.g. Díaz et al. 2013, PloS One 8:e64634; Tryjanowski et al. 2020, J Tropic Ecol 36:1-5; Mikula et al. 2023, Nat Commun 14:2146). Please, see our revised Discussion for a more comprehensive discussion of mechanisms which could explain the patterns described in our study.

      1.1) The authors claimed that urban birds perceive humans as harmless (L224), but birds actually escape from us, when we approach them... Furthermore, they escape usually 5 to 20 m away. This is more distance that would be necessary just to be not trampled.

      We agree and have deleted mentions that humans are perceived as harmless.

      1.2) If we are harmless, why birds should spend time monitoring us as a potential threat (L102)? Indeed, I disagree with the second prediction of the authors. I could argue that reduced human activity should increase animal vigilance because real bird predators (e.g. raptors) may increase their occurrence or activity in empty cities. If birds should increase their vigilance because the invisible shield of human fear of their predators is no longer available, then I would expect longer escape distances.

      Thank you for this comment. We deleted this prediction and largely rewrote Introduction based on your comments and comments from the other reviewers.

      1.3) To justify the same escape behavior shown by birds in pre- and pandemic conditions from an adaptive point of view, the authors argued a lack of plasticity and a strong genetic determination of such behavior. This contravenes the plasticity proposed in the previous point or the expected effect of the stringency index (L112).

      We have now attempted to write this more clearly while incorporating your suggestions. In the Discussion, we now propose various hypotheses that can be, but need not be, mutually exclusive. Please note that we simplified the Introduction and now provide one main hypothesis: FIDs of urban birds should increase with decreased human presence.

      In my opinion, some degree of plasticity in the escape behavior would be really favorable for individuals from an adaptive perspective, as they may face quite different fear landscapes during their lives. Looking at the figures, one can see notable differences in the escape distance of the same species between sites in the same city. As I can hardly imagine great genetic differences between birds sampled in a park or a cemetery in Rovaniemi, for instance, I would expect a major role of plasticity to explain the observed variability. Furthermore, if escape behavior would not be plastic, I would not expect date or hour effects. By including them in their models, the authors are accepting implicitly some degree of plasticity.

      We regret being unclear. We do accept some degree of plasticity. Yet, our study design prohibits the assessment of the degree of individual plasticity because sampled birds were not individually marked and approached repeatedly. We tried to soften the statements in our Discussion to not fully dismiss a possibility that urban birds have some degree of plasticity in their antipredator behaviour (L293-329). Note however, that while our data collection was not designed to test how hour-to-hour changes in human numbers influence escape distance, the effect of the number of humans (i.e. hour-to-hour variation in human numbers) in our sample was tiny.

      The date and hour effect simply control for the particularities of the given day and hour (e.g. warm vs cold times or the time until sunset). In other words, the within species differences (even from the same park) may have little to do with individual plasticity, but instead may reflect between individual differences. We now add this issue to Methods (L471-476): “This approach enabled us to control for spatial and temporal heterogeneity and specificity in escape behaviour of birds (e.g. species-specific responses, changes in escape distances with the progress in the breeding season, spatial and temporal variation in compliance with government restrictions or particularities of the given day and hour)....”

      2) Looking at the figures I do not see the immense stochasticity (L156, Fig. S3, S5) claimed by the authors. Instead, I can see that some species showed an obvious behavioral change during the shutdown. For instance, Motacilla alba, Larus ridibundus, or Passer domesticus clearly reduced their escape distances, while others like the Dendrocopos major, Passer montanus, or Turdus merula tended to increase it.

      At L138-141 and 327-329 we discussed the within- and between-genera and cross-country variation and stochasticity in response to the shutdowns (Fig. 2). The reference to species-specific plots was perhaps somewhat misleading. We think that the essential figure, which we now reference at this point, is Figure 2, which shows temporal trends and/or stochasticity that seem to have little in common with lockdowns. Please also look at Figure 3 and S3-S4. These show that in all selected genera/species, the trends did not significantly deviate from the central regression line, which indicates no change in FID before and during the COVID-19 shutdowns.

      On the other hand, birds in Poland tended to have larger escape distances during the shutdown for most species, while in Rovaniemi there was an apparent reduction of escape distances in most cases. The multispecies and multisite approach is a strength of this study, but it is an Achilles' heel at the same time. The huge heterogeneity in bird responses among species and sites counterbalanced and as a result, there was an apparent lack of shutdown effects overall. Furthermore, as most data comes from a few (European) species (i.e. Columba, Passer, Parus, Pica, Turdus, Motacilla) I would say that the overall results are heavily influenced (or biased) by them. The authors realize that results are often area- or species-specific (L203), therefore, does a whole approach make sense?

      We are grateful for this valuable comment. We believe the general approach makes sense as there is a general expectation about how birds should respond to changes in human presence. That is why we control for non-independence of data points in our sample. Thus, although many data come from a few European species, this is corrected for by the model. Note that given the sheer number of sampled species, some site- or species-specific trends may have occurred by chance. Importantly, we believe that Figure 2, with species- and site-specific temporal trends, reveals that the between-year stochasticity in escape distances seems greater than any effects of lockdowns. Nevertheless, we have further dealt with this issue in the revised manuscript by running country-specific models, which again clearly showed no significant effect of Period on escape behaviour of birds (including no effects in Poland and Finland).

      3) The previous point is worsened by the heterogeneity of cities and periods sampled. For instance:

      3.1) I can hardly imagine any common feature between a small city in northern Finland (Rovaniemi) and a megacity in Australia (Melbourne). Thus, I would not be surprised to find different results between them.

      3.2) Prague baseline data was for 2014 and 2018, while for the rest of the study sites were for 2018 and 2019. If study sites used a different starting point, you cannot compare differences at the final point.

      We are slightly confused by these comments.

      3.1) The cities are expected to be different but (i) the difference may be smaller than imagined (e.g. park structures, managed grass cover, few shrubs and deciduous-dominated tree species) and (ii) we expect the effects of lockdowns to be similar across cities. Whether we have no people in Rovaniemi parks (which, despite Rovaniemi’s small size, are usually extremely well-visited) or no people in Melbourne parks should not make a difference in principle. Note, however, that to avoid overconfident conclusions, we allow for different reaction norms within cities. Please also note that we are now providing country-specific results, which should identify whether shutdowns led to different reactions in the sampled countries. We found no strong effect of shutdowns in any of the sampled countries/cities.

      3.2) Because of the possible between-site differences at the starting point, we use study site as a random intercept and control for the between-site reaction norms by including the random slope of the period. In other words, such possible differences do not influence the outcomes of our models. Regardless, our a priori expectation is that the human activity levels in a given park were similar prior to COVID-19 and hence in 2014, 2018, and 2019. Again, we are now providing country-specific results, which identify whether shutdowns led to different reactions in the sampled countries, which they mostly did not.

      3.3) Due to the obvious seasonal differences between the northern and southern hemispheres, data collection in Australia began five months later than in the rest of the sites (Aug vs Mar 2020). There, urban birds faced already too many months of reduced human disturbances, while European birds were sampled just at the beginning of the lockdown.

      We agree that each city, or even each park within a city, has its specific environmental conditions (here including the time point of lockdown). That is why we control for city and park location in the random structure of the model (see Methods section). We now add results per country, which show no clear differences (e.g. Fig. 1).

      However, the aim of our study was to test for general, global effects of lockdowns, which are minimal. Note that we now specifically test for country-specific effects in separate models on each country (e.g. Fig. 1, Fig S6) but all country-specific effects are small and still centre around zero.

      3.4) Some cities were sampled by a single observer, while others by many of them. Even if all of them are skilled birders, they represent different observers from a statistical point of view and consequently, observer identity was an extra source of noise in your data that you did not account for.

      We agree. In Finland and Hungary, data were collected by two closely cooperating observers. In Poland, all data were collected by a single observer. In the Czech Republic and Australia, a single observer (P.M. and M.W., respectively) sampled 46 sites out of 56 and 32 sites out of 37, respectively. Each site was sampled by the same observer both before and during the shutdowns. We now clearly state this in the Methods (L352-356). In other words, our models already largely control for the possible observer confound by having site as a random intercept. Moreover, a previous study showed that FID estimates do not vary significantly between trained observers (Guay et al. 2013, Wildlife Research, 40, 289-293).

      4) Although I liked the stringency index as a variable, I am not sure if it captured effectively the actual human activity every day. Even if restrictive measures were similar between countries, their actual accomplishment greatly depended on people's commitment and authorities' control and sanctions. I would suggest using a more realistic measure of human activity, such as google mobility reports.

      Thank you for this comment. We now validate the use of the stringency index with the Google Mobility Reports, showing that human mobility generally (albeit in some countries relatively weakly) decreases with the strength of governmental antipandemic measures. Please note that our main research question is related to the general change in human outdoor activity and not to the week-to-week, day-to-day or hour-to-hour changes captured by the stringency index, Google Mobility or the number of humans during an escape trial. Nevertheless, using Google Mobility and the number of humans as predictors led to results similar to those for the stringency index and Period (Fig. 1 and S6). Please see the extended discussion of this topic in our manuscript (L270-292).

      5) The authors used escape trials from birds on the ground and perched birds. I think that they are not comparable, as birds on the ground probably perceive a greater risk than those placed some meters above the ground, i.e. I would expect shorter escape distances for perched birds. As this can be strongly dependent on the species preferences or sampling site (i.e, more or less available perches), I wonder how this mixture of observations from birds on the ground and perched birds could be affecting the results.

      We have now added the information that most birds were sampled when on the ground (79%). Importantly, previous studies have found that perch height has a minimal effect on FIDs (e.g. Bjørvik et al. 2015, J Ornithol 156:239–246; Kalb et al. 2019, Ethology 125:430-438; Ncube & Tarakini 2022, Afr J Ecol 60:533–543; Sreekar et al. 2015, Tropic Conserv Sci 8:505-512). We added this information to the Methods section (L394-395).

      6) The authors did not sample the same location in the same breeding season to avoid repeated sampling of the same individuals (L331). This precaution may help, but it does not guarantee a lack of pseudoreplication. Birds are highly mobile organisms and the same individuals may be found in different places in the same city. This pseudoreplication seems particularly plausible for Rovaniemi, where sampling points must be necessarily close due to the modest size of this city.

      We appreciate your concern. We cannot fully exclude the possibility of sampling some individuals twice. However, we sampled during the breeding season, when most birds are territorial and active in the areas around their nests, and hence an individual switching parks is unlikely. Also, most sampled birds in our study are passerines, which have small territories (typically a few hundred square meters). Some larger birds may have larger territories and move larger distances to forage (e.g. kestrels, which often forage outside cities), but these birds represent a minority of our records and we did not sample outside the cities.

      7) An intriguing result was that the authors collected data for 135 species during the shutdown, while they collected data only for 68 species before the pandemic. Such a two-fold increase in bird richness would not be expected with a 36% increase in sampling effort during 2020-21. I wonder if this could be reflecting an actual increase in bird richness in urban areas as a positive result of the shutdown and reduced human presence.

      There were 141 unique day-years before COVID and 161 during COVID. So, the sampling effort as calculated by days does not explain the difference in species numbers. Whether the actual effort, which was 381 vs 463 h of sampling, explains the difference is unclear, which we now note in the Methods (L476-483). If not, your proposition is possible, but we would like to avoid any speculation on this topic in the manuscript, as it is difficult to infer species diversity from FID sampling.
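
      As a quick check of the arithmetic behind this answer, the relative increases implied by the figures quoted above can be computed directly. The sketch below uses only the numbers given in this response (141 vs 161 sampling days, 381 vs 463 h, 68 vs 135 species); the helper name `pct_increase` is ours and purely illustrative.

```python
def pct_increase(before: float, during: float) -> float:
    """Percentage increase from `before` to `during`."""
    return (during - before) / before * 100.0


# Figures quoted in the response above (not re-derived from raw data).
effort_days = pct_increase(141, 161)   # unique sampling days
effort_hours = pct_increase(381, 463)  # hours of sampling
species = pct_increase(68, 135)        # species recorded

print(f"sampling days:  +{effort_days:.1f}%")
print(f"sampling hours: +{effort_hours:.1f}%")
print(f"species:        +{species:.1f}%")
```

      On these figures, effort rose by roughly 14% (days) or 22% (hours), whereas the number of recorded species roughly doubled, which is why effort alone may not account for the difference.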

      8) The authors dismissed the multicollinearity problem of explanatory variables unjustifiably (L383). However, looking at fig. S1, I can see strong correlations between some of them. For instance, period and stringency index were virtually identical (r=0.95), while temperature and date were also strongly correlated.

      We are confused by this comment and think it reflects a misunderstanding. Period and stringency index are explanatory variables of interest that were never included in the same model, and hence their correlation does not contribute to within-model multicollinearity. To avoid further confusion, we note this in the legend of Fig. S2. However, we must be cautious when interpreting the results from the models on Period, Google Mobility, # of humans and stringency index, as the four measures are similar.

      We discuss multicollinearity of explanatory variables within the manuscript (L458-538, 548-550) and noted that, with the exception of temperature and day within the breeding season (r = 0.48), the correlations among explanatory variables were minimal. We thus used only temperature as an explanatory variable (i.e. fixed factor; also because temperature reflects both season and variation in temperature across a season) whereas the day was included as a random intercept to control for pseudoreplication within day. Collinearity between all other predictors was low (|r| <0.36).
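
      The kind of pairwise screening described here can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names, the threshold and the toy values are all hypothetical, and only Pearson correlations between predictors intended for the same model are checked.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_collinear(predictors, threshold):
    """Return (name_a, name_b, r) for predictor pairs with |r| >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(predictors.items(), 2):
        r = pearson_r(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Toy values invented for illustration only; they are NOT study data.
toy = {
    "temperature": [5.0, 8.0, 12.0, 15.0, 19.0, 22.0],
    "day": [1, 10, 20, 35, 50, 60],
    "flock_size": [3, 1, 4, 1, 5, 2],
}

# Hypothetical cut-off; the appropriate value is a modelling decision.
flagged = flag_collinear(toy, threshold=0.4)
for name_a, name_b, r in flagged:
    print(f"{name_a} ~ {name_b}: r = {r:.2f}")
```

      In this toy example only the temperature and day pair is flagged, mirroring the situation described above in which one of the two correlated variables is kept as a fixed effect.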

      9) The random structure of the models is a key element of the statistical analyses but those random factors are poorly explained and justified. I needed to look up the supplementary tables to fully understand the complex architecture of the random part of the models. To the best of my knowledge, random variables aim to account for undesirable correlations in the covariance matrix, which is expected in hierarchical designs, such as the present one. However, the theoretical violation of data independence may happen or not. As the random structure is usually of little interest, you should keep it as simple as necessary, otherwise random factors may be catching part of data variability that you would like to explain by fixed variables. I think that this is what is happening (at least, in part) here, as the authors included a too-complex random structure. For instance, if you include the year as a random factor, I think that you are leaving little room for the period effect. The authors simplified the random structure of the models (L387), but they did not explain how. Nevertheless, this model selection was not important at all, as the authors showed the results for several models. I assume, consequently, that the authors are considering all these models equally valid. This approach seems quite contradictory.

      The random structure of the model controls for possible pseudoreplication in the data, that is for the cases where we have multiple data points that may not be independent and hence technically represent one. Apart from that, random structure tells us about where the variance in the data lies. This is often of interest and your previous questions about city, site or species specificities can be answered with the random part of the model. To follow up on your example, year is included in the model because data from a single year are not independent (for example because of delayed breeding season in one year vs. in another).

      We regret being unclear about the model specification and have attempted to clarify the methods (L466-476). We first specified a model with an ideal random structure that necessarily was complex (perhaps too complex). We then showed that using models with simpler random structures did not influence the outcomes. We now use a simpler model within the main text, but do keep the alternative models to show that the results are not dependent on the random structure of the model (Fig. S1 and Table S2).
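
      For readers less familiar with this terminology, a random intercept for site together with a random slope of Period across sites (as described in our answer to point 3.2) can be written schematically as below. This is a deliberately simplified sketch: the symbols are ours, and the models in the manuscript include further fixed effects and random terms (e.g. species, year, day) that are omitted here.

```latex
% i indexes observations, j indexes sites (notation assumed for illustration)
\[
\mathrm{FID}_{ij} = \beta_0 + \beta_1\,\mathrm{Period}_{ij}
  + u_{0j} + u_{1j}\,\mathrm{Period}_{ij} + \varepsilon_{ij},
\]
\[
(u_{0j},\, u_{1j})^{\top} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}),
\qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2}).
\]
```

      Here $u_{0j}$ lets each site have its own baseline escape distance and $u_{1j}$ lets the Period effect differ between sites, while $\beta_1$ is the overall Period effect of interest. Dropping $u_{1j}$ gives one of the simpler random structures referred to above.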

      Reviewer #3 (Public Review):

      This study examined the changes in fear response, as measured by the flight initiation distances (FID), of birds living in urban areas. The authors examined the FIDs of birds during the pandemic (COVID-19 lockdown restrictions) compared to FIDs measured before the pandemic (mostly in 2018 & 2019). The main study justification was that human presence changed drastically during the pandemic lockdowns and the change in human presence might have influenced the fear response of birds as a result of changing the "landscape of fear". Human presence was quantified using a 'stringency' index (government-mandated restrictions). Urban areas were selected from within five different cities, which included four European cities (Czech Republic - Prague, Finland - Rovaniemi, Hungary - Budapest, Poland - Poznan), and one city in the global south (Australia - Melbourne). Using 6369 flight initiation distances across 147 different bird species, the authors found that FIDs were not significantly different before the pandemic versus during the pandemic, nor was the variation in FID explained by the level of 'stringency'.

      Major strengths: There are several strengths to this study that allows for understanding the variety of factors that influence a bird's response to fear (measured as flight initiation distances). This study also demonstrates that FIDs are highly variable between species and regions.

      Specifically,

      1) One of the major strengths of this paper is the focus on birds living in urban areas, a habitat type that is hypothesized to have changed drastically in the 'landscape of fear' experienced by animals during the pandemic lockdown restrictions (due to the presumed decrease in human presence and densities). Maintaining the focus on urban birds allowed for a deeper examination of the effect of human behaviour changes on bird behaviour in urban habitats, which are at the interface of human-wildlife interactions.

      2) This study accounted for several variables that are predicted to influence flight initiation distances in birds including species, genus, region (country), variability between years, pandemic year (pre- versus during), the strictness of government-mandated lockdown measures, and ecological factors such as the human observer starting distance, flock size, species-specific body size, ambient air temperature (also a proxy of the timing during the breeding season), time of day, date of data collection (timing within the regional [Europe or Australia] breeding season), and categorization of urban site type (e.g. park, cemetery, city centre).

      3) This study examined FIDs in two years previous to the pandemic (mostly 2018 and 2019, one site was 2014) which would account for some of the within- and between-year FID variation exhibited prior to the pandemic.

      4) This study uses strong statistical approaches (mixed effect models) which allows for repeat sampling, and a post hoc analysis testing for a phylogenetic signal.

      Thank you for your supportive and positive comments.

      Major weaknesses: The authors used government 'stringency' as a proxy for human presence and densities, however, this may not have been an accurate measure of actual human presence at the study sites and during measurements of FIDs. Furthermore, although the authors accounted for many factors that are predicted to influence fear response and FIDs in birds, there are several other factors that may have contributed to the high level of variation and patterns in FIDS observed during this study, thus resulting in the authors' conclusion that FIDs did not vary between pre- and during pandemic years.

      Thank you for your suggestions. We agree. To capture the general human presence in parks, we have now incorporated an analysis using Google Mobility Reports (Fig. S6b), which directly measure human mobility in each of the sampled cities and specifically in urban parks, where most of our data were collected. We also address the further concerns that you detail below. Although it is not the main interest of our study, we have now also incorporated an analysis using the actual # of humans during an escape trial (Fig. S6c).

      Moreover, we think that including further possible confounds should not influence our conclusions. In other words, including further confounds will decrease the variance that can be explained by shutdowns and thus such shutdown effects (if any) would be tiny and hence likely not biologically meaningful.

      Specifically,

      1) The authors used "government stringency" as a measure of change in human activity, which makes the assumption that the higher the level of 'stringency', the fewer humans in urban areas where birds are living. However, the association between "stringency" and actual human presence at the study sites was not measured, nor was 'stringency' compared to other measures of human presence such as human mobility.

      Thank you for this essential comment. Initially, we viewed Oxford Stringency Index as the best available index for our purposes. However, we now further acknowledge its limitations (L) and validate the Oxford Stringency Index with the Google Mobility Reports data, showing that both indices are generally negatively (albeit sometimes weakly) correlated across sampled cities (i.e. human mobility decreases with the increasing stringency index). Although other human presence indices were used in the past, e.g. Cuebiq, Descartes Labs and Maryland Uni index, Apple (see Noi et al. 2022, Int J Geograph Info Sci, 36, 585-616), we used only the Google Mobility index because (a) it is publicly available, (b) is available also for territories outside US, and (c) provides data for urban parks within each city included in our dataset. Note however that Google Mobility data are inappropriate to answer our primary question, i.e. whether changes in human presence outdoors due to the COVID-19 shutdowns had any effect on avian tolerance towards humans. First, Google Mobility was available only for 2020-22, i.e. the baseline pre-COVID-19 data for 2018-2019 were unavailable. Thus, there was no way to check whether the human activity levels really changed during the COVID-19 years. Second, Google Mobility data are calculated as a change from 2020 January–February baseline for each day of the week for each city and its location (here we used parks). In other words, the data are not comparable between days and cities, albeit we attempted to correct for this within the random structure of the mixed model. Also, the data may be influenced by extreme events within the 2020 Jan–Feb baseline period (see here). Third, the Google Mobility varies greatly between days and across season (see Fig 4 & S5 or the first figure in these responses), likely more than the possible change due to shutdowns. 
      Nevertheless, we found that results based on Google Mobility are qualitatively very similar to results based on the stringency index. Moreover, we showed that the relationships between the number of humans and both Google Mobility and the Stringency Index (Figure 6) are weak and noisy, with 95% CIs widely overlapping zero (Table S3b-e). Also, similarly to other predictors of human presence, the number of humans only poorly predicted changes in avian escape distances. We added details on the new analysis to the Methods, Results and Supplement (L134-165 and associated figures and tables, L415-535).
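
The validation described in this response comes down to checking the sign and strength of the correlation between two indices within a city. As a minimal sketch only, the Python snippet below computes a Pearson correlation for made-up illustrative values; the numbers, the variable names and the choice of Pearson's r are assumptions for illustration and are not taken from the study or its analysis code.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical values for one city (NOT data from the study):
# a stringency score and the percent change in park visits from baseline.
stringency = [20.0, 40.0, 60.0, 80.0]
park_mobility_change = [10.0, 5.0, -15.0, -20.0]

r = pearson_r(stringency, park_mobility_change)
print(f"r = {r:.3f}")  # negative: mobility falls as stringency rises
```

A negative coefficient is what "mobility decreases with increasing stringency" looks like numerically. The sketch does not handle a constant series (zero variance), and it says nothing about the uncertainty around r, which a real analysis would need to report.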

      2) There was considerable variation in FID measurements, which can be seen in the figures, indicating that most of the variation in FID was not accounted for in the authors' models.

      We are confused by this statement. The fact that FIDs varied does not by itself mean that our models failed to account for that variation. Nevertheless, we do control for most of the discussed confounds (see further answers below). Importantly, it is unclear how including further possible confounds would change our conclusions, unless the lockdown effects are tiny, in which case they might not be biologically meaningful.

      Factors that may have contributed to variation in FIDs that were not accounted for in this study are as follows:

      a. The authors accounted for the date of data collection using the 'day' since the start of the general region's breeding season (Europe: Day 1 = 1 April; Australia: Day 1 = 15 August). Using 'day' since the breeding season started probably was an attempt to quantify the effect of the breeding stage (e.g. territory establishment, nest young, fledgling) on FIDs. However, breeding stages vary both within- and between species, as well as between sub-regions (e.g. Finland vs. Hungary). As different species respond to predation or human presence differently depending on the stage during their breeding cycle, more specificity in the breeding cycle stage may allow for explaining the observed variation and patterns in FID.

      We agree. Although we don’t have precise city-specific information on the timing of breeding stages in the sampled bird populations, we partly control for these effects by including a random intercept of day within each year and species. This random factor explained a relatively high portion of the variance in our data (see Table S1 and S2), perhaps as you expected.

      b. Variation in species-specific FIDs may also vary with habitat features within urban sites, such as the proximity of trees and other protective structures (e.g. perches and cover), the openness of the area, and the level of stressors present (e.g. noise pollution, distance to roads). Perhaps accounting for this habitat heterogeneity would account for the FID variation measured in this study.

      We agree. We don’t have such fine-scale data, but we included site identity (typically within a particular park or cemetery) which should account for the habitat heterogeneity among localities. Depending on the model, site explained relatively little variance (1-6%), indicating low heterogeneity between localities in these undescribed characteristics. Also note that park structure may be quite similar both within and between cities, i.e. managed green grass areas, with only a few shrubs and deciduous trees. Therefore, the possible minor habitat heterogeneity should not have any great impacts on our results.

      c. The authors accounted for species and genus within their models, however, FIDs may vary with other species-specific (or even specific populations of a species) characteristics such as whether the species/population is neophobic versus neophilic, precocial versus altricial, and the level of behavioural plasticity exhibited. These variables were not accounted for in the analysis.

      We agree that FIDs can be correlated with many possible factors. Here, we were interested in general patterns, while controlling for FID differences between species, as well as for possible species-specific reaction norms to lockdowns. Whether neophobic vs. neophilic populations or precocial vs. altricial species react differently to lockdowns might be of interest, but it is beyond the scope of this study. However, population and population-specific reaction norms explain little variation (Table S2a, 0-6% of variation), so such a confound should not substantially influence our conclusions. We do not have fine-scale data on the level of neophobia, but the effects of lockdowns seem similar for precocial (see Anas, Larus, Cygnus) and altricial (the remaining, mostly passerine) species in our dataset (see Fig. 3 and S3-S4). Please note that we sampled mainly adults (L386). Moreover, the effects for clades that may differ in their cognitive skills are also similar (e.g. Corvids vs. Anas or Cygnus; Fig. 3).

      d. Three different methods of measuring the distances between flight and the observer location were used, and FIDs were only measured once per bird, such that there were no measures of repeatability for a test subject. Thus, variation surrounding the measurement of FIDs would have contributed to the variation in FIDs seen during this study.

      While all observers were trained, the three methods may add some noise to the FID estimates. However, the FID estimates from a single method may still slightly differ between observers (so do well standardized morphology measurements; Wang, et al. 2019, PLoS Biology, 17, e3000156). Importantly, FID estimates are highly replicable among skilled observers (Guay et al. 2013, Wildlife Research 40:289-293), and we previously validated this approach and showed that distance measured by counting steps did not differ from distance measured by a rangefinder (Mikula 2014, Ardea 102:53-60), which we now explicitly state (L391-394). Importantly, we control for observer bias by specifying locality as a random intercept (see further details in our response to the Editor). Moreover, each site was sampled by the same observer both before and during the shutdowns.

      3) The sample design of this study may have influenced the FID variability associated with specific species, and specific populations of species. A different number of species were sampled across the time periods of interest; 68 species were sampled before the pandemic versus 135 species after the pandemic. However, the authors do not appear to have directly compared the FIDs for the same species before the pandemic compared to during the pandemic (e.g. the FIDs of Eurasian blackbirds before the pandemic versus during the pandemic). Furthermore, within the same country-city, it is unclear whether the species observed before the pandemic were observed at the same location (e.g. same habitat type such as the same park) during the pandemic. As a species' FID response may be influenced by population characteristics and features specific to each site (e.g. habitat openness), these factors may have influenced the variability in FID measurements in this study.

      We regret being unclear in our methods. Our full model uses all data, but alternative models (see e.g. Fig. S1) used only species with ≥5 or ≥10 observations both before and during lockdowns. Importantly, Figures 2 and 3 depict data for species sampled at specific sites. We now clarify this in the Methods (L460-483), the Results (L125-133 and associated figures) and the figure legends (Fig. S1).

      4) The models in this study accounted for many factors predicted to affect FIDs (see the section on major strengths), however, the number of fixed and random factors are large in number compared to the total sample size (N =6369), such that models may have been over-extended.

      The number of predictors and random effects is well within the limits for the given sample size (Korner-Nievergelt et al. 2015. Bayesian Data Analysis in Ecology Using Linear Models with R, BUGS, and Stan). Importantly, simpler models give similar results as the more complex ones (Fig. S1) and the visual (model free) representations of our raw and aggregated data confirm our model results. This, we suggest, makes our findings robust and convincing.

      Overarching main conclusion

      Overall, this study examines factors influencing FIDs in a variety of bird species and concludes that FIDs did not differ during the pandemic lockdowns compared to before the pandemic (2019 and earlier). Furthermore, FIDs were not influenced by the strictness of government-mandated restrictions. Although the authors accounted for many factors influencing the measurement of FIDs in birds, the authors did not achieve their aim of disentangling the effects of pandemic-specific ecological effects from ecological effects unrelated to the pandemic (such as habitat heterogeneity).

      We find this statement confusing. We accounted for most relevant confounding factors and found little evidence for a strong effect of the pandemic. Moreover, we have now added country-specific analyses that confirm the lack of evidence, highlighted Figure 3, which shows no clear shutdown effect, and explored how levels of human presence changed over and within the years. Adding more possible confounds (although not many are left to add) might only further reduce the variation that could be explained by the pandemic; hence any such hypothetical pandemic effects would be, if anything, small and thus likely not biologically meaningful.

      Their findings indicate that FIDs are highly variable both within- and between- species, but do not strongly support the conclusion that FIDs did not change in urban species during the pandemic lockdown. Therefore, this study is of limited impact on our understanding of how a drastic change in human behaviour may impact bird behaviour in urban habitats.

      It is unclear why you think our study lacks support for the conclusion that FIDs changed little during the pandemic, given that all results show no such effects. However, we toned down our Discussion and also highlighted potential issues linked to our approach, e.g. that sampled individuals were not marked and hence we cannot distinguish between various mechanisms that might explain the described pattern (L293-329), or that human presence may not have changed (L253-269). For further details, see our previous response.

      Overall, the study demonstrates the challenges in using FIDs as a general fear response in birds, even during a pandemic lockdown when fewer humans are presumably present, and this study illustrates the large degree of variation in FIDs in response to a human observer.

      We appreciate and agree that our study demonstrates the challenges in quantifying human activity to understand bird escape distance and we added a paragraph on this topic to the discussion (L270-292).

      Nevertheless, we hope that our responses above clarify and address most of the issues you had with our manuscript. We tried to show that (a) most of your proposed controls are indeed included in our study design, models, and visualisations, and that (b) multiple lines of evidence (from models and from visualisation of raw and aggregated data) support the conclusion of no overall effect. We further emphasize the temporal and between- and within-species variability in FIDs in the Results and now specifically indicate that lockdowns did not influence FIDs above such variability (Fig. 2-3, Fig. S3). In other words, the natural (e.g. temporal) variation in FIDs seems far greater than the potential effects of lockdowns (Fig. 2). We believe that even if lockdowns had tiny effects that could have been detected with a more stringent experimental design (e.g. individually tagged birds) or even more complex models, such effects would be far from biologically meaningful.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors managed to show the broad botanical landscape and not only the main crops. This unique achievement is based on decades of establishing an excellent collection of a full comparative seed collection of the current flora. This allows the identification of species that usually are not identifiable. The authors were able to compare the crops that were grown there and identify the contribution of the Roman period with that of the Arab one. This excellent study is a landmark in how such studies should be done. The list of identified species will be used for many other studies on this subject.

      We are very grateful to Reviewer #1 for this generous assessment.

      Reviewer #2 (Public Review):

      Fuks et al. provide extensive paleobotanical data from several sites in the Negev desert to address hypotheses regarding the relative importance of the Roman Agricultural Diffusion (RAD) and the Islamic Green Revolution (IGR) in the dispersal of crops across Eurasia.

      While the overall claims from the authors are convincing, I found the presentation of the data somewhat difficult to follow.

      Graphical visualization of the data with respect to the proposed hypotheses would go a long way towards making the argument clearer for a non-specialist audience.

      The authors apply appropriate caveats in the discussion about their ability to assess IGR given their timeline only incorporates the first few hundred years and some IGR plants may not leave macrobotanical remains. Yet I think more could be done to explain how the data they do find provides positive evidence for RAD. Many of their findings are inferred to be RAD introductions not because of the timing in their sites, but because of previous evidence of introductions at other sites. It would thus be helpful to be more explicit about what additional evidence these findings provide beyond previously published data of introductions of many of these crops into the Levant.

      We thank Reviewer #2 for the positive assessment and helpful comments. We have moved several tables out of the main text to the supplementary tables. We also added a new schematic of the main findings regarding 1st millennium CE introductions to the southern Levant and their significance in the Negev Highlands crop assemblage (Figure 4). We have also added explanatory text to clarify the point about taphonomy vs. period of diffusion.

    1. Author Response

      eLife assessment

      This paper is of interest to researchers and policy makers involved in cervical cancer prevention. The paper provides insight into how the Covid19 pandemic accelerated changes in organized cervical cancer screening. The claim that self-sampling led to a major improvement of test coverage seems somewhat exaggerated and alternative hypotheses to those provided by the authors on the population who chose self-sampling are possible. Nonetheless, this is a valuable piece of work given the scope of the intervention(s) and the precedent it sets i.e. a crisis can in fact accelerate positive changes in screening that have been academic possibilities rather than practical realities.

      Thank you for this supportive summary. We have included exact data on how much of the population test coverage was attributable to self-samples. We have furthermore decided to focus on the population test coverage that results from organised testing (either taken by a clinician at a time and place that the woman was invited to by the organised program, or taken by the woman herself using a sampling kit mailed to her by the organised program). These two improved analyses are intended to facilitate interpretation of how much of the improved test coverage is attributable to the mailing of self-sampling kits.

      Reviewer #1 (Public Review):

      During the Covid19 pandemic, most cervical cancer screening programs were temporarily put on hold. The authors describe how Swedish health authorities dealt with this situation by implementing primary self-sampling and by launching a campaign with concomitant vaccination and screening. Besides, they show that the coverage of the screening program was one year after the start of the pandemic at pre-pandemic levels.

      Strengths of the paper are the clear presentation of the steps taken by the Swedish health authorities and the high quality of the presented screening coverage data which could be obtained directly from the screening registry. However, the paper would benefit from more in-depth analyses because the presented data raise questions. The number of invitations was >30 percent lower in the first year of the pandemic (Figure 1), but the screening coverage was only 4-5 percent lower. In the second year of the pandemic (year 2021), coverage was back at pre-pandemic levels, but the role of primary self-sampling in restoring screening coverage is a bit unclear. It is obvious that primary self-sampling made it possible to invite women again for screening during the pandemic, but there is no data on acceptance of primary self-sampling. Besides, the increase in coverage in year 2021 was only 4% and it is not clear whether such a modest increase could also have been achieved without primary self-sampling. In addition to self-sampling, the authors describe the launch of a concomitant vaccination and screening campaign. This is an interesting initiative but the authors do not show data on the coverage of this campaign in the target age range.

      We now explain that population test coverage is calculated over a whole screening interval. For example, if the screening interval is 3 years, improved attendance would only fully impact the population test coverage after 3 years. Furthermore, we now present the exact data on how much of the test coverage is attributable to the mailing of self-sampling kits.
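
To illustrate why coverage measured over a whole screening interval reacts slowly to a single year of changed attendance, here is a minimal Python sketch. The function name, the data layout (a mapping from a person ID to the year of her most recent test), the fixed 3-year window and all values are assumptions made for illustration; the registry's actual definition, including any age-dependent intervals, may differ.

```python
def population_test_coverage(last_test_year, resident_ids, year, interval_years=3):
    """Share of residents with a test in the `interval_years` years ending in `year`.

    `last_test_year` maps a (hypothetical) person ID to the calendar year of
    her most recent test; residents missing from it have never been tested.
    """
    window_start = year - interval_years + 1
    covered = sum(
        1
        for pid in resident_ids
        if pid in last_test_year and window_start <= last_test_year[pid] <= year
    )
    return covered / len(resident_ids)


# Hypothetical population of 10 women (NOT data from the paper).
residents = list(range(10))
last_test = {0: 2019, 1: 2019, 2: 2020, 3: 2021, 4: 2021, 5: 2017}

coverage_2021 = population_test_coverage(last_test, residents, 2021)
coverage_2022 = population_test_coverage(last_test, residents, 2022)
print(coverage_2021, coverage_2022)
```

In this toy example, women tested in 2019 still count towards 2021 coverage but drop out of the 2022 window unless they are re-tested, so a single year's attendance accounts for only part of the figure.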

      Reviewer #2 (Public Review):

      The manuscript by Elfstrom et al describes the impact of implementing self-sampling as the primary screening test in Sweden to address decreases in coverage following the COVID pandemic. The authors have a very rich dataset including all records of invitations to screen and screening results in the Stockholm area. A limitation is that there is no individual record linkage to allow investigation of the profile of the individuals who chose to screen using the self-sample.

      The conclusions are generally well supported by the authors with the following exceptions:

      1) There was not enough evidence presented in the manuscript to conclude that "The most likely explanation for the large increase in population coverage seen is that the sending of self-sampling kits resulted in improved attendance in particular among previously non-attending women."

      2) The authors state there is no evidence that delays in screening have impacted cervical cancer rates however they present no data to this effect in the manuscript.

      Although all screening and invitation data are indeed collected in the national screening registry, linking these data is not allowed without permission from the Swedish National Ethical Review Board. We applied for such permission, which was granted on 2023-02-01, and a full set of registry linkage analyses investigating the point raised by the reviewer is now included.

      The mention in the Discussion of stable cervical cancer rates referred to public data from the national Cancer Registry. The source is now referenced.

      Reviewer #3 (Public Review):

      The authors report on the nature of interventions that were applied to aid and improve engagement in cervical screening, brought about by the SARS CoV Pandemic in Sweden.

      I appreciate that the impact of these interventions, given that they are recent, will take some time to quantify but the description (and reach) of the policy changes that occurred in a short amount of time is of significant interest to the screening community. The piece on HPV Even Faster is particularly novel; I am not aware of another example of where this has been enacted within a routine programme.

      Thank you for this supportive statement.

      The authors make reference to (15) where the reader can find greater details relating to the population who received the offer of self sampling (and the nature of the device). However I was a little confused (in this stand alone piece) as to who the self sampling group constituted exactly. Did this group not include pregnant women, women invited for first screen or women on non routine recall?

      This is correct: self-sampling kits were mailed to all women aged 26-70 who were due for screening. Women aged 23-25 who were due for screening were invited for midwife-based sampling. Pregnant women were advised to come in for midwife-based screening, to save time. Women under follow-up from previous screens are not due for screening. This is now explained more clearly in the paper.

      The authors state that "the most likely explanation for the large increase in population coverage seen is that the sending of self-sampling kits resulted in improved attendance in particular among previously non-attending women" - why is this written as speculation at this stage (?) is it not possible to attribute directly the contribution made by self sampling, or is this in hand?

      See our response to Reviewer #2 above: although all the data are indeed collected, we are not allowed to perform registry linkages without ethical permission. This has now been obtained and the requested analyses have been performed.

      While self sampling is certainly an option that can support uptake and enfranchisement in cervical screening - its overall performance is fundamentally contingent on the number of women who then comply with follow up should the HPV test be positive; it is not simply about who returns the sample. It would have been of interest to see the proportion of women who did comply with follow up.

      The paper is not about follow-up strategies. Follow-up strategies are different in different settings and reporting is not standardized. They have also changed during the time of the study (e.g. cytology follow-up abandoned). A more detailed analysis of this would require a whole new paper.

    1. Author Response

      Reviewer #2 (Public Review):

      1) It could benefit from fleshing out concepts instead of using parentheses, particularly in the abstract.

      We agree and have amended the abstract and methods (please refer to the responses provided to the editor’s comments 1a-1e).

      2) There is space to expand on the results presented in Table 1, including an explanation of Affected cohorts 2008 vs Affected cohorts 2008-2009. It may also be useful to explain this analysis in the methods section.

      Please refer to the response provided to the editor on the same question (comment 5).

      3) Given that Australia is a best-case scenario and other countries have not had the same success in HPV vaccination coverage, in the discussion would it be possible to give a comparison of how these three scenarios would look different in a population with school-based vaccination but lower coverage volume, such that readers could understand how much of the success / failures of each of the three catch-up scenarios? It would be particularly helpful for readers who are not familiar with the modelling tool used in this analysis.

      We have added some commentary in the discussion in response to the reviewer’s comment. In future, further similar work in countries with lower base coverage would be informative.

      “Australia is a relatively high HPV vaccination coverage setting. Outcomes may be less favourable in a lower coverage setting, as there would be less protection from herd effects; however, the impact of disruptions might also be smaller in a setting with lower coverage, since a lower coverage program would be less effective. Nevertheless, the finding that if catch-up is performed expeditiously then it mitigates much of the effect from vaccination delays, is likely to hold in other settings. A previous study (Simms et al, Lancet Public Health. 2020 Apr;5(4):e223-e234) modelling the health impacts of HPV vaccination hesitancy in Japan from 2013 to 2019 and the potential effects of restoring coverage to 70% with catch-up vaccination in 2020 is informative, as it demonstrates that multi-age HPV catch-up vaccination, after catastrophic falls in coverage in Japan, would be effective in mitigating the effects.”

    1. Author Response

      Reviewer #1 (Public Review):

      Ghosh and colleagues report on their multidisciplinary effort to improve cervical cancer screening attendance in the East Boston Neighborhood Health Center (March-August 2021). Specifically, the authors 1) identified using electronic medical records overdue follow-up visits, 2) scheduled screening appointments during regular clinic hours and weekends/evenings, and 3) surveyed patients on their experience. These objectives were clearly defined (although not consistently so throughout the manuscript) and data analyses/presentation were simple and straightforward, appropriate to the study design and methodology used.

      Thank you for this comment. We have clarified the objectives in the revised manuscript.

      Overall, it is unclear to what extent the overdue appointments were backlogs created by the COVID-19 pandemic or due to pre-pandemic factors that could have been exacerbated by the pandemic. In order to contextualize the current study and its findings, an elaboration is needed on whether the pandemic created the delays in cervical cancer screening or simply compounded the problem. For example, the authors report on page 8, lines 196-197 that in 30% of encounters (not clear how many of the 118 reviewed charts were overdue appointments) the healthcare provider did note the overdue appointments.

      We have Figure 2 (now Figure 4) and added Figures 2 and 5 to address this comment. In 2019, prior to the COVID-19 pandemic, approximately 70% of patients were up-to-date with cervical cancer screening, corresponding to 8467 patients overdue for screening. In 2020, the up-to-date percentage dropped to 63.5% and the overdue number increased to 8812. Figure 2 is a flowchart of the project which clarifies the “30%” mentioned in the reviewer comment.

      In addition, a brief description of the cervical cancer screening program in place would be informative.

      We have added this in the “setting” section of the methods (pages 4-5, lines 107-128).

      Table 1 provides an effort versus value summary; however, these constructs are ill-defined, with few inconsistencies with what is reported in the text.

      This table is intended to help inform clinics that are considering implementing quality improvement programs about the effort required and value obtained for different aspects of our program. These are based in part on proprietary cost analyses so certain details are not able to be included. We have amended the text/table to eliminate inconsistencies.

      Comments specific to Aim 1:

      The methodology is missing information on key elements, mainly relating to the decision-making process of establishing and defining the "validated" patient chart list (1375 overdue patients out of 6126 reviewed charts). A chart of the 1375 approached study population is also warranted (459 patients were screened, 622 could not be reached, and 203 cancelled/missed their appointments, what about the remaining 91 patients). A description of the characteristics of the study population and a comparison of the different groups (screened, not reached, cancelled/missed appointment) along these characteristics are missing.

      We have added a flowchart with this information to the results section. See Figure 2.

      Comments specific to Aim 2:

      About 63% of the 459 scheduled screenings were done during the evening/weekend clinics, which represents a substantial gain and clearly indicates a window of opportunity to increase screening rates by pinpointing the importance of offering a convenient time to women attend screening visits. In general, and as expected, offering additional screening clinics was effective in addressing the backlog of patients, although with significant investment and resources as mentioned by the authors. How significant is significant?

      We are not able to share these data publicly. We have added the following sentence: “The cost data is proprietary/not shareable but analysis by clinical leadership indicated the program was not cost-effective/sustainable.” Page 22, lines 678-80

      Comments specific to Aim 3:

      A more structured and detailed presentation/description of the survey instrument, its administration, response rate, and significance of results are warranted in the manuscript, albeit the joint reporting of this in the appended material.

      We have added additional detail about the survey method (page 9, lines 225-6, 228-31) and results (Page 14-5, lines 518-22, 530-3). We also inserted the survey used in the clinics (Figure 1).

      Reviewer #2 (Public Review):

      The purpose of this study is unclear from the introduction. Additionally, the methods are incomplete and did not describe how data was collected and analyzed. The results do not describe the sample. Once these are described more clearly, further comments can be made about what the authors were trying to achieve and the impact of the work on the field.

      We have clarified the study purpose in the introduction: “The purpose of the project was to examine the impact of a Quality Improvement intervention on improving cervical cancer screening, as well as to evaluate the effectiveness and sustainability of different methods for addressing overdue screening.” (page 3, lines 87-90) We have also clarified the methods and results to describe more completely the data extraction from electronic medical records and the statistical analysis using descriptive statistics.

    1. Author Response

      Reviewer #2 (Public Review):

      This work attempts to connect the diet of a mother to the physiology and feeding behaviors of multiple generations of her offspring. Using genetic and molecular biology approaches in the fruit fly model, the authors argue that this Lamarckian inheritance is mediated by germline-inherited chromatin and is regulated by the general activity of a histone methylase. However, many of the measured effects are small and variable, the statistical tests to prove their significance are missing or poorly described, and some experiments are inadequately described and lack important controls.

      1) The authors claim that the diet of a mother can influence the physiology of her progeny for several generations. However, the observed effects of maternal diet on later generations were small and variable for most assays (see Fig1C, S1.1A, B, D). Additionally, the effect size between F0 HSD to ND was often larger than the effect size between the progeny of F0 parents and ND. To put it another way, if the authors were to compare the F1, F2, etc. to the F0 HSD flies, they would conclude that the majority of the response to diet is not maternally transmitted, and is directly controlled by the diet of the individual being measured.

      We agree with the reviewer that the effect size of acute HSD exposure (in HSD-F0 flies) was stronger than that of transgenerational inheritance (in HSD-F1/2/3/4 flies). Similar observations were also made in other studies, see Klosin et al., Science, 2017, Bozler et al., eLife, 2019. We would argue this difference in effect size was as expected and with clear biological relevance.

      For all living organisms, acute environmental changes (diet change included) have direct and profound influences on their survival and reproduction, and therefore need robust and immediate responses. In comparison, ancestral environmental changes may only provide some vague and indirect indications of the current living environment of the offspring. Such information may be beneficial for the survival and reproduction of the offspring, but the effect size is expected to be much smaller, or at least smaller than that of acute environmental changes.

      Studies of the Dutch Famine offer a good example. Individuals who were prenatally exposed to famine were found to have a greater risk of metabolic diseases (Ravelli et al., NEJM, 1976). Nevertheless, direct high-fat diet exposure was still the much stronger risk factor for obesity and metabolic disorders (Bray et al., Am J Clin Nutr, 1998, Jéquier et al., Int J Obes Relat Metab Disord, 2002).

      We have added additional discussions in the manuscript for clarification.

      Furthermore, since our current study aimed to investigate the mechanism of behavioral transgenerational inheritance, we focused on the comparison between HSD-F1 flies (and their progeny) vs. ND-fed flies. As the ancestors of HSD-F1/2/3/4 flies were exposed to HSD, whereas HSD-F1/2/3/4 flies themselves were never exposed to HSD, any difference we observed between the two groups could be solely attributed to transgenerational inheritance of ancestral HSD exposure. That said, to better distinguish the effects of acute HSD exposure vs. transgenerational inheritance upon ancestral HSD exposure, we re-analysed the data and now present comparisons among ND, HSD-F0, and HSD-F1 flies in the manuscript (Figure 1. B-E, Figure 1-figure supplement 1. A-E, Figure 1-figure supplement 2. A-D, Figure 3. D-E, Figure 3-figure supplement 1. B-D, Figure 3-figure supplement 2 and 3. A-B).

      2) The authors chose to study PER, which had the largest average effect sizes between conditions. However, PER was highly variable in the averaged data, with some individuals showing large effects and others having no effects. A better characterization of transgenerational PER may increase the robustness of this assay and confidence in its results. For example, the authors could measure PER in lineages derived from individual flies to determine when transgenerational effects on PER decline or disappear. This form of data collection could help to explain the high variation in the averaged data presented in the paper.

      We acknowledge that PER is in general quite a variable behavioural trait (as are most, if not all, behavioural measures). This is not surprising, since animal behaviours, as complex traits, can be influenced by numerous intrinsic and extrinsic factors, such as genetic background, developmental environment, diet, population density, environmental conditions, etc. Numerous PER studies have exhibited similar variability (Masek et al., PNAS, 2010, Marella et al., Neuron, 2012, Charlu et al., Nature Communications, 2013, Wang et al., Cell Metabolism, 2016, Wang et al., Cell Reports, 2020).

      Nevertheless, in our current study we were able to identify statistically significant behavioural differences between ND-fed flies and HSD-F1/2/3 flies, demonstrating that ancestral HSD exposure had a transgenerationally inherited effect on sweet sensitivity. To further increase the robustness of the study as suggested by the reviewer, we have conducted additional repetitions of many PER experiments and further confirmed the phenotype with less variability and more statistical power (Figure 1. G-I, Figure 3. D-E, Figure 3-figure supplement 1. B-D, Figure 3-figure supplement 2 and 3. A-B). The reviewer also suggested the use of isogenic flies, which might help to minimize variation in genetic background. However, we think that demonstrating the behavioural difference in genetically diverse fly populations is a more credible way to show that such transgenerational inheritance is a reliable and generalizable phenomenon.

      3) What do the error bars represent on any figure? There are many examples where the data is highly variable and lies completely outside of the error bars. What is the statistical test for significance that is carried out in each figure? The brief comment about statistics in the methods section is inadequate. The authors should also supply the raw data used to generate the figures so that readers can perform their own statistical tests.

      Data in the manuscript were represented as means ± SEM (standard error of the mean) in all of our figures, which is a standard practice in the field (Masek et al., PNAS, 2010, Charlu et al., Nature Comm, 2013, Wang et al., Cell Metabolism, 2016). We have provided detailed explanations of the statistical tests in the manuscript. We have also prepared raw data files as suggested by the reviewer.
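
      To illustrate why individual data points can fall outside mean ± SEM error bars, the following is a minimal sketch, assuming a hypothetical set of per-fly PER scores (the values are invented for illustration and are not taken from the manuscript). It computes the sample standard deviation (SD) and the SEM, where SEM = SD / sqrt(n), and counts how many points lie outside the SEM interval.

```python
import math
import statistics


def mean_sd_sem(values):
    """Return (mean, sample SD, SEM) for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)   # sample SD (n - 1 in the denominator)
    sem = sd / math.sqrt(n)         # standard error of the mean
    return mean, sd, sem


# Hypothetical PER scores for ten flies (illustrative only, not real data).
per_scores = [0.2, 0.9, 0.4, 0.7, 0.1, 0.8, 0.5, 0.6, 0.3, 1.0]

mean, sd, sem = mean_sd_sem(per_scores)

# SEM describes the uncertainty of the mean, not the spread of individuals,
# so many individual points are expected to lie outside mean +/- SEM.
outside = [v for v in per_scores if abs(v - mean) > sem]

print(f"mean={mean:.3f} sd={sd:.3f} sem={sem:.3f} outside={len(outside)}/{len(per_scores)}")
```

      Because SEM shrinks with sample size while SD does not, error bars drawn as SEM are expected to be much narrower than the scatter of the individual measurements.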

      The model that global H3K27me3 is regulated by ancestral diet is unconvincing without further experimental validation and explanation. Points 4-10 address specific issues.

      4) The authors performed ChIP on cycle 11 embryos. This stage is extremely short (11 min) and contains roughly 10 times less chromatin than embryos only 30 minutes older. These features make it very difficult to collect large numbers of precisely staged embryos without significant contamination. It is also debatable whether early cell cycles (including and preceding cycle 11) are slow enough to deposit and propagate histone marks in the presence of new histone incorporation. See the opposing arguments in Zenk et al 2017 and Li et al 2014. The authors could perform ChIP on older embryos to avoid this controversy.

      We thank the reviewer for the clarification. Our embryo collection protocol involved allowing flies to lay eggs freely in a cage for 30 minutes, followed by 50 minutes of incubation on a juice plate, with embryo sorting then completed within 30 minutes. Therefore, to be more precise, our embryos should have been between cycles 10 and 12. We have corrected this information in the manuscript (Figure 2. A).

      Since all the embryos were sorted using the same morphological criteria within the same time frame, their developmental stages should be comparable (i.e. all from cycle 10-12). In several references we consulted, a broader range (cycle 9-13) was used for ChIP-seq analysis (for example, see Zenk et al., Science, 2017).

      Surely any maternally inherited information will also be present in cycle 14 or 15 embryos if it is to influence the development or physiology of the brain. The observed differences in global H3K27me3 levels in F1 vs ND flies could be explained by slightly different aged embryo collections or technical variations in the ChIP protocol. The authors could strengthen their conclusion by performing more ChIP replicates. Alternatively, the authors could use orthogonal approaches like antibody staining or western blots to measure global H3K27me3 levels in precisely staged embryos.

      We chose to use cycle 10-12 embryos because we aimed to identify epigenetic modulations directly transmitted through the maternal germline. Embryos in cycle 14-15 might reveal more profound changes, but since embryos at that stage have entered the zygotic phase and started remodeling histone modifications, we think this might mask the maternally transmitted changes we sought to identify.

      In addition, we conducted two biological replicates for each group for the ChIP-seq analysis, which is standard in the field (Zenk et al., Nature, 2021, Ing-Simmon et al., Nature Genetics, 2021). In the current study we further verified the genes identified in the ChIP-seq analysis by RNA-seq and qPCR analysis.

      We further verified the ChIP-seq results by western blot, which showed a ~2-fold increase in H3K27me3 modification in HSD-F1 early embryos vs. ND-fed embryos, in line with the ChIP-seq data (Figure 2-figure supplement 1. B). We have also provided immunofluorescence results for embryos at cycle 13 and cycle 14, which clearly showed a significant increase in H3K27me3 modifications in HSD-F1 embryos (Figure 2-figure supplement 1. C).

      5) The authors measure PRC2 subunit mRNA levels in adult fly heads to attempt to explain the observed differences in inherited H3K27me3 levels in fly embryos. The authors should examine PRC2 components in germ cells and early embryos to understand how germ cells and early embryos generate H3K27me3 patterns.

      We have now shown that Pcl and E(z) mRNA expression in HSD-F0 flies was not significantly changed vs. ND-fed flies (Figure 2-figure supplement 2. D-G). Meanwhile, the H3K27me3 demethylase UTX and the H3K27ac acetyltransferase Cbp showed a significant decrease (Figure 2-figure supplement 2. H). Therefore, HSD exposure imposed complex epigenetic modifications in HSD-F0 flies, which then led to transmission of epigenetic marks to their progeny. Given that the main scope of this study was to understand which epigenetic program mediated the behavioral transgenerational inheritance upon ancestral HSD exposure (rather than the program mediating the effects of acute HSD exposure), we focused our efforts on H3K27me3, which was significantly changed between HSD-F1 flies and ND-fed flies.

      6) The RNAi experiment targeting PRC2 components in embryos is uninterpretable without appropriate controls and an explanation of the genotypes used in the experimental paradigm. Are the authors crossing nosNGT mothers to UAS-RNAi fathers and assaying the progeny? What is the genotype of the F1 flies and how does it compare to the genotype of the ND flies? The authors should also note that the Gal4 drivers they use are not necessarily restricted to the ovary, and could directly affect other tissues controlling PER like neurons and muscle. Additionally, the authors should supply the appropriate controls to verify that their experimental paradigm has the intended effect. PRC2 proteins are presumably loaded into embryos and would be immune to zygotic-expressed RNAi. The authors could validate when PRC2 RNAi is effective by staining embryos for H3K27me3.

      We have now added schematic diagrams and detailed explanations in our revised manuscript to better explain the RNAi experiments (Figure 3-figure supplement 1. A). As shown in the diagram, we compared each RNAi treatment group to appropriate genetic controls. We have also noted in the manuscript that the GAL4 drivers we used were not restricted to the ovary.

      We have now verified the effect of PRC2 knockdown to reduce H3K27me3 in female germline by both western blot and immunofluorescence staining (Figure 3. B-C).

      7) Although the authors do not note this, nosNGT>RNAi affects the PER of ND flies (compare Gal4>RNAi to just RNAi or just Gal4 in ND columns in Fig3A-D). This could be due to RNAi expression in neurons or muscles or some other indirect effect. Regardless of the mechanism, this result makes it difficult to interpret how RNAi treatments affect the transgenerational inheritance of PER if there is an equivalently strong nontransgenerational effect.

      Although nosNGT>RNAi appeared to slightly affect PER response of ND-fed flies, there was no statistically significant difference (Figure 3-figure supplement 1. B and D, Figure 3-figure supplement 2. A-B). Rather, the effect of E(z) knockdown was evident in HSD-F1 flies (Figure 3-figure supplement 1. B), further confirming the involvement of H3K27me3 in transgenerational inheritance of PER reduction.

      8) The matalpha gal4 experiment is inadequately explained in the text or methods. Are the authors expressing RNAi in the ovaries of the F0 flies that are fed an HSD? Does the ovary influence their PER somehow? Similar to point 8, there appears to be a nontransgenerational component to the RNAi phenotype that clouds the interpretation of the transgenerational effect (compare F0 in S3.1A-C).

      We have now added a schematic diagram and detailed explanations in our revised manuscript to better explain the RNAi experiments (Figure 3. A). As shown in the diagram, we compared each RNAi treatment group to appropriate genetic controls.

      Similar to point 7, although Mat-tub-GAL4>RNAi might seem to affect PER responses of ND-fed flies, there was no statistically significant difference (Figure 3. D-E). Rather, the effect of E(z) knockdown was evident in HSD-F1 flies (Figure 3. D), further confirming the involvement of H3K27me3 in transgenerational inheritance of PER reduction.

      9) For the EED inhibitor experiments (both PER and calcium imaging), it is unclear whether the authors fed the mothers or their adult progeny the EED inhibitor. If adult progeny were fed, what tissues were affected? The authors should stain various tissues with an H3K27me3 antibody to verify the effectiveness of their inhibitor. Finally, the effect of the EED inhibitor on calcium imaging was not convincing because the variation was so large.

      We have added a new schematic diagram and provided more detailed explanations in the manuscript for pharmacological interventions (Figure 4. A-D). To verify the effect of the drug treatment, we showed that compared to the control group fed with DMSO, flies fed with the inhibitor showed a significant decrease in H3K27me3 levels, demonstrating the effectiveness of the inhibitor (Figure 4-figure supplement 1. A).

      We acknowledged the unsatisfactory quality of our calcium imaging experiments in our initial submission. We have now improved our experimental procedures to reach better data quality, while the conclusions remained consistent (Figure 4. E).

      10) In all of the PRC2 RNAi and inhibitor experiments, are there any other phenotypes that would suggest that the treatments are working? There are many published PRC2 loss-of-function phenotypes (molecular and developmental) in different tissues. The authors could assure the reader that their treatments are working as expected by doing these controls.

      As discussed above, we have now used western blot and immunofluorescence staining to validate the efficiency of PRC2 RNAi in female germline (Figure 3. B-C).

      11) The authors propose that a transgenerationally inherited state of the caudal gene is responsible for the transgenerationally inherited PER. However, the experiments investigating the methylation state and expression level of caudal are unconvincing. Cad mRNA abundance varied immensely in the ND RNAseq samples. When the authors compared cad levels across generations, the effect size was small. A single outlier in the ND sample in both the RNAseq and the RT-PCR experiments appears to drive up its mean and effect size. The H3K27me3 ChIP on cad is very similar in the F1 and ND samples and the acetylation peak on its promoter appears unchanged. The authors could vastly improve the caudal experiments in this paper by simply using cad antibodies to stain the relevant tissues that contribute to PER. For example, the authors could stain GR5a neurons for cad expression in different generations that inherit (or don't inherit) maternal PER to more accurately determine if cad levels are indeed transgenerationally regulated. The authors could also perform more ChIP experiments at a less variable stage to convincingly correlate epigenetic marks on cad with its expression level.

      As discussed above, we conducted two biological replicates for each condition of the ChIP-seq analysis, which is standard in the field (Zenk et al., Nature, 2021, Ing-Simmon et al., Nature Genetics, 2021). We have also performed western blot and immunofluorescence for H3K27me3 in ND vs. HSD-F1 embryos to further validate our ChIP-seq data (Figure 2-figure supplement 1. B-C).

      As for the Cad gene, H3K27me3 signals showed a statistically significant difference between ND-fed and HSD-F1 flies (Figure 5. D). We have also conducted additional qPCR experiments to verify the expression changes of the Cad gene (Figure 5. F, right), which were in line with the ChIP-seq data and further supported its validity.

      It is worth noting that during the developmental time window of our ChIP-seq analysis, the acetylation signals in the promoter region of cad were very low (Figure 5. D), making a comparison impossible.

      Reviewer #3 (Public Review):

      Jie Yang et al. investigated the transgenerational behavioral modification of a high-sugar diet (HSD) in Drosophila and revealed the underlying molecular and neural mechanisms. It has been reported that HSD exposure decreases sweet sensitivity in gustatory sensory neurons, resulting in reduced sugar response (Proboscis extension reflex, PER) in flies. The current study reports that this effect can be transmitted across generations through the maternal germline. Furthermore, the authors show that H3K27me3 modification is enhanced in the first-generation progenies of HSD-treated flies (F1), and genetic or pharmacological disruption of PCL-PRC2 complex blocks the behavioral change and restores the sweet sensitivity in the Gr5a+ sweet sensory neurons. The authors further analyze the differentially expressed genes in the F1 flies. Among H3K27me3 hypermethylated regions, they focus on homeobox genes and find a transcription factor Caudal (Cad), which shows decreased expression in the F1 flies. Knocking down Cad in Gr5a+ neurons results in decreased PER response to sucrose.

      Transgenerational changes in physiology and metabolism have been broadly studied, while inherited changes at the behavioral level are much less investigated. This work provides convincing evidence for transgenerational modification of feeding behavior and digs out the underlying molecular and neural mechanisms. However, there still are several concerns that need to be clarified.

      1) The epigenetic regulator PRC2 has been found to play an essential role in the 7d-HSD-induced modification of the PER response. In this study, it's important to clarify, for the transgenerational change, whether epigenetic modification is required in the flies exposed to HSD (F0), the progenies (F1), or both. It would be very helpful for better interpretation if the procedures of HSD treatment in RNAi experiments and the drug treatments were stated in more detail. In addition, the F0 flies should be examined as the control.

      In this current study our main scope was to understand the transgenerational influence of HSD exposure on the progeny. To this aim, we chose to study the physiological and behavioral differences between ND-fed flies vs. HSD-F1 flies (and their progeny on ND). HSD-F1 flies (and their progeny) were not exposed to HSD in their whole life cycle and therefore the physiological and behavioral changes we observed vs. ND-fed flies could be solely attributed to epigenetic modifications transmitted via germline cells from HSD-F0 flies. Therefore ND-fed flies were used as the main control.

      As for HSD-F0 flies, the acute effects of HSD exposure could be more complex. Epigenetic factors were likely involved, as evidenced in Figure 3-figure supplement 1. C, Figure 3-figure supplement 3. A-B and Figure 4. C. In addition, HSD exposure might also directly affect gene expression and multiple signaling pathways in HSD-F0 flies (see Chen et al., Science China Life Sciences, 2020). Therefore, we did not aim to investigate how HSD exposure affected HSD-F0 flies in this current study. We have added additional discussions in the manuscript for clarification.

      That said, we still added more HSD-F0 flies as controls where needed (Figure 2-figure supplement 2. D-G, Figure 3-figure supplement 1. C, Figure 4. C, Figure 5. F, left).

      We have also modified the schematic diagrams and added more detailed explanations in the manuscript, in order to provide a clearer illustration of the experimental procedures (Figure 3. A, Figure 3-figure supplement 1. A, Figure 4. A, B and D). Specifically, we employed two different RNAi approaches. Firstly, we used genetic methods to obtain homozygous Mat-tub-gal4>UAS-gene X RNAi fly lines on chromosomes Ⅱ and Ⅲ for germline-specific knockdown (Figure 3, Figure 3-figure supplement 3). Secondly, we used heterozygous nosNGT-gal4>UAS-gene X RNAi flies for embryo-specific knockdown (Figure 3-figure supplement 1 and 2). Our drug experiments involved both treating the flies and measuring their PER (Figure 4. A-C) and treating the parental flies and measuring the PER of their progeny (Figure 4. D).

      2) The information on the drug treatment period is also missing for imaging experiments (Fig.4C). Moreover, the response curve is very different from those recorded in the same neurons in previous studies. What’s the reason? Please also provide a representative image showing which part of the Gr5a neurons is recorded.

      The experimental procedures of the drug treatments are now shown in Figure 4. A. We fed adult flies with specific compounds for five days after eclosion, and then measured the calcium signals of Gr5a+ neurons when the flies were fed with sucrose.

      As suggested by the reviewer, we have now conducted calcium imaging experiments more carefully and thoroughly. We have now added the new data into the revised manuscript and the conclusions remained consistent (Figure 4. E). We recorded the calcium signal in the axons of Gr5a+ neurons in the SEZ.

      3) It's unclear whether the decreased Cad expression upon HSD treatment specifically occurred in Gr5a+ neurons or in many cells. If the change in gene expression is significant in the qPCR test, it should occur in a large number of cells, most likely including different types of gustatory sensory neurons. If lower cad expression led to lower neural response and thereby lower behavioral response, how does it specifically decrease the PER response to sucrose but not to other tastes? Is HSD-induced desensitization specific to sucrose in the offspring?

      We agree that Cad expression might decrease in many cells, including Gr5a+ neurons in the proboscis. In order to investigate whether taste perception other than sweet sensing was also affected, we conducted PER experiments with fatty acids, which are another type of appetitive taste cue, like sugars. Perception of fatty acids is mediated by ionotropic receptors such as ir25a, ir76b, and ir56b (Ahn et al., eLife, 2017, Brown et al., eLife, 2021).

      Our results indicate that the PER to fatty acids in HSD-F0 and HSD-F1 flies was not significantly reduced compared to the ND-fed controls (Figure 1-figure supplement 2. E-F). This suggests that the impact of Cad on gustatory sensory neurons might be specific to the sweet sensitivity of Gr5a+ neurons.

      4) In Fig.2D, data are sorted for genomic regions showing an up-regulated modification of H3K27me. It's unclear whether similar sorting was performed in panel C. This needs to be clarified.

      The analyses shown in Figure 2C and 2D are linked. For 2C, we identified genomic loci with enriched H3K27me3, H3K9me3, and H3K27ac peaks, and found that H3K27me3 peaks showed the most robust changes between ND-fed and HSD-F1 flies. We therefore concentrated on the loci where H3K27me3 modifications were significantly changed between the two groups, and further analyzed their differences. As shown in Figure 2D, within these loci, H3K27ac modifications, which are functionally antagonistic to H3K27me3, were significantly reduced, whereas H3K9me3 signals within these loci remained unchanged. These results confirmed that ancestral HSD exposure induced robust H3K27me3 modifications in certain genomic loci.

    1. Author Response

      Reviewer #1 (Public Review):

      The paper proposes a novel approach, named ModCRE, which utilizes structure-based learning to predict the DNA binding preferences of transcription factors (TFs). The authors integrate both experimental knowledge of the structures of TF-DNA complexes and large amounts of high-throughput TF-DNA interaction data. Additionally, the authors have developed a server that automatically produces these characteristics for other TFs and their complexes with co-factors.

      Strengths: The paper's integration of experimental knowledge and high-throughput data to develop statistical knowledge-based potentials to score the binding capability of TFs in cis-regulatory elements is a powerful strategy. The proposed approach can be applied to more than 80% of TF sequences, making it a general method for characterizing binding preferences.

      Weaknesses: The paper is difficult to follow, as it contains many technical details and implementation details. The method applied is not always clear, and the paper focuses on implementation rather than the message. The results indicate that the nearest neighbors approach in Figure 4 outperforms the proposed method in many cases, and the proposed method seems to perform better only when similarity with the target is low. The same applies in Fig. 5 when using normalized ranked scores.

      It appears that the authors have successfully developed a structure-based learning approach for predicting DNA binding preferences of transcription factors. However, the paper's technical language and implementation focus make it challenging to follow at times.

      It seems the authors have successfully achieved most of their aims in improving predictions for TF-DNA interaction, and the results support their conclusions.

      This work has the potential to significantly impact the field of TF-DNA binding and gene regulation, particularly for those interested in predicting PWMs for TFs with limited or unreliable experimental data.

      General comment: We wish to thank the reviewer for his/her comments, which helped us to make the manuscript easier to read, clarify the ideas, and improve it overall. We also appreciate his/her comments on the strengths. In the current revision we have tried to correct the faults and address the weaknesses. Certainly, the results section contained many explanations of the method and its implementation rather than its use and application. Regarding Figures 4 and 5, the reviewer is also right: our approach can help to predict the binding motif of a transcription factor in difficult cases, when the PWMs of the closest homologs are unknown but the structure of its complex with DNA can be provided. Otherwise, when binding information is available for close homologs, traditional state-of-the-art approaches are better than our approach and we recommend them.

      Reviewer #2 (Public Review):

      This work describes the development of a new structure-based learning approach to predict transcription binding specificity and its application in the modeling of regulatory complexes in cis-regulatory modules. The development of accurate computer tools to model protein-DNA complexes and to predict DNA binding specificity is a very relevant research topic with significant impact in many areas.

      This article highlights the importance of transcriptional regulatory elements in gene expression regulation and the challenges in understanding their mechanisms. Traditional definitions of activating regulatory elements, such as promoters and enhancers, are becoming unclear, suggesting an updated model based on DNA accessibility and enhancer/promoter potential. Experimental techniques can assess the sequence preferences of transcription factors (TFs) for binding sites. Recent models propose a cooperative model in which regulatory elements work together to increase the local concentrations of TFs, RNA polymerase II, and other co-factors. Co-operative binding can be mediated through protein-protein or DNA interactions. The authors developed a structure-based learning approach to predict TF binding features and model the regulatory complex(es) in cis-regulatory modules, integrating experimental knowledge of structures of TF-DNA complexes and high-throughput TF-DNA interactions. They developed a server to characterize and model the binding specificity of a TF sequence or its structure, which was applied to the examples of interferon-β enhanceosome and the complex of factors SOX11/SOX2 and OCT4 with the nucleosome. The models highlight the co-operativity of TFs and suggest a potential role for nucleosome opening.

      The results presented by the authors have a large variability in performance upon the different TF families tested. Therefore, it would be ideal if the performance/accuracy of the method is tested in some simple predictions and validated with prospective experimental data before applying it to model difficult scenarios such as those described here: SOX11/SOX2/OCT4 and nucleosome or interferon beta and enhanceosome. This will give more support to the models generated and thus the validity of the conclusions and hypothesis derived from them.

      General comment: We wish to thank the reviewer for his/her comments, we really appreciate them and the opportunity to have new tests with our approach. Some of his/her comments coincide with those of reviewer 1. When this is the case, we will refer to our previous answers and modifications in the manuscript. In this revision we have included new tests to validate the approach using available and published experiments different than the ones used in the original submission. We hope the new information is sufficient to support our approach.

    1. Author Response

      Reviewer #1 (Public Review):

      Davies et al. examined the role of the malaria parasite's FIKK4.1 protein kinase in trafficking and host membrane insertion of key proteins that are exported by the intracellular P. falciparum parasite. FIKK4.1 is one of 18 FIKK serine/threonine kinases exported into the host erythrocyte; these kinases phosphorylate both host proteins and exported parasite proteins. FIKK4.1 has previously been implicated in rigidification of the erythrocyte cytoskeleton. It is also known to affect trafficking and insertion of PfEMP1, the parasite's primary cytoadherence ligand, on the host cell surface. In the present studies, the authors perform sophisticated gene-editing experiments that combine conditional knockout of FIKK4.1 with tagging of two kinase targets with the TurboID proximity biotin-labeling enzyme to explore phosphorylation-dependent changes in target protein localization, structure, or protein-protein interactions. Using conditional knockout of each exported FIKK kinase, they determine that FIKK4.1 is the only kinase that regulates PfEMP1 surface exposure and that it does not appear to modulate surface translocation of RIFINs, a family of parasite antigens involved in immune evasion. The combination of gene-editing, proximity labeling and mass spectrometry, and biochemical studies in the paper is to be lauded. These findings identify key targets of exported kinases and will guide future studies of host cell remodeling.

      Key limitations of the study:

      1) TurboID tagging of FIKK4.1 followed by proximity labeling and mass spectrometry of biotinylated proteins revealed parasite-stage dependent labeling of 101 parasite proteins and 39 human proteins that come in contact with FIKK4.1. Although TurboID is a more efficient biotin ligase produced through directed evolution, nonspecific biotinylation of proteins that do not form biologically relevant interactions remains an issue. Biotin addition for 4 hours, as used here and in most studies using this ligase, allows for labeling of proteins that undergo random collisions with the TurboID-tagged protein. While there was clear enrichment of exported proteins in the FIKK4.1-tagged parasite at mature schizont stages when FIKK4.1 is in the host cytosol, only 66% of the proteins labeled were exported, consistent with labeling and recovery of irrelevant proteins. As the authors performed appropriate controls and interpreted their findings cautiously, this limitation results primarily from finite efficiency of TurboID, trace levels of endogenous biotin within cells, and other complexities associated with the technology.

      We agree with the reviewer that there are limitations to TurboID and the mere presence of a protein in a dataset does not imply functional relevance (which is also true for IP data). However, it is highly complementary to data obtained through other methods (in our case previous cytoadhesion data and phosphoproteome data) and as we show here, can give high resolution information on the local protein environment of a protein. This is illustrated by highly significant protein-specific interaction datasets for PTP4 and KAHRP obtained from biological triplicate experiments. The site-specific protocol we use later in the paper allows us to eliminate unbiotinylated proteins non-specifically binding to beads which is a major advantage, evidenced by the much higher ratio of exported proteins observed in the PTP4 and KAHRP-turboID datasets.

      2) The production of dual-edited parasites carrying conditional knockout of FIKK4.1 and TurboID tagging of either KAHRP or PTP4 permitted examination of changes in localization of exported proteins upon their phosphorylation by FIKK4.1. KAHRP and PTP4 are excellent choices for these experiments because they are established targets of the kinase and good candidates for effectors involved in PfEMP1 membrane insertion. Some 30-40 proteins exhibited significant changes in biotinylation by these TurboID-tagged proteins, suggesting altered localization or structure upon loss of FIKK4.1 kinase activity. PfEMP1 trafficking proteins (PTPs), Maurer's cleft proteins, exported heat shock proteins, and components of PSAC, a parasite-associated nutrient uptake channel, all exhibited changes. Although FIKK4.1 is not essential for in vitro parasite propagation, altered localization could result either directly from changes in phosphorylation status of the protein itself or could reflect indirect effects on the cell from loss of FIKK4.1.

      The reviewer is correct that we cannot exclude the possibility that the observed changes result not only from the loss of FIKK4.1-mediated phosphorylation, but also from an effect of losing the FIKK4.1 kinase domain on the localisation of other proteins. Conditional inactivation of the FIKK4.1 kinase domain while retaining the overall protein would have been a more elegant approach. However, we do not predict the kinase domain of FIKK4.1 to be a strong structural component, given that kinase domains often evolved to have low-affinity interactions with their multiple targets and are less likely to act as scaffolding parts. As the reviewer points out, we observed no growth defect upon deletion of FIKK4.1. Therefore, we can be quite certain that the observed changes are not due to indirect effects caused by differences in growth but are a direct effect of the loss of the kinase domain and FIKK4.1’s enzymatic activity.

      3) As a consequence of these two limitations, these experiments could not conclusively implicate either KAHRP or a specific PTP in PfEMP1 surface translocation. Whether specific Maurer's cleft proteins or the nutrient channel components contribute to PfEMP1 surface translocation could also not be addressed. The authors' Discussion section is appropriately cautious in interpreting changes in biotinylation upon FIKK4.1 disruption. Although a large amount of data has been generated in this sophisticated study, the precise mechanism of PfEMP1 trafficking and membrane insertion remains elusive.

      We agree with the reviewer that we do not definitively explain the mechanism of FIKK4.1 in PfEMP1 surface translocation. But we identify several promising candidates for modulating its effect, some of which (for example PTP4) have previously been shown to be relevant for PfEMP1 surface translocation. We also identify unexpected proteins which can now be investigated further. New methods in high-resolution Cryo-EM imaging may allow us to image individual protein density in knobs and visualize the observed changes in the future. Further PerTurboID experiments with individual components will likely draw an ever finer picture. Here we focus on emphasising the potential of PerTurboID for identifying connections between proteins, and for observing changes to protein characteristics which would be missed by other techniques.

      Reviewer #2 (Public Review):

      Davies et al combine TurboID with conditional mutagenesis to reveal how a perturbing event alters the accessibility of a sub-cellular proteome to proximity biotinylation. The approach builds on established techniques for antibody-mediated enrichment of biotinylated peptides (rather than purification of whole biotinylated proteins by avidin) to enable mapping of the specific lysines that are biotinylated by TurboID and how access to these sites changes between conditions. The insights gained have a range of potential implications touching on protein trafficking/localization, complex dynamics and membrane topology. The authors apply this strategy to study trafficking of the key P. falciparum adhesin PfEMP1 to the infected erythrocyte surface. This group has previously shown that the exported parasite kinase FIKK4.1 is important for this process but the specific mechanism is unknown. In the first part of the present study, the authors develop PerTurboID and analyze the altered biotinylation patterns upon FIKK4.1 deletion in parasite lines bearing TurboID tags on PTP4 or KAHRP, two proteins required for this pathway and likely direct substrates of FIKK4.1. Numerous changes in site-specific biotinylation are quantitatively assessed on hundreds of proteins and possible implications for these changes are discussed, including topology of parasite integral membrane proteins exported into the RBC compartment as well as how the conformation of the RhopH complex might be altered upon RBC membrane integration. In a final set of experiments, the authors show that among 18 exported FIKK kinases, FIKK4.1 is uniquely important to PfEMP1 surface display but not to the distinct RIFIN class of parasite proteins that are also trafficked to the RBC surface. On the whole, the data are compelling and provide an important new approach that advances the proximity labeling toolkit.

      While the resolution of PerTurboID captures the site-specific changes in biotinylation abundance and position that occur upon loss of FIKK4.1, a limitation of the study is that these observations do not necessarily clarify the model for how FIKK4.1 is controlling the PfEMP1 trafficking pathway. The authors convincingly show that FIKK4.1 uniquely supports PfEMP1 surface presentation and cytoadhesion. However, this is not connected to the PerTurboID data in a way that provides a mechanism for how this is achieved by FIKK4.1 activity and in my opinion doesn't deliver on the title claim to "reveal the impact of kinase deletion on cytoadhesion". Certainly the changes in biotinylation suggest a range of interesting possibilities related to the accessibility and topology of proteins within and beyond the PfEMP1 trafficking pathway; however, it is hard to interpret the relationship of these changes to the process in view. For instance, deletion of FIKK4.1 increases biotinylation of several Maurer's clefts proteins in both the PTP4- and KAHRP-TurboID experiments but why this is or whether it is significant for PfEMP1 transport is unclear.

      We agree with the reviewer that we do not definitively confirm the relationship between the changes observed in protein accessibility and the role of FIKK4.1 in PfEMP1 transport. We discuss a number of likely options based on what is known of the candidate genes, but validation would require extensive further work beyond the scope of this paper. We have focussed on demonstrating the value of PerTurboID as a technique for measuring molecular-level changes which would be missed by other methods, providing a list of proteins which are likely involved in modulating FIKK4.1 activity and PfEMP1 trafficking through an interconnected network. We believe the technique will be very useful for understanding gene function in other scenarios. However, we changed the title to be more specific to proteins in the cytoadhesion complex and associated proteins, and not cytoadhesion per se.

    1. Author Response

      Reviewer #1 (Public Review):

      The finding that taste memory formation follows the same or highly similar logic and mechanisms as olfactory memory is very interesting. In particular, the new approach to use an operant learning assay developed by the authors to address this outstanding question in the field is very impressive. The shown data are of high quality and very convincing.

      While the current version will be of clear interest to fly people dissecting memory formation, it might be less accessible outside this immediate field. Below I list my suggestions, questions and criticisms.

      You have developed an operant assay and stress this in the introduction. This is important because it allows you to gain much better insight into how memory is formed and how it is recalled. Nevertheless, I was somewhat disappointed that you did not exploit that aspect more in your study. First, I suggest showing, at least for the initial figures, the traces (e.g. Fig 1D) not only for the test phase but also for the training phase. As you also mention in your discussion, the extent of memory formation will depend critically on the number of pairings during training. And perhaps not only on their number but also on their evolution/change over time. Second, you only show preference indices. I suggest showing the number of actual interactions with the food source in addition. In my opinion and experience, the preference index can be misleading or at least the interpretation might be questioned if the number of actual choices is very low or very high compared to controls or other groups. Third, regarding the same point, you show traces for test phases, but you do not comment or discuss why they might look the way they look. For instance, it appears that in some cases it takes a while to see an actual difference in the preference index while at other times it seems more instantaneous etc.

      We have now added plots showing the preference indices over time during both training and testing for all the experiments in Figures 1 and 2. We also comment in the text on our view of their interpretation. Although we recognize that interesting features of the learning process could be revealed by examining the process over time, we also caution that earlier timepoints are inherently less robust because of smaller sample size to the measurements (flies tend to not take many sips of the food over the first several minutes). Thus, emergence of a preference after a period of time may not reflect an evolution of the preference as much as a firming up of the data as more sips are recorded. As a notable example, our data in Figure 1E,G show close to a zero preference for activation of sweet sensory neurons during the first 10 minutes of training, despite the innately appetitive nature of this manipulation. This is undoubtedly because it takes some time for flies to sample both choices and build up enough interactions to show a clear preference. This is not to say that the curves are never informative, however. For example, it is reassuring to see that activation of PAM neurons does not produce a positive preference at any time during training (Figure 2F).

      We have also added the raw sip/interaction numbers for the experiments in Figure 1 in order to provide an example of how these data relate to the preference. Your concern about reliability differing depending on choice number is certainly warranted (as we also discuss above). However, the raw data does not suggest a major difference in the overall number of choices being made between groups.
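      The point about early timepoints can be made concrete with a minimal sketch. It assumes a preference index of the form (A - B) / (A + B) computed on cumulative sip counts; this formula, the function name and the example counts are illustrative assumptions and are not taken from the manuscript.

```python
def cumulative_preference(sips_a, sips_b):
    """Return (total_sips, preference_index) after each time bin.

    sips_a, sips_b: per-bin sip counts on the two food options
    (hypothetical inputs). The index is (A - B) / (A + B) on the
    cumulative counts, and None while no sips have been recorded.
    """
    total_a = total_b = 0
    out = []
    for a, b in zip(sips_a, sips_b):
        total_a += a
        total_b += b
        n = total_a + total_b
        pi = None if n == 0 else (total_a - total_b) / n
        out.append((n, pi))
    return out


# Made-up counts: few sips in early bins, more later on.
trace = cumulative_preference([0, 1, 3, 6], [0, 1, 1, 2])
for n, pi in trace:
    print(n, pi)
```

      Reporting the total sip count next to each index value shows directly how little data supports the earliest bins, so a late-emerging preference can be read against the number of interactions behind it.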

      Along the same lines, I am wondering why you do not observe extinction. Frequently if the CS is re-experienced without the US over several trials, you start to see memory fade. The preference traces as well as the actual interactions might help to explain this.

      This is an interesting question, and one that we have certainly wondered about. Our assumption is that the number of exposures to the CS+ during testing is not sufficient to induce extinction. It would be interesting to run a longer testing period to see whether extinction occurs over a longer time course; however, we have not done so at this point.

      You use salt as a negative US. I suggest showing at least one experiment with bitter taste (e.g. quinine) to show how general your finding is to negative conditioning. Your optogenetic data suggests it is.

      We actually never use natural taste stimuli as the US; we only use salt as the CS+ in our appetitive learning experiments. We have revised the figures and figure legends extensively for clarity and one of the changes is to try to make it clearer what is the CS+ and CS- in each experiment.

      You analyze the role of energy state in memory formation. This is very interesting. In light of the importance of feeding state, it would be very helpful to include starvation/metabolic state information not only in the methods but also in the results section (at least briefly).

      We have now indicated in all the figure legends and in the text that flies were all food deprived for 24 hours prior to training.

      Your data convincingly shows that taste memory is formed in the mushroom body. For instance, you show that inhibition of KCs prevents the change in preference. KC inhibition was done during the entire experiment (training and test). Thus, it's important to show how KC inhibition affects (or does not) training vs. test.

      We appreciate the motivation for this suggestion and how extensively this issue has been explored in olfactory classical conditioning. We also agree that it would be interesting to perform this experiment. However, the practical logistics of doing this experiment were not possible with the constraints we were under. We unfortunately don’t currently have the means to operate the STROBE at a temperature high enough to effectively silence neurons using shibire(ts), and silencing with optogenetics is not possible with our current setup either. Thus, we will need to leave this issue unresolved for the time being.

      Along the same lines, how do you envision this memory formation to happen at the circuit level? KCs and DANs are likely activated by CS and US. It would be important to at least include a paragraph in the discussion to clarify this.

      The bulk of our characterization of this assay (including the demonstration that KCs are required) was done with 75 mM NaCl as the CS+ and optogenetic activation of PAM neurons as the US. Previous studies have shown activation of KCs by tastes (Kirkhart and Scott, 2015), so we believe that KCs are being activated by the CS+ and DANs are being activated by the US (in this case directly through optogenetics). Based on a great deal of beautiful work in olfactory classical conditioning, we believe it is likely that this co-incident activation leads to plasticity at KC-MBON synapses, thereby skewing the behaviour in favor of attraction. We have now tried to clarify this mechanism in the paper.

    1. Author Response

      We express our sincere gratitude to the editors and reviewers for their invaluable input. To further improve our manuscript, we have devised a plan to perform additional histological experiments of Bdnf and TrkB expression. Specifically, we will replace the phospho-TrkB antibody with an anti-TrkB antibody to quantify Bdnf/TrkB co-expression. Moreover, we acknowledge the concern raised by the reviewers regarding the clarity of some explanations and the potential influence of alternative mechanisms influencing the defects observed in Bdnf neurons. We aim to provide a clearer explanation and discussion. We also intend to provide a more comprehensive discussion of the limitations of our LM22A-4 drug treatment experiment. By addressing these points, we wish to ensure that our research is informative to the eLife readership.

    1. Author Response

      We would like to thank the editors and reviewers for their thoughtful comments on our manuscript. Before we can provide a point-by-point response and submit a revised version of the manuscript we would like to provisionally address and alleviate some of their main concerns.

      A concern was expressed in the ‘eLife assessment’ and by two of the reviewers that a potential confound between the coding of sensory information and behavior outcome by IC neurons might have been introduced by combining data across different sound levels, which could challenge the conclusions of the study. In addressing this we have carried out the analysis (i.e. averaging the neural activity separately for different sound levels) suggested for distinguishing between the two alternative explanations offered by reviewer #1: That the difference in neural activity between hit and miss trials reflects a) behavior or b) sound level (more precisely: differences in response magnitude arising from a higher proportion of high-sound-level trials in the hit trial group than in the miss trial group). If the data favored b), we would expect no difference in activity between hit and miss trials when plotted separately for different sound levels. The figure in Author response image 1 indicates that that is not the case. Hit and miss trial activity are clearly distinct even when plotted separately for different sound levels, confirming that this difference in activity reflects the animals’ behavior rather than sensory information.

      Author response image 1.
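      The logic of the stratified comparison can be sketched as follows. This is an illustrative outline only, not the analysis code used for the figure; the trial representation, function name and numbers are assumptions.

```python
from collections import defaultdict
from statistics import mean


def hit_miss_by_level(trials):
    """Average responses for hit and miss trials within each sound level.

    trials: iterable of (sound_level, outcome, response) tuples, where
    outcome is 'hit' or 'miss' (hypothetical data layout).
    Returns {level: (mean_hit, mean_miss)} for every level that
    contains at least one trial of each outcome.
    """
    groups = defaultdict(lambda: {"hit": [], "miss": []})
    for level, outcome, response in trials:
        groups[level][outcome].append(response)

    result = {}
    for level, g in sorted(groups.items()):
        # Levels with only one outcome cannot be compared and are skipped.
        if g["hit"] and g["miss"]:
            result[level] = (mean(g["hit"]), mean(g["miss"]))
    return result


# Made-up responses in arbitrary units.
example = [
    (40, "hit", 2.0), (40, "hit", 4.0), (40, "miss", 1.0),
    (50, "hit", 5.0), (50, "miss", 2.0),
    (60, "hit", 6.0),
]
print(hit_miss_by_level(example))
```

      If the hit and miss means differ within a single sound level, the difference cannot be attributed to an unequal mix of sound levels across the two trial groups, which is the reasoning applied above.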

      A related concern was expressed with regards to the decoding analysis. Namely, that differences in the distributions of sound levels in the different trial types could confound the decoding into hit and miss trials and that, consequently, the results of the decoding analysis merely reflect differences in the processing of sound level. Our analysis actually aimed to take this into account but, unfortunately, we failed to include sufficient details in the methods section of the submitted manuscript. Rather than including all the trials in a given session, only trials of intermediate difficulty were used for the decoding analysis. More specifically, we only included trials across five sound levels, comprising the lowest sound level that exceeded a d prime of 1.5 plus the two sound levels below and above that level. That ensured that differences in sound level distributions would be small, while still giving us a sufficient number of trials to perform the decoding analysis. In this context, it is worth bearing in mind that a) the decoding analysis was done on a frame-by-frame basis, meaning that the decoding score achieved early in the trial has no impact on the decoding score at later time points in the trial, b) sound-driven activity can be observed predominantly immediately after stimulus onset and is largely over about 1 s into the trial (see cluster 3, for instance, or average miss trial activity in the plots above), c) decoding performance of the behavioral outcome starts to plateau 500-1000 ms into the trial and remains high until it very gradually begins to decline after about 2 s into the trial. In other words, decoding performance remains high far longer than the stimulus would be expected to have an impact on the neurons’ activity. Therefore, we would expect any residual bias due to differences in the sound level distribution that our approach did not control for to be restricted to the very beginning of the trial and not to meaningfully impact the conclusions derived from the decoding analysis.
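      The trial-selection rule described above can be sketched as a short function. It is a minimal illustration under stated assumptions: the function name and example values are hypothetical, sound levels are assumed to be sorted in ascending order, and the window is simply truncated when fewer than two levels exist on one side, a case the text does not address.

```python
def select_decoding_levels(levels, d_primes, threshold=1.5, span=2):
    """Pick the sound levels used for decoding.

    levels: sound levels sorted in ascending order.
    d_primes: behavioural d' measured at each level, in the same order.
    Returns the lowest level whose d' exceeds `threshold`, plus up to
    `span` levels below and above it; an empty list if no level
    exceeds the threshold.
    """
    for i, d in enumerate(d_primes):
        if d > threshold:
            lo = max(0, i - span)  # truncate at the lower edge
            return levels[lo:i + span + 1]
    return []


# Made-up session: d' first exceeds 1.5 at 45 dB.
levels = [30, 35, 40, 45, 50, 55, 60]
d_primes = [0.2, 0.6, 1.1, 1.7, 2.4, 3.0, 3.2]
print(select_decoding_levels(levels, d_primes))
```

      Restricting decoding to this window keeps the sound-level distributions of hit and miss trials close to one another while leaving enough trials of each outcome.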

      Another concern expressed in the reviews is that, in relation to the cluster-wise analysis of neural activity, no direct comparison (beyond the pie charts of Figure 5C) was provided between data from lesioned and non-lesioned groups, leaving unclear how similar task-relevant activity is between these groups. In Author response image 2 we plot, analogous to Figure 5B, the average hit and miss trial activity for the 10 clusters separately for lesioned and non-lesioned mice, illustrating more clearly the high degree of similarity between the two groups.

      Author response image 2.

    1. Author Response

      Reviewer #1 (Public Review):

      Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disorder leading to the loss of innervation of skeletal muscles, caused by the dysfunction and eventual death of lower motor neurons. A variety of approaches have been taken to treat this disease. With the exception of three drugs that modestly slow progression, most therapeutics have failed to provide benefit. Replacing lost motor neurons in the spinal cord with healthy cells is plagued by a number of challenges, including the toxic environment, inhibitory cues that prevent axon outgrowth to the periphery, and proper targeting of the axons to the correct muscle groups. These challenges seem to be well beyond our current technological approaches. Avoiding these challenges altogether, Bryson et al. seek to transplant the replacement motor neurons into the peripheral nerves, closer to their targets. The current manuscript addresses some of the challenges that will need to be overcome, such as immune rejection of the allograft and optimizing maturation of the neuromuscular junction.

      Bryson et al. begin by examining the survival of mESC-derived motor neurons allografted into SOD1 mice. The motor neurons, made on a 129S1/SvImJ, were transplanted into the tibial nerve of SOD1 mice on a C57BL/6J background. Without immunosuppression, most cells were lost between 14 and 35 days, suggesting an immune response had eliminated them. Tacrolimus prevented cell loss, but it also inhibited innervation of the muscle. It also uncovered the tumorigenic potential of contaminating pluripotent cells. In contrast, immunosuppression using H57-597, an antibody targeting T-cell receptor beta, prevented graft rejection while permitting some innervation of muscle. Pretreatment of the cells with mitomycin-C eliminated pluripotent cells, preventing tumor formation. The authors noted that this combination only innervated ~10% of endplates, likely due to the fact that the implanted motor neurons are not active.

      The authors then began the process of optimizing the cells themselves, using measurements taken in late-stage SOD1 mice. Fast-firing and slow-firing populations of neurons were first compared. Using optical stimulation, these two cell types appeared to be similar. The authors opted to use slow-firing neurons in the subsequent experiments. Recognizing that neuromuscular junction (NMJ) innervation and maintenance are dependent on motor neuron activity, implantable optical stimulators were also evaluated. 14 days after transplanting the cells, optical stimulation training was initiated for one hour each day. This training led to a nearly 13-fold increase in force generation, although this still remained well below the force generated by electrical stimulation. The enhanced innervation also prevented the atrophy of muscle fibers caused by denervation.

      Overall, the data for the function of the implanted cells are convincing. The dCALMS technique that the authors have developed is quite interesting and will likely be applicable to analyze muscles for other therapeutics. The identification of calcineurin inhibitors as inhibitors of reinnervation will also be important for the development of other cell-based therapeutics for ALS.

      This is an excellent summary of the state of the field of ALS therapy development and provides a clear rationale for our novel therapeutic strategy, in the near-complete vacuum of conventional treatment options for patients suffering from this devastating disorder. We are delighted that the Reviewer clearly appreciated the value of our alternative therapeutic strategy and found our supporting data to be convincing, as well as drawing attention to the dCALMS technique, which we agree could be of significant value in the investigation of other therapeutic strategies aimed at restoring muscle innervation. We are extremely grateful for the Reviewer’s diligence in assessing our manuscript.

      However, there are some issues that should be addressed. These include some common misconceptions about ALS. While ALS is split into familial and sporadic forms based on the presence or absence of a family history of the disease, mutations in the known ALS-associated genes are found in both forms [1]. The authors also state that exercise programs are likely to accelerate degeneration in ALS. This is incorrect. Moderate exercise is part of the current guidelines for treating ALS, and mouse studies have demonstrated a therapeutic effect of moderate exercise [2]. Regarding the experimental design, there are some important details missing. The animals do not appear to have been operated on at the same age, and the criteria for when to perform the operation were not described. A similar problem exists for when the animals were determined to reach endpoint [3]. The authors also do not seem to address a potential pitfall of this approach: acceleration of the disease process. Indeed, some of the data comparing the ipsilateral side to the contralateral side suggest that the implantation of the cells and/or the light source increase the denervation of the muscle [4]. Finally, there is a fairly large difference between the motor output provided by optical stimulation relative to electrical stimulation. It is currently unclear what level needs to be reached to provide an effective response in the intact animal. Thus, it is difficult to determine if the level of reinnervation that this study has achieved will be sufficient to improve a patient's quality of life [5].

      The Reviewer raises some extremely important points and highlights some additional constructive issues where more clarity is required (numbered 1-5 above). We have attempted to address each of these points in order to strengthen the key message of our study and the integrity of our manuscript:

      1) The Reviewer is absolutely correct in highlighting that causative mutations in identified genes occur in both sporadic and familial forms of ALS and that this classification simply reflects whether or not there is a known family history of the disease (which can also encompass a spectrum of disorders including frontotemporal degeneration). We will revise our manuscript in order to be more accurate and provide clarity on this important point.

      2) Regarding the potential acceleration of muscle denervation, we specifically state that the use of electrical nerve stimulation (ENS) to artificially evoke muscle contraction has been shown to accelerate denervation of the diaphragm muscle in clinical trials aimed at maintaining respiratory function in ALS patients, which significantly shortened life-expectancy. It was not our intention to imply that moderate voluntary exercise, as opposed to artificial “ENS-based” muscle stimulation programmes, could accelerate muscle denervation. Indeed, the negative side-effects of ENS that we highlighted provide a clear rationale for developing a safer alternative to artificially control muscle function once innervation by endogenous motor neurons progressively deteriorates in ALS patients; specifically, our selection of optogenetic nerve stimulation (ONS), which is highly selective to the engrafted light-sensitive motor neurons, recruits motor units in the correct physiological order, and avoids rapid muscle fatigue, potentially overcomes the safety concerns associated with ENS.

      Importantly, unchecked disease progression means that complete paralysis of almost all muscles will eventually occur, due to loss of upper or lower motor neurons and accompanying muscle denervation, which would eventually preclude the ability of ALS patients to undertake voluntary exercise programmes, or even activities of daily life. Our approach is aimed at overcoming this inevitable loss of voluntary muscle control and onset of complete paralysis by providing a safe and effective method of artificially maintaining control of targeted muscles that would otherwise become completely paralyzed, as well as preventing their irreversible atrophy.

      To avoid the possibility that readers may infer that we are suggesting voluntary exercise programs accelerate degeneration in ALS and to provide additional clarity, we will revise the manuscript to stress that we specifically refer to “ENS-based” exercise programmes in relation to acceleration of muscle denervation.

      3) Regarding our experimental design, the congenic B6.SOD1G93A mouse model of ALS is an extremely well-characterized model, with a highly consistent timeframe of disease phenotype manifestation and progression. In order to maximize the translational value of our study, we selected an early post-symptom onset timepoint (95 ± 4.6 days) that mirrors a time at which human ALS patients would be likely to benefit from the therapeutic strategy: in the vast majority of cases, it is not possible to treat humans until a diagnosis of ALS has been confirmed, which can often take up to 12 months from first presentation. Importantly, ALS patients in the final stages of disease progression would be unlikely to be suitable for this therapy, due to irreversible muscle atrophy, which would preclude the ability of the engrafted motor neurons to form functionally useful connections. Indeed, our strategy is to engraft the replacement motor neurons before severe muscle atrophy occurs, so that they are in place to compensate and take over the function of endogenous lower motor neurons as they progressively degenerate and paralysis ensues. In so doing, the replacement motor neurons could prevent the irreversible atrophy of targeted muscles through ONS-based exercise programmes and thereby indefinitely extend the ability of targeted muscles to perform functionally useful movements.

      Although the initial graft optimization component of this study, including the tacrolimus trial, was performed across a variety of disease stages (commencing between 57-101 days of age), once we identified the H57-597 monoclonal antibody as an effective means to promote graft survival (without interfering with target muscle innervation), all subsequent grafts were initiated at an early symptom onset timepoint: 95.7 ± 4.6 days for slow-firing motor neuron grafts and 106.8 ± 7.2 days for fast-firing motor neuron grafts. Transgenic SOD1G93A mice were specifically bred for this study and due to complexities of coordinating stem cell differentiation and motor neuron production, optical stimulation device production and access to surgical facilities, with timed matings set up 3-4 months in advance, we feel that this age range was acceptable and doesn’t detract from the findings of our study.

      Similarly, we made every effort to ensure that experimental end-point was consistent, at 133 ± 8 days for all grafts involving H57-597 administration, which reflects translationally-relevant late-stage disease progression. Since the physiological experiments performed as part of this study are extremely time-consuming, it was necessary to stagger the experimental end-point over several days. Again, we feel that this range is acceptable and still reflects a consistent, translationally-relevant timepoint. Importantly, since the experimental paradigm tested in this study was aimed at individually targeted muscles, which would have been unlikely to have an effect on disease duration or survival, we did not feel that it was ethically justifiable to allow the B6.SOD1G93A mice to approach end-stage disease (which occurs at an average age of 150 days in this model).

      In the interests of full transparency, the age at which treatment commenced and the experimental end-point for every animal used in this study is reported in Supplementary Tables 2 and 3.

      4) The Reviewer raises an extremely pertinent question, regarding whether the engrafted motor neurons themselves, or the implanted stimulation device, may accelerate the progressive loss of innervation of targeted muscles by endogenous motor neurons, in light of our data that shows decreased force evoked by electrical stimulation of ipsilateral (engrafted) versus contralateral (control) muscles. It is worth noting that supramaximal electrical nerve stimulation, used to evoke maximal muscle force, should activate both endogenous and engrafted motor neurons, therefore the combined activation of both populations would be expected to result in a summative (greater) contractile response. The fact that we see the converse is unlikely to be due to an accelerated loss of endogenous motor innervation as a result of the engrafted cells, but is much more likely to be caused by physical nerve damage during the surgical engraftment process: we used a customized Hamilton syringe with a 29G needle to manually inject the cells into the targeted nerve branches, which has an outer diameter of 330μm whilst the diameter of the tibial nerve in an adult mouse is approximately 400μm. This is likely to have led to damage of the endogenous motor (and potentially sensory) axons that may have diminished regenerative capacity due to ongoing disease mechanisms. Fortunately, there is significant scope to refine the engraftment procedure by using smaller gauge needles (potentially made of more flexible materials), bespoke injection systems that can deliver the cells at a controlled rate and micromanipulators that can avoid nerve damage caused by excessive movement of the needle within the nerve. Importantly, the significantly greater scale of human nerves, compared to murine nerves targeted in this study, would also be a significant advantage in terms of physically delivering the cells in ALS patients.

      5) The Reviewer’s final comment is entirely justified given that, even in the best cases following optical stimulation training of engrafted SOD1G93A mice, optical stimulation still evoked less contractile force than supramaximal electrical stimulation. The likely reasons for this are complex: there is almost certainly scope to further optimize the optical stimulation training paradigm, which could result in reinforcement of the de novo neuromuscular junctions formed between the engrafted motor neurons and targeted muscle fibres; it is possible that the expression level of the channelrhodopsin-2 protein at the cell surface may require optimization in order to reliably initiate action potentials in the engrafted motor neurons – development of newer channelrhodopsin variants may resolve this potential issue, whilst providing additional advantages (such as enabling transcutaneous stimulation) at the same time. Finally, the maximum contractile response of the triceps surae muscle elicited by optical stimulation that we observed was approximately 13g, which equates to approximately 50% of the body mass of an adult SOD1G93A mouse. Although this is only approximately 10% of the maximal contractile force of a wild-type triceps surae muscle, this would almost certainly provide the ability to perform functionally useful motor tasks if it could be reproduced in ALS patients, particularly if large numbers of targeted muscles could be controlled in a coordinated manner, something that we are actively working on.

      Reviewer #2 (Public Review):

      The authors provide convincing evidence that optogenetic stimulation of ChR2-expressing motor neurons implanted in muscles effectively restores innervation of severely affected skeletal muscles in the aggressive SOD1 mouse model of ALS, and conclude that this method can be applied to selectively control the function of implicated muscles. This was supported by convincing data presented in the paper.

      This is an interesting paper providing new/improved optogenetic methods to restore or improve muscle strength in ALS. In general, it is of high significance in both the techniques and concept, and the paper was well written. The evidence supporting the conclusions is convincing, with rigorous muscle tension physiological analysis, and nerve and muscle histology and image analysis. The work will be of broad interest to medical biologists on muscle disorders.

      One weak point is that proper control experiments were not clearly presented - these could be shown in the paper. For example, one control experiment with only YFP but no ChR2 expression with optogenetic stimulation should be performed, following similar procedures and analysis applied to the ChR2-transduced animals.

      We are extremely grateful for the Reviewer’s expert appraisal of our manuscript and we are delighted to hear that they found our study to be highly significant, of broad interest and that our supporting evidence for this novel therapeutic approach was convincing and rigorous.

      Regarding the inclusion of suggested control experiments, we have extensive negative results data from physiological recordings of muscles in response to optical stimulation in animals where the engrafted motor neurons were rejected (prior to our identification of a 100% effective immunosuppression regimen). This clearly revealed that, in the absence of ChR2-expressing motor neurons, optical stimulation does not elicit any response from the target muscle. However, we do not feel that inclusion of this negative data, which is entirely predictable, would have strengthened the findings of our study. Similarly, if we had engrafted motor neurons that only express YFP, we would have been unable to elicit any muscle contractile activity in response to optical stimulation. As a control, this may have some value in determining the ability of motor neurons derived from other cell lines that do not express ChR2 to survive and innervate target muscles, but we don’t feel that the additional work would get us closer to achieving our ultimate goal of using motor neuron replacement in combination with optogenetic stimulation to restore/maintain muscle function in ALS patients. Moreover, the complex and iterative process of developing the cell line used in this study (reported in detail in our previous study) would make it extremely difficult to produce a suitable control stem cell line expressing only YFP. Having said that, we are actively in the process of developing new, more sophisticated human and mouse stem cell lines, using more translationally-relevant gene targeting methods to stably knock-in a variety of updated channelrhodopsin variants that may have superior properties for our approach. This will be reported in one or more follow-up studies, as we feel that it goes well beyond the scope of the current study.

    1. Author Response

      We appreciate very much your positive assessment and the comments of the two reviewers, all of which will greatly help us to improve our manuscript. In response, therefore, to these constructive comments we will take pleasure in submitting a revised manuscript during the next step of publication.

      We take the opportunity to provide a provisional author response.

      As for Reviewer #1.

      We thank Reviewer 1 very much for her/his very positive and detailed remarks, all of which will be introduced into the revised version of our manuscript.

      We will add the information about the biological control on the development of phosphatic-shelled brachiopod columns in the introduction so that our later narrative can be more understandable. The Cambrian Explosion is the innovation of metazoan body plans and radiation of animals during a relatively short geological time. The expansion of new body plans in different groups of brachiopods in the early Cambrian was likely driven by the Cambrian Explosion. The columnar shell structures are not developed in living lingulate brachiopods, and thus it is important to get a better understanding of this extinct shell architecture from the fossil record in order to study the evolutionary trend of shell structures and compositions in brachiopods. Furthermore, the adaptive innovation of biomineralized columns in early brachiopods will be discussed in the revised manuscript.

      As for Reviewer #2.

      We thank Reviewer 2 very much for her/his very constructive and detailed remarks. All the comments have been thoroughly considered, and most of them will be introduced into the revised version of the manuscript.

      We agree that knowledge of the shell structures of early linguliform brachiopods is incomplete and that more research would be helpful. We also express the idea in the first part of our manuscript that the shell structural complexity and diversity of linguliform brachiopods (especially their fossil representatives) require further studies. As the shell structure and biomineralization process are crucial to unravelling the poorly resolved phylogeny and early evolution of Brachiopoda, in this paper we undertake a primary study of exquisitely well-preserved brachiopods from the Cambrian Series 2. The morphologies, shapes and sizes of cylindrical columns are described in detail in this research, and this work will be useful for further comparative studies. We are very sorry to have missed the important reference paper on brachiopod shells by Butler et al. (2015), which will be added to the revised manuscript. The structure and language of the manuscript will be revised based on the very helpful suggestions.

      Concerning the families Eoobolidae and Lingulellotretidae, we are aware of the current problematic situation of these families, and we will add more discussion about the detailed characters of Eoobolidae in the Systematic Palaeontology part of the manuscript. However, the revision of the families Eoobolidae and Lingulellotretidae falls outside the scope of this paper. We prefer to leave it now as it will be part of an upcoming publication based on more global materials from China, Australia, Sweden and Estonia that we are currently working on.

    1. Author Response

      We thank the reviewers for thoroughly evaluating our work and for providing constructive and actionable feedback to improve the manuscript. The reviews have left us with a clear direction in which our work can improve, for which we are grateful. We will provide a detailed response to the reviews together with our revised manuscript. At this time, we accept the invitation to provide a provisional reply that addresses the major themes as summarized by the editors.

      The goal of our study was to infer an individual’s control strategy from the details of kinematics. We did this using monkey and human data collected under matching experimental conditions. We quantitatively compared these data to simulations that were generated by adapting a reasonable model of sensorimotor function that is standard in the literature. We are pleased that the reviewers and editors felt “that the overall scientific approach is of interest and has scientific merit” and “the approach has promise in aiding future studies that try to link behavior and neurophysiology (allowing homology between humans and primates).”

      We agree with the reviewers that additional work is needed to corroborate our main claim that we can unambiguously infer control strategies from behavioral data. This is a known hard problem that we are not the first to address, and we do not claim to have solved it here. We appreciate the suggestions about (1) further testing the classification procedure, (2) considering other metrics that may better distinguish between the control strategies, and (3) investigating the control strategy under perturbation scenarios. We plan to undertake additional simulations, analyses and, in the future, experiments, as suggested by the reviewers to enhance the impact of our work.

      In this initial brief response, we wish to focus on one key point noted by the editors, stemming from simulations by one of the reviewers using “a simple fixed controller.” We greatly appreciate that one reviewer went as far as to perform their own simulations. These simulations suggested that subjects do not need to switch between control strategies, but rather could achieve similar behavioral results via “a modest change in gain.” Specifically, the reviewer reports that their simple fixed controller could generate trials that sometimes looked like what we would call position control and sometimes looked like what we would characterize as velocity control. It was noted that “trial-to-trial differences were driven both by motor noise and by the modest variability in gain.”

      While we cannot comment with great certainty on the reviewer’s simulation results, since we do not know the specifics, we first wish to note that our controller and experimental subjects demonstrated this same phenomenon, in that there was overlap in the distribution of the metrics for the two strategies (specifically, in Figs. 5, 7 & 8). Hence, in our findings, even under position control some trials looked more like velocity control, and vice versa. We briefly discussed this in the paper, noting that “a large number of trials fall somewhere between the Position and Velocity Control boundaries”, and that “this could be due to a mixed control strategy” or “subjects switch strategies of their own accord”. This point would have been clearer had we included examples of these hand and cursor traces in Fig. 8. We will update Fig. 8 to more clearly illustrate this point and expand our discussion on different possible interpretations.

      Second, one may interpret the differences we attributed to changes in “control strategy” as changes simply in the gain of our “fixed” controller. Specifically, similar to the controller implemented by the reviewer, our controller is fixed in terms of the plant, the actuator and the sensory feedback loop; the only change we explored was in the relative weights or gains of position vs. velocity in the Q matrix to generate the motor command. While our intent was primarily to focus on the extremes of position control vs. velocity control, we agree that a mixed strategy of minimizing some combined error in position and velocity is likely. This is something we can readily explore with our controller model.
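
      The idea of a single fixed controller whose behaviour changes only through the relative position and velocity weights in Q can be illustrated with a generic linear-quadratic regulator. The following is a minimal sketch under stated assumptions, not the controller used in the study: the plant is a hypothetical discrete-time double integrator with state (position, velocity), the cost is the sum of x'Qx + r*u^2 with Q = diag(q_pos, q_vel), and the function name lqr_gain and all numerical values are illustrative only.

```python
def lqr_gain(q_pos, q_vel, r=1.0, dt=0.01, iterations=5000):
    """Feedback gains (k_pos, k_vel) for u = -(k_pos*pos + k_vel*vel).

    Assumed plant (hypothetical, for illustration only):
        pos' = pos + dt * vel
        vel' = vel + dt * u
    The gains come from iterating the discrete Riccati recursion for the
    symmetric 2x2 cost-to-go matrix P = [[p11, p12], [p12, p22]].
    """
    p11, p12, p22 = q_pos, 0.0, q_vel  # start the recursion from P = Q
    k_pos = k_vel = 0.0
    for _ in range(iterations):
        # Entries of A'PA for A = [[1, dt], [0, 1]]
        m11 = p11
        m12 = p11 * dt + p12
        m22 = p11 * dt * dt + 2.0 * p12 * dt + p22
        # B'PA and r + B'PB for B = [0, dt]
        n1 = dt * p12
        n2 = dt * (p12 * dt + p22)
        s = r + dt * dt * p22
        k_pos, k_vel = n1 / s, n2 / s
        # Riccati update: P <- Q + A'PA - (B'PA)'(B'PA) / (r + B'PB)
        p11 = q_pos + m11 - n1 * n1 / s
        p12 = m12 - n1 * n2 / s
        p22 = q_vel + m22 - n2 * n2 / s
    return k_pos, k_vel


# Same plant, actuator and feedback loop; only the weights in Q differ.
position_gains = lqr_gain(q_pos=1.0, q_vel=0.0)  # penalise position error only
velocity_gains = lqr_gain(q_pos=0.0, q_vel=1.0)  # penalise velocity only
mixed_gains = lqr_gain(q_pos=0.5, q_vel=0.5)     # intermediate weighting

for label, gains in [("position", position_gains),
                     ("velocity", velocity_gains),
                     ("mixed", mixed_gains)]:
    print(label, gains)
```

      In this toy setting the two extremes and any intermediate weighting are all instances of one controller, so a mixed strategy corresponds simply to an intermediate choice of the two weights.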

      In summary, we consider it worthwhile to investigate how one can infer the control strategy that a subject is employing to complete the task - either in our CST, or any other task that admits multiple strategies that can lead to success. We regard this as a valuable step towards addressing more realistic behaviors and their neural underpinnings in non-human primate research. The suggestions offered by the reviewers regarding additional analyses, simulations and experiments will provide more definitive answers and clarity for our approach.

      We are truly grateful for the time and effort the reviewers put into our manuscript. We are in the process of undertaking revisions to address all of their feedback and look forward to submitting an improved manuscript with a more detailed reply in the coming weeks.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper falls in a long tradition of studies on the costs of reproduction in birds and its contribution to understanding individual variation in life histories. Unfortunately, the meta-analyses only confirm what we know already, and the simulations based on the outcome of the meta-analysis have shortcomings that prevent the inferences on optimal clutch size, in contrast to the claims made in the paper.

      There was no information that I could find on the effect sizes used in the meta-analyses other than a figure listing the species included. In fact, there is more information on studies that were not included. This made it impossible to evaluate the data-set. This is a serious omission, because it is not uncommon for there to be serious errors in meta-analysis data sets. Moreover, in the long run the main contribution of a meta-analysis is to build a data set that can be included in further studies.

      It is disappointing that two referees comment on data availability, as we supplied a link to our full dataset and the code we used in Dryad with our submitted manuscript. We were also asked to supply our data during the review process and we again supplied a link to our dataset and code, along with a folder containing the data and code itself. We received confirmation that the reviewers had been given our data and code. We support open science and it was our intention that our dataset should be fully available to reviewers and readers. Our data and code are at https://doi.org/10.5061/dryad.q83bk3jnk.

      The main finding of the meta-analysis of the brood size manipulation studies is that the survival costs of enlarging brood size are modest, as previously reported by Santos & Nakagawa on what I suspect to be mostly the same data set.

      We disagree that the main finding of our paper is the small survival cost of manipulated brood size. The major finding of the paper, in our opinion, is that the effect sizes for experimental and observational studies are in opposite directions, therefore providing the first quantitative evidence to support the influential theoretical framework put forward by van Noordwijk and de Jong (1986), that individuals differ in their optimal clutch size and are constrained to reproducing at this level due to a trade-off with survival. We show that while the manipulation experiments have been widely accepted to be informative, they are not in fact an effective test of whether within-species variation in clutch size is the result of a trade-off between reproduction and survival.

      The comment that we are reporting the same finding as Santos & Nakagawa (2012) is a misrepresentation of both that study and our own. Santos & Nakagawa found an effect of parental effort on survival only in males who had their clutch size increased – but no effect for males who had their clutch size reduced and no survival effect on females for either increasing or reducing parental effort. However, we found an overall reduction in survival for birds who had brood sizes manipulated to make them larger (for both sexes and mixed sex studies combined). In our supplementary information, we demonstrate the overall survival effect of a change in reproductive effort to be close to zero for males, negative (though non-significant) for females and significantly negative for mixed sexes (which are not included in the Santos & Nakagawa study).

      The paper does a very poor job of critically discussing whether we should take this at face value or whether instead there may be short-comings in the general experimental approach. A major reason why survival cost estimates are barely significantly different from zero may well be that parents do not fully adjust their parental effort to the manipulated brood size, either because of time/energy constraints, because it is too costly and therefore not optimal, or because parents do not register increased offspring needs. Whatever the reason, as a consequence, there is usually a strong effect of brood size manipulation on offspring growth and thereby presumably their fitness prospects. In the simulations (Fig.4), the consequences of the survival costs of reproduction for optimal clutch size were investigated without considering brood size manipulation effects on the offspring. Effects on offspring are briefly acknowledged in the discussion, but otherwise ignored. Assuming that the survival costs of reproduction are indeed difficult to discern because the offspring bear the brunt of the increase in brood size, a simulation that ignores the latter effect is unlikely to yield any insight in optimal clutch size. It is not clear therefore what we learn from these calculations.

      The reviewer’s comment is somewhat of a paradox. We take the best studied example of the trade-off between reproductive effort and parental survival, a key theme in life-history and the biology of ageing, and subject this to a meta-analysis. The reviewer suggests we should interpret our finding as if there must be something wrong with the method or studies we included, rather than maybe considering the original hypothesis could be false or inflated in importance. We do not consider the reviewer’s inclination to question the premise of the data in favor of a held hypothesis to be necessarily the best scientific approach here. In many places in our manuscript we question and address issues in the underlying data and interpretation (L101-105, L149-150, 182-185 and L229-233). Moreover, we make it clear that we focus on the trade-off between current reproductive effort and subsequent parental survival and we are aware that other trade-offs could counter-balance or explain our findings, discussed on L189-191 & L246-253. Note that it is also problematic, when you do not find the expected response, to search for an alternative that has not been measured. In the case here, with trade-offs, there are endless possibilities of where a trade-off might be incurred between traits. We purposefully focus on the one well-studied and theorised trade-off. We clearly acknowledge, though, that when all possible trade-offs are taken into account a trade-off on the fitness level can occur, and we cite two famous studies (Daan et al., 1990 and Verhulst & Tinbergen 1991) that have done just that (L250-253).

      So, whilst we agree with the reviewer that the offspring may incur costs themselves, rather than costs being incurred by the parents, the aim of our study was to test for a generalised trend across species in the survival costs of reproductive effort. It is unrealistic to suggest that incorporating offspring growth into our simulations would add insight, as a change in offspring number rarely affects all offspring in the nest equally and there can even be quite stark differences; for example, this will be most evident in species that produce sacrificial offspring. This effect will be further confounded by catch-up growth, for example, and so it is likely that increased sibling competition from added chicks alters offspring growth trajectories, rather than absolute growth as the reviewer suggests. There are mixed results in the literature on the effect of altering clutch size on offspring survival, with an increased clutch size through manipulation often increasing the number of recruits from a nest.

      There are other reasons why brood size manipulations may not reveal the costs of reproduction animals would incur when opting for a larger brood size than they produced spontaneously themselves. Firstly, the manipulations do not affect the effort incurred in laying eggs (which also biases your comparison with natural variation in clutch size). Secondly, the studies by Boonekamp et al on Jackdaws found that while there was no effect of brood size manipulation on parental survival after one year of manipulation, there was a strong effect when the same individuals were manipulated in the same direction in multiple years. This could be taken to mean that costs are not immediate but delayed, explaining why single year manipulations generally show little effect on survival. It would also mean that most estimates of the fitness costs of manipulated brood size are not fit for purpose, because typically restricted to survival over a single year.

      First, our results did show a survival cost of reproduction for brood manipulations. We agree that there could be longer-term costs, and so our estimate of the survival cost for manipulated birds is likely to be an underestimate, meaning that our interpretation still holds – the cost to reproduce prevents individuals from laying beyond their optimal level. Note, however, that much theory is built on the immediate costs of reproduction and as such these costs are likely overinterpreted.

      We agree with the reviewer that lifetime manipulations could be even more informative than single-year manipulations. Unfortunately, there are currently too few studies available to be able to draw generalisable conclusions across species for lifetime manipulations. This is, however, the reason we used lifetime change in clutch size in our fitness projections, which the reviewer seems to have missed – please see methods lines 360-362, where we explicitly state that this is lifetime enlargement. Of course, such interpretations do not include an accumulation of costs that is greater than the annual cost, but currently there is no clear evidence that such an assumption is valid. Such a conclusion also cannot be drawn from the study on jackdaws by Boonekamp et al (2014), as the treatments were life-long and, therefore, cannot separate annual from accrued (multiplicative) costs that are more than the sum of annual costs incurred.

      Details of how the analyses were carried out were opaque in places, but as I understood the analysis of the brood size manipulation studies, manipulation was coded as a covariate, with negative values for brood size reductions and positive values for brood size enlargements (and then variably scaled or not to control brood or clutch size). This approach implicitly assumes that the trade-off between current brood size (manipulation) and parental survival is linear, which contrasts with the general expectation that this trade-off is not linear. This assumption reduces the value of the analysis, and contrasts with the approach of Santos & Nakagawa.

      We thank the reviewer for highlighting a lack of clarity in places in our methods. We will add additional detail to this section in our revised manuscript.

      For clarity in our response, each effect size was extracted by performing a logistic regression with survival as a binary response variable and clutch size as the predictor, where clutch size was the absolute number of offspring in the nest (i.e., for a bird that laid a clutch of 5 but was manipulated to have -1 egg, we used a clutch size value of 4). The clutch size was also standardised and, separately, expressed as a proportion of the species mean.
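
      The extraction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the analysis code deposited with the manuscript: the counts are hypothetical, the helper names raised_brood and fit_logistic are invented for this example, and the model is a plain one-predictor logistic regression fitted by Newton-Raphson.

```python
import math
import statistics


def raised_brood(clutch_laid, manipulation):
    """Number of offspring actually raised after a brood manipulation."""
    return clutch_laid + manipulation


def fit_logistic(x, y, iterations=50):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p          # score for the intercept
            g1 += (yi - p) * xi   # score for the slope
            h00 += w              # entries of the information matrix X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical counts for one study: brood size raised -> (survived, died).
counts = {3: (14, 6), 4: (12, 8), 5: (10, 10), 6: (8, 12), 7: (6, 14)}
brood, survived = [], []
for size, (alive, dead) in counts.items():
    brood += [size] * (alive + dead)
    survived += [1] * alive + [0] * dead

# Two scalings of the predictor, as described in the text.
mean, sd = statistics.mean(brood), statistics.pstdev(brood)
standardised = [(b - mean) / sd for b in brood]
species_mean = 5.0  # hypothetical species mean clutch size
proportional = [b / species_mean for b in brood]

intercept, slope = fit_logistic(standardised, survived)
print("effect size (log-odds of survival per SD of brood size):", slope)
```

      The slope is the study-level effect size: a negative value means that parents raising larger broods were less likely to survive. Because the predictor is the brood size actually raised, the same calculation applies to manipulated and unmanipulated nests.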

      We disagree that our approach reduces the value of our analysis. First, our approach allows a direct comparison between experimental and observational studies, which is the novelty of our study. Our approach does differ from Santos & Nakagawa but we disagree that it contrasts. Our approach allows us to take into consideration the severity of the change in clutch size, which Santos & Nakagawa do not. Therefore, we do not agree that our approach is worse at accounting for non-linearity of trade-offs than the approach used by Santos & Nakagawa.

      Our analysis, alongside a plethora of other ecological studies, does assume that the response to our predictor variable is linear. However, it is common knowledge that there are very few (if any) truly linear relationships. We use linear relationships because they serve as a good approximation of the trend and provide a more rigorous test for an underlying relationship than would fitting nonlinear models. For many datasets there is not a range of chicks added for which a non-linear relationship could be estimated. The question also remains of what shape this non-linear relationship should take, which is hard to determine a priori. We will address non-linear effects in our revised manuscript.

      The observational study selection is not complete and apparently no attempt was made to make it complete. This is a missed opportunity - it would be interesting to learn more about interspecific variation in the association between natural variation in clutch size and parental survival.

      We clearly state in our manuscript that we deliberately made a tailored selection of studies that matched the manipulation studies (L279-282). We paired species extracted for observational studies with those extracted in experimental studies to facilitate a direct comparison between observational and experimental studies, and to ensure that the respective datasets were comparable. The reviewer’s focus in this review seems to be solely on the experimental dataset. This comment dismisses the observational component of our analysis and thereby fails to acknowledge the question being addressed in this study.

      Reviewer #2 (Public Review):

      I have read with great interest the manuscript entitled "The optimal clutch size revisited: separating individual quality from the costs of reproduction" by LA Winder and colleagues. The paper consists of a meta-analysis comparing survival rates from studies providing clutch sizes of species that are unmanipulated and from studies where the clutch sizes are manipulated, in order to better understand the effects of differences in individual quality and of the costs of reproduction. I find the idea of the manuscript very interesting. However, I am not sure the methodology used allows one to reach the conclusions provided by the authors (mainly that there is no cost of reproduction, and that the entire variation in clutch size among individuals of a population is driven by "individual quality").

      We would like to highlight that we do not conclude that there is no cost of reproduction. Please see lines 258–260, where we state that our lack of evidence for trade-offs driving within-species variation in clutch size does not necessarily mean the costs of reproduction are non-existent. We conclude that individuals are constrained to their optima by the survival cost of reproduction. It is also an over-statement of our conclusion to say that we believe that variation in clutch size is only driven by quality. Our results show that unmanipulated birds who have larger clutch sizes also live longer, and we suggest this is evidence that some individuals are “better” than others, but we do not say, nor imply, that no other factors affect variation in clutch size.

      I write that I am not sure, because in its current form, the manuscript does not contain a single equation, making it impossible to assess. It would need at least a set of mathematical descriptions for the statistical analysis and for the mechanistic model that the authors infer from it.

      We appreciate this comment, but this is the first time we have been asked to put equations in a manuscript rather than explain them in terms that are accessible to a wider audience. Note however that our meta-analysis is standard and based on logistic regression and standard meta-analytic practices. We do not think we need to repeat such equations and we cite the relevant data. For the simulation, we simply simulated the resulting effects and this is not something that we feel is captured more accurately in equations rather than in text and the associated graphs. We of course supplied our code for this along with our manuscript (https://doi.org/10.5061/dryad.q83bk3jnk), though as we mentioned above, we believe this was not shared with the reviewers despite us making this available for the review process. We therefore understand the reviewer feels the simulations were not explained thoroughly. We will revise our text to see if we can add additional explanation where relevant in our revision.

      The text mixes concepts of individual vs population statistics, of within-individual vs among-individuals measures, of allocation trade-offs and fitness trade-offs, etc., which means it would also require a glossary of the definitions the authors use for these various terms, in order to be evaluated.

      We would like to thank the reviewer for highlighting this lack of clarity in our text. We will simplify the terminology and define terms in our revised manuscript.

      This problem is emphasised by the following sentence to be found in the discussion: "The effect of birds having naturally larger clutches was significantly opposite to the result of increasing clutch size through brood manipulation". The "effect" is defined as the survival rate (see Fig 1). While it is relatively easy to intuitively understand what the "effect" is for the unmanipulated studies (the sensitivity of survival to clutch size at the population level), this should be mentioned and detailed in a formula. Moreover, the concept of effect size is not at all obvious for the manipulated ones (effect of the manipulation? or survival rate whatever the manipulation (then how could it measure a trade-off?)? at the population level? at the individual level?) despite a whole appendix dedicated to it. This absolutely needs to be described properly in the manuscript.

      We would like to thank the reviewer for bringing to our attention the lack of clarity on the details of our methodology. We will make this more clear in our revised manuscript.

      For clarity, the effect size for both manipulated and unmanipulated nests was survival, given the brood size raised. We performed a logistic regression with survival as a binary response variable (i.e., number of individuals that survived and number of individuals that died after each breeding season), and clutch size was the absolute value of offspring in the nest (i.e., for a bird who laid a clutch size of 5 but was manipulated to have -1 egg, we used a clutch size value of 4). This allows for direct comparison of the effect size (survival given clutch size raised) between manipulated and unmanipulated birds.

      Despite the lack of information about the underlying mechanistic model tested and the statistical model used, my impression is still that the interpretation in the introduction and discussion is not granted by the outputs of the figures and tables. Let's use a model similar to that of (van Noordwijk and de Jong, 1986): imagine that the mechanism at the population level is

      a.c_(i,q)+b.s_(i,q)=E_q

      Where c_(i,q) and s_(i,q) are respectively the clutch size and survival rate for individual i which is of quality q, and E_q is the level of "energy" that an individual of quality q has available during the given time-step (and a and b are constants turning the clutch size and survival rate into energy cost of reproduction and energy cost of survival, and they are both quite "high" so that an extra egg (c_(i,q) is increased by 1) at the current time-step, decreases s_(i,q) markedly (E_q is independent of the number of eggs produced), that is, we have strong individual costs of reproduction). Imagine now that the variance of c_(i,q) (when the population is not manipulated) among individuals of the same quality group, is very small (and therefore the variance of s_(i,q) is very small also) and that the expectation of both are proportional to E_q. Then, in the unmanipulated population, the variance in clutch size is mainly due to the variance in quality. And therefore, the larger the clutch size c_(i,q) the higher E_q, and the higher the survival s_(i,q).

      In the manipulated populations however, because of the large a and b, an artificial increase in clutch size, for a given E_q, will lead to a lower survival s_(i,q). And the "effect size" at the population level may vary according to a,b and the variances mentioned above. In other words, the costs of reproduction may be strong, but be hidden by the data, when there is variance in quality; however there are actually strong costs of reproduction (so strong actually that they are deterministic and that the probability to survive is a direct function of the number of eggs produced)
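
      The thought experiment above can be illustrated with a small simulation. This is a sketch under stated assumptions only: the constants `A` and `B`, the three energy levels, the proportionality factor of 10 eggs per unit of energy and the within-group spread are all invented for illustration and do not come from the manuscript or from van Noordwijk and de Jong (1986).

```python
import random

random.seed(1)

A = 0.05  # hypothetical energy cost per egg (the constant a)
B = 1.0   # hypothetical energy cost per unit of survival (the constant b)


def survival(energy, clutch):
    """Survival implied by the budget a*c + b*s = E_q, solved for s."""
    return (energy - A * clutch) / B


def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Three quality groups; expected clutch size is proportional to E_q and
# varies only slightly among individuals of the same quality.
population = []
for energy in (0.5, 0.7, 0.9):
    for _ in range(200):
        clutch = 10 * energy + random.gauss(0, 0.2)
        population.append((energy, clutch, survival(energy, clutch)))

# Unmanipulated population: survival against natural clutch size.
natural_slope = slope([c for _, c, _ in population],
                      [s for _, _, s in population])

# Within one quality group the individual cost of an egg is visible.
within_slope = slope([c for e, c, _ in population if e == 0.5],
                     [s for e, _, s in population if e == 0.5])

# Manipulation: every bird receives one extra egg while E_q is unchanged.
manipulation_effect = sum(survival(e, c + 1) - s
                          for e, c, s in population) / len(population)

print(natural_slope, within_slope, manipulation_effect)
```

      Under these assumptions the population-level slope is positive even though each individual pays a fixed survival cost of a/b per egg, which is the masking effect of quality that the comment describes.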

      We would like to thank the reviewer for these comments. Please note that our simulations only take the experimental effect of brood size on parental survival into account. Our model does not incorporate quality effects. The reviewer is right that the relationship between quality and the effects exposed by manipulating brood size can take many forms and this is a very interesting topic, but not one we aimed to tackle in our manuscript. In terms of quality we make two points: 1) overall quality effects connecting reproduction and parental survival are present; and 2) these effects are opposite in direction to the effects when reproduction is manipulated and similar in magnitude. We do not go further than that in interpreting our results. The reviewer is right, however, that we do suggest, and repeat suggestions by others, that quality can also mask the trade-off in some individuals or circumstances (L63-65, L85-88 & L237-240), but we do not quantify this, as it depends on the unknown relationships between quality and the response to the manipulation. A focussed set of experiments in that context would be interesting and there is some data that could get at this, i.e. the relationship between produced clutch size and the relative effect of the manipulation. Such information is, however, not available for all studies, and although we explored analyzing this, it is currently not possible to do so with sufficient confidence. We will include this rationale in our revision.

      Moreover, it seems to me that the costs of reproduction are a concept closely related to generation time. Looking beyond the individual allocative (and other individual components of the trade-off) cost of reproduction and towards a populational negative relationship between survival and reproduction, we have to consider the intra-population slow fast continuum (some types of individuals survive more and reproduce less (are slower) than other (which are faster)). This continuum is associated with a metric: the generation time. Some individuals will produce more eggs and survive less in a given time-period because this time-period corresponds to a higher ratio of their generation time (Gaillard and Yoccoz, 2003; Gaillard et al., 2005). It seems therefore important to me, to control for generation time and in general to account for the time-step used for each population studied when analysing costs of reproduction. The data used in this manuscript is not just clutch size and survival rates, but clutch size per year (or another time step) and annual (or other) survival rates.

      The reviewer is right that this is interesting. There are unexplained differences between temperate (seasonal) and tropical reproductive strategies. Most of our data, however, come from seasonal breeders. Although there is some variation in second brooding and the like, these species often produce only one brood. We do agree that a wider consideration here is relevant, but we are not trying to explain all of life-history in our paper. It is clearly the case that other factors will operate and the opportunity for trade-offs will vary among species according to their respective life histories. However, our study focuses on the two most fundamental components of fitness – longevity and reproduction – to test a major hypothesis in the field, and we uncover new relationships that contrast with previous influential studies, and cast doubt on previous conclusions. We question the assumed trade-off between reproduction and annual survival. We show quality is important and that the effect we find in experimental studies is so small that it can only explain between-species patterns but is unlikely to be the selective force that constrains reproduction within-species. We do agree that there is a lot more work that can be done in this area. We hope we contribute to this, by questioning this central trade-off. We will try and incorporate some of these suggestions in the revision where possible.

      Finally, it is important to relate any study of the costs of reproduction in a context of individual heterogeneity (in quality for instance), to the general problem of the detection of effects of individual differences on survival (see, e.g., Fay et al., 2021). Without an understanding of the very particular statistical behaviour of survival, associated to an event that by definition occurs only once per life history trajectory (by contrast to many other traits, even demographic, where the corresponding event (production of eggs for reproduction, for example) can be measured several times for a given individual during its life history trajectory).

      Thank you for raising this point. The reviewer is right that heterogeneity can dampen or augment selection. Note that by estimating the effect of quality here we give an example of how heterogeneity can possibly do exactly this. We thank the reviewer for raising that we should possibly link this to wider effects of heterogeneity and we aim to do so in the revision.

      References:

      Fay, R. et al. (2021) 'Quantifying fixed individual heterogeneity in demographic parameters: Performance of correlated random effects for Bernoulli variables', Methods in Ecology and Evolution, 2021(August), pp. 1-14. doi: 10.1111/2041-210x.13728.

      Gaillard, J.-M. et al. (2005) 'Generation time: a reliable metric to measure life-history variation among mammalian populations', The American Naturalist, 166(1), pp. 119-123; discussion 124-128. doi: 10.1086/430330.

      Gaillard, J.-M. and Yoccoz, N. G. (2003) 'Temporal Variation in Survival of Mammals: a Case of Environmental Canalization?', Ecology, 84(12), pp. 3294-3306. doi: 10.1890/02-0409.

      van Noordwijk, A. J. and de Jong, G. (1986) 'Acquisition and Allocation of Resources: Their Influence on Variation in Life History Tactics', American Naturalist, p. 137. doi: 10.1086/284547.

      Reviewer #3 (Public Review):

      The authors present here a comparative meta-analysis designed to detect evidence for a reproduction/ survival trade-off, central to expectations from life history theory. They present variation in clutch size within species as an observation in conflict with expectations of optimisation of clutch size and suggest that this may be accounted for from weak selection on clutch size. The results of their analyses support this explanation - they found little evidence of a reproduction - survival trade-off across birds. They extrapolated from this result to show in a mathematical model that the fitness consequences of enlarged clutch sizes would only be expected to have a significant effect on fitness in extreme cases, outside of normal species' clutch size ranges. Given the centrality of the reproduction-survival trade-off, the authors suggest that this result should encourage us to take a more cautious approach to applying the concepts of the trade-off in life history theory and optimisation in behavioural ecology more generally. While many of the findings are interesting, I don't think the argument for a major re-think of life history theory and the role of trade-offs in fitness maximisation is justified.

      The interest of the paper, for me, comes from highlighting the complexities of the link between clutch size and fitness, and the challenges facing biologists who want to detect evidence for life history trade-offs. Their results highlight apparently contradictory results from observational and experimental studies on the reproduction-survival trade-off and show that species with smaller clutch sizes are under stronger selection to limit clutch size.

      Unfortunately, the authors interpret the failure to detect a life history trade-off as evidence that there isn't one. The construction of a mathematical model based on this interpretation serves to give this possible conclusion perhaps more weight than is merited on the basis of the results of this, necessarily quite simple, meta-analysis. There are several potential complicating factors that could explain the lack of detection of a trade-off in these studies, which are mentioned and dismissed as unimportant (lines 248-250) without any helpful, rigorous discussion. I list below just a selection of complexities which perhaps deserve more careful consideration by the authors to help readers understand the implications of their results:

      We would like to thank the reviewer for their thoughtful response and summary of the findings we also agree are central to our study. The reviewer also highlights areas where our manuscript could benefit from a deeper discussion and we will add detail to our discussion in our revised manuscript.

      We would like to highlight that we do not interpret the failure to detect a trade-off as evidence that there isn’t one. First, and importantly, we do find a trade-off but show this is only incurred when individuals lay beyond their optimal level. Secondly, we also state on lines 258-260 that the lack of evidence to support trade-offs being strong enough to drive variation in clutch size does not necessarily mean there are no costs of reproduction.

      The statement that we have constructed a mathematical model based on the interpretation that we have not found a trade-off is, again, factually incorrect. We ran these simulations because the opposite is true – we did find a trade-off. There is a significant effect of clutch size when manipulated on annual parental survival. To appreciate whether this effect alone can explain why reproduction is constrained, we ran the simulations. From these simulations we find that this effect size is too small to explain the constraint so something else must be going on and we do spend a considerable amount of text discussing the possible explanations (L182-194). Note the possibly most parsimonious conclusion here is that costs of reproduction are not there so we also give that explanation some thought (L201-205 and L247-253).
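
      The logic of asking whether a survival cost alone can constrain clutch size can be sketched with a back-of-the-envelope calculation. This is not the authors' simulation: the logistic survival function, the reference clutch of 5 with 50% annual survival, the two slopes, and the assumption that a bird breeds every year with a constant clutch size are all hypothetical choices made for illustration.

```python
import math


def annual_survival(clutch, slope, reference_clutch=5, reference_survival=0.5):
    """Annual parental survival, logistic in the clutch size raised."""
    logit_ref = math.log(reference_survival / (1 - reference_survival))
    eta = logit_ref + slope * (clutch - reference_clutch)
    return 1 / (1 + math.exp(-eta))


def lifetime_output(clutch, slope):
    """Eggs raised over a lifetime if the same clutch is raised every year.

    With constant annual survival s, the expected number of breeding
    seasons is 1 / (1 - s).
    """
    return clutch / (1 - annual_survival(clutch, slope))


def gain_from_extra_egg(clutch, slope):
    """Ratio of lifetime output with one extra egg per year to the baseline."""
    return lifetime_output(clutch + 1, slope) / lifetime_output(clutch, slope)


# Hypothetical survival costs, in log-odds of annual survival per extra egg.
weak_cost = gain_from_extra_egg(5, slope=-0.05)
strong_cost = gain_from_extra_egg(5, slope=-1.0)

print(f"weak cost:   lifetime output changes by a factor of {weak_cost:.2f}")
print(f"strong cost: lifetime output changes by a factor of {strong_cost:.2f}")
```

      A ratio above 1 means the extra egg still pays despite the survival cost, so the survival cost on its own would not stop selection from favouring larger clutches; a ratio below 1 means the cost outweighs the gain.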

      We are disappointed by the suggestion that we have dismissed complicating factors which could prevent detection of a trade-off, as this was not our intention. We were aiming to highlight that what we have demonstrated to be an apparent trade-off can be explained through other mechanisms, and that the trade-off between clutch size and survival is not as strong in driving within-species variation in clutch size as previously assumed. Although we do feel we have addressed this (L248-255), we will add further discussion to our revised manuscript to make this clear and give readers a better understanding of the complexity of factors associated with life-history theory.

      • Reproductive output is optimised for lifetime reproductive success and so the consequences of being pushed off the optimum for one breeding attempt are not necessarily detectable in survival but in future reproductive success (and, therefore, lifetime reproductive success).

      We agree this is a valid point, which is mentioned in our manuscript in terms of alternative stages where the costs of reproduction might be manifested (L248-250). We would also like to highlight that in our simulations, the change in clutch size (and subsequent survival cost) was assumed for the lifetime of the individual, for this very reason.

      • The analyses include some species that hatch broods simultaneously and some that hatch sequentially (although this information is not explicitly provided (see below)). This is potentially relevant because species which have been favoured by selection to set up a size asymmetry among their broods often don't even try to raise their whole broods but only feed the biggest chicks until they are sated; any added chicks face a high probability of starvation. The first point this observation raises is that the expectation of more chicks= more cost, doesn't hold for all species. The second more general point is that the very existence of the sequential hatching strategy to produce size asymmetry in a brood is very difficult to explain if you reject the notion of a trade-off.

      We agree with the reviewer that the costs of reproduction can be absorbed by the offspring themselves, and may not be equal across offspring (we also highlight this at L249 in the manuscript). However, we disagree that for some species the addition of more chicks does not equate to an increase in cost, though we do accept this might be less for some species. This is, however, difficult to incorporate into a sensible model as the impacts will vary among species and some species do also exhibit catch-up growth. Without a priori knowledge on this, we kept our model simple, to test whether the effect on parental survival (often assumed to be a strong cost) can explain the constraint on reproductive effort; we conclude it does not.

      We would also like to make clear that we are not rejecting the notion of a trade-off. Our study shows evidence that a trade-off between survival and reproductive effort likely does not drive within-species variation in clutch size. We do explicitly say this throughout our manuscript, and also provide suggestions of other areas where a trade-off may exist (L246-250). The point of our study is not whether trade-offs exist or not, it is whether there is a generalisable across-species trend for a trade-off between reproductive effort and survival – the most fundamental trade-off in our field but for which there is a lack of conclusive evidence within species.

      • For your standard, pair-breeding passerine, there is an expectation that costs of raising chicks will increase linearly with clutch size. Each chick requires X feeding visits to reach the required fledge weight. But this is not the case for species which lay precocious chicks which are relatively independent and able to feed themselves straight after hatching - so again the relationship of care and survival is unlikely to be detectable by looking at the effect of clutch size but again, it doesn't mean there isn't a trade-off between breeding and survival.

      Precocial birds still provide a level of parental care, such as protection from predators. Though we agree that the level of parental care in provisioning food (and in some cases in all parental care given) is lower in precocial than altricial birds, this would only make our reported effect size for manipulated birds an underestimate. Again, we would like to draw the reviewer’s attention to the fact we did detect a trade-off in manipulated birds and we do not suggest that trade-offs do not exist. The argument the reviewer suggests here does not hold for unmanipulated birds, as we found that birds that naturally lay larger clutch sizes have higher survival.

      • The costs of raising a brood to adulthood for your standard pair-breeding passerine is bound to be extreme, simply by dint of the energy expenditure required. In fact, it was shown that the basal metabolic rate of breeding passerines was at the very edge of what is physiologically possible, the human equivalent being cycling the Tour de France (Nagy et al. 1990). If birds are at the very edge of what is physiologically possible, is it likely that clutch size is under weak selection?

      If birds are at the very edge of what is physiologically possible, then indeed it would necessarily follow that if they increase the resource allocated in one area then expenditure in another area must be reduced. In many studies however, the overall brood mass is increased when chicks are added and cared for in an experimental setting, suggesting that birds are not operating at their limit all the time. Our simulations show that if individuals increase their clutch size, the survival cost of reproduction counterbalances the fitness gained by increasing clutch size and so there is no overall fitness gain to producing more offspring. Therefore, selection on clutch size is constrained to the within-species level. We do not say in our manuscript that clutch size is under weak selection – we only ask why variation in clutch size is maintained if selection always favours high-producing birds.

      • Variation in clutch size is presented by the authors as inconsistent with the assumption that birds are under selection to lay the Lack clutch. Of course, this is absurd and makes me think that I have misunderstood the authors' intended point here. At any rate, the paper would benefit from more clarity about how variable clutch size has to be before it becomes a problem for optimality in the authors' view (lines 84-85; line 246). See Perrins (1965) for an exquisite example of how beautifully great tits optimise clutch size on average, despite laying between 5-12 eggs.

      We would like to thank the reviewer for highlighting that our manuscript may be misleading in places; however, we are unsure which part of our conclusions the reviewer is referring to here. The question we pose is “why don’t all birds lay at the population optimum?”, and it is central to the decades-long field of life-history theory. Why is variation maintained at such a level? As the reviewer outlines, it ranges massively, with some birds laying half of what other birds lay.

    1. Author Response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Recommendations For The Authors):

      The revision and rebuttal have addressed all concerns raised in the initial review. Upon review of the revised figures, however, it is unclear why Figure 8C shows many significant DEGs in POMC neurons (which according to Figure 8b is the "GABA_24" cluster), whereas Figure 6A shows few to no DEGs in the GABA_24 cluster. Same for Pmch neurons/Glut_25, which seem to be missing from Figure 6A.

      Answer: In order to capture changes in these smaller cell populations we performed an additional DEG analysis with modified and less strict parameters (compared to the first main analysis). We mention the different parameters in the Methods section of the revised manuscript (Differential gene expression analysis and case-control based expression shifts (Cacoa)).


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Major issues

      1) A key conclusion of this study is that neurons show longer lasting infection-related changes in gene expression than do non-neuronal cells, suggesting that neurons are more persistently affected, which could potentially underlie persistent effects of infection on behavior or physiology. However, the authors also report that over twice as many transcripts were captured in neurons than in non-neuronal cells, and that neurons and non-neurons were not equal in number. The number of transcripts and cells per cell type can affect the likelihood of detecting a differentially expressed gene when comparing cell types. Thus, the difference in infection related DEGs between non-neuronal cells and neurons may be due in part to differences in the numbers of transcripts and cells in each group. How would the number of infection related DEG's compare if the same number of transcripts were detected in neurons as in non-neuronal cells? In addition, is there any relationship between the number of infection related DEGs detected and the number of cells in the respective groups?

      We performed an additional analysis, downsampling the transcripts per cell to similar numbers (~1600 transcripts/cell), which showed a similar pattern to the original calculation of DEGs: strong downregulation of genes in GABAergic, glutamatergic and non-neuronal cells at 3 and 7 dpi, but long-lasting dysregulation at 23 dpi only in the neuronal subtypes. The analysis results can be found in Supplementary Figure 12 and on page 11 in the results section.
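
      One common way to equalise sequencing depth across cells is to subsample each cell's transcripts without replacement down to a fixed total. The sketch below shows that idea only; it is an assumption that the analysis worked this way, the function name `downsample_cell` and the gene names and counts are invented, and the ~1600 target is taken from the response above.

```python
import random
from collections import Counter


def downsample_cell(counts, target, rng):
    """Subsample a cell's transcripts without replacement to `target` in total.

    `counts` maps gene name -> transcript count. Cells that already have
    `target` transcripts or fewer are returned unchanged.
    """
    total = sum(counts.values())
    if total <= target:
        return dict(counts)
    # One pool entry per captured transcript, labelled by its gene.
    pool = [gene for gene, n in counts.items() for _ in range(n)]
    return dict(Counter(rng.sample(pool, target)))


rng = random.Random(0)

# Hypothetical cells: a deeply sequenced neuron and a shallower glial cell.
neuron = {"GeneA": 2500, "GeneB": 1200, "GeneC": 300}
glia = {"GeneA": 700, "GeneB": 400, "GeneC": 100}

neuron_ds = downsample_cell(neuron, target=1600, rng=rng)
glia_ds = downsample_cell(glia, target=1600, rng=rng)

print(sum(neuron_ds.values()), sum(glia_ds.values()))
```

      After this step every cell with more than the target depth carries the same number of transcripts, so differences in the number of detected DEGs are less likely to reflect capture efficiency alone.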

      2) The rationale for focusing on the LH and DMH is unclear. While these regions do play important roles in control of body weight and wakefulness, the authors do not report whether the cell types relevant to these functions are among those affected by infection. For instance, the authors mention HCRT and MCH neurons in the introduction but do not comment on whether these neurons show any significant changes after H1N1 infection in their analysis. Also, what about the POMC neurons or the Lepr+ DMH neurons? Knowing whether and how these body weight associated cell types are affected could help to connect the phenotypic (e.g., body weight) and molecular changes observed.

      We have added an additional analysis of some well-known hypothalamic subtypes. What is interesting is that the different neuronal subtypes respond to the infection differently. While most neurons show the strongest response at 3 dpi, POMC+ neurons show consistent changes across all three time points. This could point to different neuronal subtypes playing different roles in the sickness response to the influenza infection. The new data have been added to Figure 8 together with new text in the results section and discussion (Page 17 & 20).

      3) For discriminating neurons and non-neuronal cells based on their expression of neuronal marker genes, was this performed at the single-cell level or the cluster level? Similarly, was the discrimination of GABAergic and glutamatergic neurons done at the cell or cluster level?

      The discrimination of the cell types was done at the single-cell level. This information has been added to the revised manuscript on page 25.

      4) The authors mention that body weight did not change in some of the mice. Was there any difference in infection related DEGs between the mice that lost weight and the mice that didn't? Was there any correlation between the molecular and phenotypic (i.e., body weight) changes observed?

      We agree that this could have been an interesting point to investigate; however, we can only say with certainty for 2 animals in the recovery group (23.7 and 23.8) that they didn’t lose weight (Supplement Figure 2). In Figure 4A we show that overall the different time points group well together, with the exception of animal 23.7, which seems to overlap better with 7 dpi, indicating that we possibly captured a delayed disease response here. However, we have too few animals without weight loss to make any in-depth analysis.

      5) The authors noted that the hypothalamic neurons continue to show infection-related changes in gene expression at 23dpi though body weight has returned to normal. In this H1N1 model, are there any persistent behavioral deficits at 23dpi that could be explained by the persistent changes in gene expression in DMH and LH neurons?

      We did not test for long-lasting behavioural changes in these animals. Another study by Hosseini et al. (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6596076/) focuses on the long-term cognitive effects of viral infections. Although they did not include the H1N1 model used here, they did include the PR8 strain, but didn’t report any long-lasting behavioural or cognitive changes. So far, only cognitive deficits during the acute phase of the infection caused by the PR8 H1N1 model have been shown. This would be a very interesting follow-up study to perform, but we believe it is out of scope for the current manuscript.

      6) In Figure 1F, the 3dpi sample appears to differ from the other samples in terms of its neuron/non-neuron composition. The authors point this out but offer no discussion or further analysis. Was this difference driven by one or more cell types? Is this difference likely to be technical (e.g., less white matter in sample = fewer oligodendrocytes), or could this be related to the infection (e.g., glial death or neurogenesis at 3dpi)?

      We have added the location of the punching within the hypothalamus for the different groups to the supplements (Supplementary Figure 3). The differences in neuron/non-neuron composition could originate from differences in the punching location, but we do not have data to support this conclusion. The difference could also stem from biological alterations during the infection.

      7) Since influenza viruses replicate in the cell nuclei, did the authors capture any H1N1 RNA in their single-nuclei RNA-seq samples?

      We mapped the single-nuclei data against the viral genes, but could not detect any of the viral genes in the data set. We are still optimizing the detection of low amounts of viral genes in snRNA-seq data and have not included this information in the manuscript. We believe that the virus did not manage to migrate into the hypothalamus and infiltrate the cells in the area captured here.

      Minor Issues

      1) Page 1. The abstract ends with the sentence: This is complemented by increased activity of microglia monitoring their surroundings. Presumably, the authors are basing this statement on the functions of genes altered in microglia by infection. However, saying that microglia behavior has changed is a bit of a stretch here, since the results suggest a change in the molecular phenotype of microglia but do not demonstrate a change in their behavior.

      We agree that the phrasing of the end of the abstract was not accurate and didn’t reflect the outcome of the analysis. We adjusted the sentence to: “The change of microglia gene activity suggest that this is complemented by a shift in microglia activity to provide increased surveillance of their surroundings.” This should make it clearer that the findings we present are a suggestion based on the transcriptomic changes in the cell population (Page 1).

      2) Page 8. The authors refer to Th+, Ddc+ neurons as dopaminergic. However, adrenergic/noradrenergic neurons also express these genes. How do the authors know the neurons are not adrenergic/noradrenergic?

      There are, to our knowledge, no noradrenaline- or adrenaline-producing neurons in the hypothalamus. In contrast, dopaminergic neurons have indeed been identified in this area.

      3) In the Methods section, Slc17a6 and Slc32a1 are not "pan-neuronal markers" since they are only expressed by subsets of neurons.

      We removed the glutamatergic and GABAergic marker genes (Slc17a6 and Slc32a1) from the list of neuronal markers. They are stated further down in the Methods section as glutamatergic and GABAergic markers (see the changes on Page 24/25).

      4) Was the hashtagging antibody custom or commercial? If commercial, what was the source, catalog #, lot #? If custom, the authors should describe how it was made and validated.

      We used commercial antibodies for hash-tagging. We added the missing information to the manuscript; it can be found on Page 24 of the revised manuscript.

      5) In the data processing section of the Methods, SCTransform is mentioned twice. Was normalization with SCTransform applied twice?

      The data was only normalized once using the SCTransform method. We adjusted this part of the Methods section to make it clearer (Page 24).

      6) In the section on gene set enrichment analysis, the first sentence includes this text: "(is a reference needed?)." The answer is yes - Alexa A, Rahnenfuhrer J (2022). topGO: Enrichment Analysis for Gene Ontology. R package version 2.50.0.

      The missing reference was added (Page 26).

      7) Page 4: "leaved" should be corrected to "left"

      The wrong wording was corrected.

      8) Figure 2D - gene is labeled as Slc31a1 on the figure and Slc32a1 in the figure legend

      We provided a new Figure plate with the right marker genes.

      9) Official gene IDs should be italicized

      We checked the gene IDs again, and italicized wrongly formatted gene IDs.

      10) It is not clear whether the authors are planning to share their code. However, their code would be needed to reproduce their results, since the methods section provides a summary of what was done but lacks key details (e.g., parameters and software packages used during data processing and analysis)

      Code will be shared on request. We have also added this to the revised manuscript (Page 26).

      Reviewer #2 (Public Review):

      The new work from Lemcke et al suggests that the infection with Influenza A virus causes such flu symptoms as sleepiness and loss of appetite through the direct action on the responsible brain region, the hypothalamus. To test this idea, the authors performed single-nucleus RNA sequencing of the mouse hypothalamus in controlled experimental conditions (0, 3, 7, and 23 days after intranasal infection) and analyzed changes in the gene expression in the specific cell populations. The key results are promising.

      However, the analysis (cell type annotation, integration, group comparison) is not optimal and incomplete and, therefore should be significantly improved.

      More specifically:

      1) The current annotation of cell types (especially neuronal but also applicable to the group of heterogeneous "Unassigned cells") did not make a good link to existing cell heterogeneity in the hypothalamus identified with scRNA seq in about 20 recently published works. All information about different peptidergic groups can not be extracted from the current version (except for a few). There are also some mistakes or wrong interpretations (eg, authors assigned hypothalamic dopamine cells to the glutamatergic group, which is not true). This state is feasible to improve (and should be improved) with already existing data.

      We repeated the cell label transfer with the newly published HypoMap and added additional information to the supplements about the cell type assignments. Additionally, we agree that the dopaminergic neurons do not belong to the group of glutamatergic neurons; however, we assigned them to this group based on the clustering. We changed the phrasing in the results to better differentiate between the two groups (Page 8).

      2) I am confused with the results shown in the label transfer (suppl fig 3 and 4; note, they do not have the references in the text) applied to some published datasets (authors used the Seurat functions 'FindTransferAnchors' and 'TransferData'). The final results don't make sense: while the dataset for the arcuate nucleus (Campbell et al) well covered the GABAergic neurons it is not the case for the whole hypothalamus datasets (Chen et al; Zeisel et al). Similarly, for glutamatergic neurons. Additionally, I could not see that the label transfer works well for PMCH cells which should be present in the dataset for the lateral hypothalamus (Mickelsen et al,2019).

      We performed an additional label transfer of the hypothalamus data. Here we accepted a prediction score of 0.5 and transferred a cell type label to our annotated cluster IDs if at least 10% of cells within a cluster were annotated with the 0.5 prediction score. We found that well-defined neuron populations such as Hcrt+, Pmch+ and Hdc+ neurons, as well as Pomc+ neurons, were tagged with high prediction scores (>= 0.9, Supplement Figures 6 and 7) and that non-neuronal cell types (Supplement Figure 8) were well annotated. Additionally, we identified an Agrp+ neuron population within the Gaba_1 neurons. This information has been added to the revised manuscript (Pages 6, 8).
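
      As an illustration only (this is not the code used for the analysis), the cluster-level acceptance rule described above could be sketched as follows. The function name, the input layout, and the reading that a label is accepted when at least 10% of a cluster's cells carry that label with a score of at least 0.5 are assumptions made for this sketch.

```python
from collections import Counter


def transfer_cluster_labels(cells, min_score=0.5, min_fraction=0.10):
    """Assign transferred labels to clusters (hypothetical helper).

    cells: iterable of (cluster_id, predicted_label, prediction_score),
           one entry per cell.
    Returns {cluster_id: sorted list of labels accepted for that cluster}.
    """
    cluster_sizes = Counter()
    confident = {}  # cluster_id -> Counter of labels with score >= min_score
    for cluster_id, label, score in cells:
        cluster_sizes[cluster_id] += 1
        if score >= min_score:
            confident.setdefault(cluster_id, Counter())[label] += 1

    accepted = {}
    for cluster_id, size in cluster_sizes.items():
        counts = confident.get(cluster_id, Counter())
        # Keep a label only if enough of the cluster's cells support it.
        accepted[cluster_id] = sorted(
            label for label, n in counts.items() if n / size >= min_fraction
        )
    return accepted
```

      Under this reading, a cluster can receive more than one label, or none if no label reaches the 10% threshold.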

      3) There are newly developed approaches to check the shifts in the cell compositions and specific differential gene expression in the cell groups (e.g. Cacoa from Kharchenko lab, scCoda from Büttner et al; etc). Therefore, I did not fully understand why the authors used pseudo-bulk approaches for the data analysis here (having such a valuable dataset with multiple hashed samples for each timepoint). It would be great to use at least one of those approaches, which were developed specifically for scRNAseq data analysis. Or, if there are reasons not to, the authors should argue why their approach is optimal.

      We performed an additional analysis using a case-control comparison framework (Cacoa). We performed both modalities, cluster-based and cluster-free, for expression shifts and cell type compositions. We could partly confirm the findings of our pseudo-bulk approach. The cluster-specific density shift (Supplement Figure 15) identified only shifts in non-neuronal cell types between the Control group and 3 dpi. We believe these composition shifts are caused by the lower number of non-neuronal cells at the 3 dpi time point. Cluster-specific expression shifts show similar results to the pseudo-bulk approach, with significant expression shifts identified at 3 and 7 dpi in neuronal and non-neuronal cell clusters (Supplement Figure 16). However, no significant expression shifts were identified in the recovery group at 23 dpi. Using the cluster-free expression shift approach, however, we identified a picture similar to that described with the pseudo-bulk approach. In the recovery group at 23 dpi, we found mainly changed gene programs in neuronal cells, and no transcriptional changes in the non-neuronal cells (Supplement Figures 17-20). This new analysis has been added to the revised manuscript (Pages 4-6, 26), including supplementary figures and tables as stated.

      4) When the authors describe the DGE changes upon experimental conditions (Figures 5 and 6), my first comment is again relevant: it is difficult to use the current annotation and cell type description as the reference for testing virus effects and shifts in the DGE in distinct neuronal subtypes.

      The cell type annotations have been checked and additional label transfer has been performed. All figures in the manuscript have been updated.

      I have to note that the experimental design is well done and logical. Therefore I believe that to strengthen the conclusions, the already obtained datasets can be used for improved analysis.

      Reviewer #2 (Recommendations For The Authors):

      I have some minor concerns:

      1) For the quality check it would be good to see how different hashed samples for each timepoint cover the UMAP embeddings.

      We added the UMAP embeddings to the supplement. (Supplement Figure 4)

      2) In Fig 1e colors are not optimal - it is impossible to assess it.

      We separated the UMAPs for the different time points to make it easier to assess. See updated Figure 1E.

      3) In the methods authors started "Single-nucleus RNA-sequencing cell population identification" from the description of using a Gaussian mixture model (GMM). However, I could not clearly understand how this model was used and which kind of result it provided.

      We used a GMM with known markers for neurons and, in a second step, for glutamatergic and GABAergic cells to sub-cluster the cells, and then assigned the cells to their respective classes based on high or low expression of the marker genes in the cluster. This information has been added to the methods section (Page 24/25).

      4) Could the authors better clarify why "they calculated normalization factors using the scran function 'computeSumFactors'" when working with pseudobulk analysis?

      This size factor normalization was recommended for single cell data by the authors of the DESeq2 package: http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html

      5) I didn't find logic in "a cell cluster was only included if it contained more than 2 nuclei in at least 3 individual animals" (page 24). Maybe I misinterpreted it.

      The rationale for the selection methods was based on the findings that not all animals in the recovery group had the same effects in weight loss. The acute time points didn’t show enough weight loss to decide if all animals in these groups lost the same amount and were equally sick. Hence, in order to have biological robustness we decided to only analyse clusters where cells from at least 3 animals at a specific time point contributed to a cell type. In order to have enough cells per cell type for the calculation of DEGs, we decided to only include a cell type at a specific time point if it contained at least 3 cells from one individual. This selection method limits the analysis to cell types with at least 9 cells per time point.
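
      The inclusion rule above (at least 3 cells from each of at least 3 animals per time point, hence at least 9 cells) can be illustrated with a minimal sketch. The function name and the input layout are hypothetical and are not taken from the analysis code.

```python
from collections import defaultdict


def included_cell_types(nuclei, min_cells_per_animal=3, min_animals=3):
    """Return the (time_point, cell_type) pairs that pass the filter.

    nuclei: iterable of (time_point, cell_type, animal_id), one per nucleus.
    A pair is kept when at least `min_animals` animals each contribute
    at least `min_cells_per_animal` nuclei.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for time_point, cell_type, animal_id in nuclei:
        counts[(time_point, cell_type)][animal_id] += 1

    kept = set()
    for key, per_animal in counts.items():
        qualifying = sum(
            1 for n in per_animal.values() if n >= min_cells_per_animal
        )
        if qualifying >= min_animals:
            kept.add(key)
    return kept
```

      With the default thresholds, any retained cell type necessarily contains at least 3 x 3 = 9 nuclei at that time point.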

    1. Author Response

      On behalf of the authors of the article "Elevated glycolytic metabolism of monocytes limits the generation of HIF-1α-driven migratory dendritic cells in tuberculosis", I would like to provide interim responses noting some relevant points about the eLife assessment and public reviews.

      eLife assessment

      This useful study tests the hypothesis that Mycobacterium tuberculosis infection increases glycolysis in monocytes, which alters their capacity to migrate to lymph nodes as monocyte-derived dendritic cells. The authors conclude that infected monocytes are metabolically pre-conditioned to differentiate, with reduced expression of Hif1a and a glycolytically exhaustive phenotype, resulting in low migratory and immunologic potential. Unfortunately, the evidence for the conclusions is currently incomplete, as the use of dead mycobacteria will affect bioenergetic readouts. The study will be of interest to microbiologists and infectious disease scientists.

      We would like to clarify what may be a misunderstanding. Indeed, the study did not deal with “infected monocytes” per se, but rather with the ability of monocytes purified from TB patients vs. healthy controls to differentiate into DCs with different migratory capacities upon Mtb infection or stimulation. Since there is no evidence for the presence of Mtb in the patients’ blood, the metabolic effects we observed are likely a consequence of systemic pulmonary disease rather than of direct interaction of monocytes with Mtb. Although irradiated Mtb was used in most experiments, in particular because Seahorse and other technologies cannot be used in our BSL3 laboratory, we provide evidence (Figure 1) that infecting DCs with live Mtb or stimulating DCs with irradiated Mtb generates comparable glycolytic profiles (release of lactate, glucose consumption, HIF1a expression and LDHA expression). To strengthen the relevance of our data, we will characterize the metabolism of DCs infected with live Mtb using SCENITH.

      Reviewer #1 (Public Review):

      The manuscript by Maio and colleagues looks at the impact of the heightened glycolytic activity induced by Mtb in monocytes, and its impact on Hif1-a dependent migration of DCs.

      Data concerning the biological significance of the impact of enhanced glycolysis on DC migration is strong and convincing. While Hif1-a is obviously a key factor, the evidence that it is a linear component in the cascade falls a little short as the main inhibitor used PX-478 does not have a clear, single mode of action. Additional characterization with the alternative inhibitor (Echinomycin) would make the argument more convincing. 

      We would like to thank the reviewer for their positive assessment of our manuscript. Although Echinomycin has been used for validating some of the representative experiments performed in our study (see supplementary figure 2E-F), we agree with the reviewer’s suggestion. Therefore, additional experiments using echinomycin will be carried out to confirm our results.

      Reviewer #2 (Public Review):

      The manuscript by Maio et al attempts to examine the bioenergetic mechanisms involved in the delayed migration of DC's during Mtb infection. The authors performed a series of in vitro infection experiments including bioenergetic experiments using the Agilent Seahorse XF, and glucose uptake and lactate production experiments. This is a well-written manuscript and addresses an important question in the TB field. A major weakness is the use of dead Mtb in virtually all the experiments. Unfortunately, the authors did not attempt to address this critical confounding factor. As a result, data was interpreted, and conclusions were made as if live Mtb was used. Also, previous studies (PMID: 30444490 and PMID: 31914380) have shown that live Mtb suppresses glycolysis, which contradicts findings in this study, perhaps because dead Mtb was used here. For these reasons, obtaining any pertinent conclusions from the study is not possible, which diminishes the significance of the work.

      We thank the reviewer for their evaluation of our study. We agree that using live Mtb in all experiments would have been ideal. However, we do not have a Seahorse Analyzer in our BSL3 facility. Thus, we will characterize the metabolism of DCs infected with live Mtb using SCENITH during revision of our manuscript.

      With regard to the differences between our results and those of previous studies showing Mtb-induced suppression of glycolysis, they could be explained by the use of different Mtb strains, different multiplicity of infection (MOI), macrophages of different origins, and different measurement timepoints, as discussed in one of these publications (PMID 30444490). For instance, in PMID 30444490, hMDMs infected at an MOI of 1 showed increased extracellular acidification and glycolytic parameters, as opposed to higher MOI or the same MOI but measured in THP1 cells. Importantly, the aforementioned articles studied macrophage and not DC metabolism. These aspects will be discussed in a revised manuscript.

    1. Author Response

      We thank the reviewers for their thoughtful suggestions, which we will address in the revised manuscript.

      Briefly, we purposely fixed the Hill coefficients to h=1 on the grounds that one drug molecule binding to the channel is sufficient to block the channel and there is no strong evidence for co-operative binding in the literature. Doing so also helped to constrain the degrees of freedom in the face of noisy observations in the public datasets. As noted by Reviewer 2, the quality of the drug measurements varies widely across laboratories and this is particularly noticeable in estimates of Hill coefficients which are therefore less reliable.
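
      For reference, fixing h=1 reduces the standard Hill equation to a single-site binding curve. The sketch below is illustrative only: the function name and the `ic50` parameter (the concentration giving half-maximal block) are assumptions introduced here and are not taken from the study.

```python
def fraction_unblocked(concentration, ic50, hill=1.0):
    """Fraction of channel conductance remaining at a given drug concentration.

    Standard Hill equation: 1 / (1 + (C / IC50)^h).
    With hill=1 this is the one-drug-molecule-per-channel case.
    `concentration` and `ic50` must share the same units.
    """
    if concentration < 0 or ic50 <= 0:
        raise ValueError("concentration must be >= 0 and ic50 must be > 0")
    return 1.0 / (1.0 + (concentration / ic50) ** hill)
```

      At a concentration equal to the IC50 this gives half of the conductance remaining for any Hill coefficient, so the choice of h only changes the steepness of the curve around that point.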

      The dose-dependent curves of multi-channel block (Figure 6) are plotted for all four dimensions in the Supplementary Dataset. We omitted GKs and GNaL from Figure 6 in an attempt at brevity since they do not add much to the story.

      It is true that pacing frequency was not considered in this study.

      The drugs were assessed across a range of doses (1x to 30x) but dosage only had a minimal impact on accuracy (88.1% to 90.8%) as shown in Figure 8A.

      Finally, we emphasize that the metric’s novelty lies in deriving a simple linear model from biophysical principles of ion-channel blockade rather than blind statistical model fitting.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Both reviewers strongly suggest that you modify the title of your paper to something that better reflects the data presented.

      We have made the title more specific to the findings described in the manuscript and revised the rest of the manuscript in response to the additional reviewer’s comments. We adjusted the abstract accordingly.

      Public Reviews:

      Reviewer #1 (Public Review):

      This manuscript conducts a classic QTL analysis to identify the molecular basis of natural variation in disease resistance. This identifies a pair of glycosyltransferases that contribute to steroidal glycoalkaloid production, specifically by altering the final hexose structure of the compound. This is somewhat similar to the work on tomatine showing that the specific hexose structure mediates the final potential bioactivity. The resulting transgenic complementation lines show that the gene leads to a strong resistance phenotype to one isolate of Alternaria solani and the Colorado potato beetle. This is solid work showing the identification of a new gene and compound influencing plant biotic interactions. While the experiments are solid, the introduction, discussion and associated claims don't accurately reflect my reading of what is known and said in the current literature.

      The sentence on line 53-54 is misleading. It provides only three citations on specific links between specialized metabolism and disease resistance. However, there are actually at least 40 on specific links of camalexin and indolic phytoalexins to disease resistance. Similarly there are dozens of uncited papers on benzoxazinoids, indolic glucosinolates, aliphatic glucosinolates and tomatine to both non-host and host based resistance mechanisms. This even goes as far as showing how the pathogens resist an array of these compounds. The choices in the introduction make it appear that little is known about specialized metabolism to disease resistance but I would suggest that this is not an allusion supported by the literature. I would agree that given the breadth of specialized metabolism we have a lot of knowledge about a set of them but that there are hundreds to thousands of untested compounds but to indicate that little is known is unfair to the specialized metabolism community. This is especially true as the introduction and discussion give no image of the large body of literature on specialized metabolism to insect interactions even though this is a major component of this manuscript.

      We have rewritten this part of the introduction (lines 50-69). In the original text, we meant to convey our impression that receptor-mediated resistance is studied in a very high degree of detail, and that resistance that is based on secondary metabolites is receiving less recognition in comparison, especially in the plant-microbe interactions field. We agree that our comments might give the (false) impression that there is not much known. There is indeed a lot of data to support the importance of specialised metabolites in resistance, especially against necrotrophic pathogens and insects. The changes that we made should give a better reflection of that knowledge.

      I would also agree that specialized metabolism is not a conscious target of breeding programs but the work on benzoxazinoids in maize and glucosinolates in the Brassica's has shown that these compounds have been influenced by breeding programs. Similarly work on de novo domestication of multiple crops is focused on the adjustment of specialized metabolism in these crops.

      The reviewer is right to point out that specialized metabolism is influenced by breeding. Specialized metabolites may not only be involved in defence, but they can also affect other properties of the plant such as quality aspects. Potato breeders have made efforts to reduce SGA content in tubers to prevent problems with toxicity and to meet safety regulations. We have adjusted the discussion (lines 255-260).

      I would disagree with the hint on line 49-50 and again on lines 236-239 that specialized metabolism may have less pleiotropy. This is not supported by recent work on benzoxazinoids and glucosinolates showing that they have numerous regulatory links to the plant and can be highly pleiotropic. Even the earliest avenicin work in oat showed that the deficient lines had altered root development.

      We agree with the reviewer and we have removed the hints that specialized metabolism may have less pleiotropy from the manuscript. We do believe that the broad-spectrum activity of specialized metabolites can be an advantage, but this non-specificity also comes with risks in case of food crops. We note the potential negative effects of SGAs in the discussion (see previous comment and lines 300-303).

      My main message from the above three paragraphs is to point out that there are a number of places in the manuscript where the current state of the specialized metabolite literature is not accurately portrayed. To properly place the manuscript in the broader context, I would suggest a more even handed introduction and discussion that takes into account the current state of the specialized metabolism literature.

      We rewrote these parts to provide a more balanced view on the role of specialized metabolites in disease resistance.

      Is it accurate to say complete resistance to A. solani if only a single isolate of the pathogen is used? Is there evidence that I am unaware of that there are no isolates of this pathogen with saponin resistance? There are pathogens with natural tomatine resistance and this is a common feature of plant pathogens that they have genetic variation in the resistance to specialized metabolism. For example, it should be noted that Botrytis BO5.10 is a tomatine sensitive isolate and the van Kan and Hahn groups have published on isolates that are resistant to saponins. I would suggest caveating across the manuscript that this is a single isolate and that it is possible that there may be isolates with natural resistance to the steroidal glycoalkaloid?

      While it is true that we only describe the results of testing a single isolate of A. solani in the submitted manuscript, we previously showed that the S. commersonii resistance is effective against additional Alternaria isolates and species from different locations (1). We added this context to the introduction (lines 71-73) and also added the results of testing a more recent Dutch A. solani isolate (altNL21002, isolated from a potato field in the Netherlands in 2021) and an isolate from the US (ConR1H, isolated from a potato field in Idaho in 2015) to the supplementary material of the revised manuscript (lines 102-104). Of course, this still does not prove that the SGAs protect against all A. solani isolates and we have been more specific in referring to the Alternaria isolate that was tested. Similarly, it is impossible to make a general statement on the lack of detoxification capacity of all isolates of A. solani. It may indeed be possible that there are Alternaria isolates that are tolerant to the tetraose SGAs produced by S. commersonii, especially in natural habitats where Solanum species that produce tetraose SGAs and Alternaria co-occur. We have added this point to the discussion (lines 292-294).

      In Figure 4b, is the infection site about 3.5 mm in size such that 3.5 mm means absolutely no infection? If not, that would mean there is some outgrowth by Alternaria and the resistance isn’t complete.

      We often observe dead tissue underneath the inoculation droplet on resistant plants, which is measured as a lesion. Such lesions can usually be distinguished visually from the lesions on susceptible genotypes by their colour (dark black for resistant plants versus a more brownish colour of the lesions on susceptible plants), but this information is lost in the quantitative data presented in the figures. Droplets occasionally flow out over the leaf surface, which may explain why larger ‘lesions’ are sometimes observed on resistant plants. In rare cases, there may also be a little bit of outgrowth of Alternaria beyond the inoculation droplet before the infection is stopped on resistant genotypes. Whether the resistance is ‘complete’ in such cases is debatable. We toned down our statements regarding ‘complete’ resistance throughout the manuscript.

      Reviewer #2 (Public Review):

      The study focuses on a mechanism of pest/pathogen resistance identified in Solanum commersonii, which appears to offer dominant resistance to Alternaria solani through the activity of specific glycosyltransferases which facilitate the production of tetraose glycoalkaloids in leaf tissue. The authors demonstrated that these glycoalkaloids are suppressive to the growth of multiple pathogenic ascomycetes and furthermore, that transgenic plants expressing these glycosyltransferases in susceptible S. commersonii clones demonstrate improved resistance to a specific strain of A. solani and a genotype of Colorado Potato Beetle. The study design is straightforward, yet thorough, and does a good job demonstrating the importance of these genes in resistance. While the research findings are significant there are statements throughout the manuscript that overstate both the novelty and utility of the findings.

      Title: While the protection is impressive, the title suggests that these glycoalkaloids provide protection against all fungi and insects, which is both unlikely and essentially impossible to prove. This should be changed to something more measured. This is especially true given that only a single fungus and insect were tested against transgenic plants, but would be an overstatement even with more robust evaluation.

      We appreciate the comment of the reviewer and agree that it is unlikely that the S. commersonii SGAs protect against all fungi and insects and that it would be impossible to prove this. We intended to highlight the fact that these compounds provide a qualitative (‘complete’) resistance against the tested isolates/genotypes, and that they are effective across a wide range of organisms (‘fungi and insects’). We have made the title more specific to the findings described in the manuscript.

      Throughout the paper: A single isolate of A. solani and a single genotype of CPB were used in this study. While this is in line with the typical limitations of such a study, the authors need to be careful about claiming broad resistance to either of the species. Variability in fungicide tolerance and detoxification activity have been noted in both fungi and CPB, so more specific language should be used throughout (such as L213 and L221).

      Similar points were raised by reviewer 1. We have toned down our statements regarding ‘complete’ resistance and clarified throughout the manuscript that we tested only a limited set of A. solani isolates and a single CPB genotype.

      Reviewer #2 (Recommendations For The Authors):

      L39: Fix grammar.

      Done

      L42: Race is a terminology not used in all pathosystems (others include pathovar, subspecies, etc.).

      We removed the word race and use the general ‘pathogen’.

      L53: The role of pterocarpans, flavonoids, indoles, terpenes, and a number of other compound classes have been linked to plant defense across the entire plant kingdom. Highlighting Avenacin is fine, but it shouldn't be ignored that the role of phytoalexins and phytoanticipins in defense against fungi (and the subsequent detoxification of these compounds by fungi) has been well established in a number of pathosystems.

      We have removed the specific reference to avenacin (we still refer to it in the discussion, as there are interesting similarities with the saponins from tomato and potato) and tried to highlight the diversity of plant defence compounds across the plant kingdom and the importance of tolerance mechanisms in different pathosystems in the revised manuscript (lines 52-60).

      L234-237: This is broadly an overstatement. To my knowledge there is quite a bit of interest in plant defense compounds for breeding (in plants generally) and we know quite a bit about their mode of action (fungal membrane perturbation through binding to ergosterol). There have been active breeding efforts for decades to reduce glycoalkaloid content in potatoes due to the hemolytic activity of these compounds. While this may or may not be the case with these specific SGAs, a more accurate summary of the state of the field is warranted.

      We have rewritten the paragraph to give a more balanced view of breeding for SGAs in potato (lines 63-69 on the mode of action of SGAs and lines 255-260 regarding breeding for specific SGA variants in potato).

      L279: "...introgression breeding could help to move these compounds from wild relatives to crop species..." Yes, but at what cost? If it results in increased SGAs in tubers, then the plants would be inedible. This could be made clearer and would support the following statement on alternative deployment techniques, including application as biological protectants.

      The reviewer is right to point out the importance of considering negative effects of SGAs in breeding. We paid more attention to this aspect in the discussion and added a sentence to clarify that effects on human health and the environment should be considered before employing these compounds (lines 300-303).

      Discussion:

      L229-230: the authors state that the tetraose SGA from commersonii can protect against other fungi, but this does not appear to have been tested. Rather, they looked at resistance in the CGN18024_1 and CGN18024_3 lines, which could express other factors unrelated to SGAs that impact resistance or susceptibility. Experiments to support this statement would include screening of the transgenic lines for resistance to other fungi, but this does not appear to have been done.

      We believe that the tetraose SGAs have the potential to protect against a range of fungi, but the reviewer correctly points out that these experiments do not provide definitive proof for their role in resistance to other pathogens besides A. solani and CPB. We have adjusted our statement accordingly (lines 247-250 of the discussion, 84-88 of the introduction and the abstract).

      Future questions should likely include characterizing the overall SGA content of resistant potatoes, characterizing the saponin content specifically found within tubers, and purifying the compounds to characterize the hemolytic activity of these specific compounds. Even if these aren't your exact plans, they would be necessary steps in any resistance breeding efforts. In particular, it will be important to know if the SGA content is increased in tubers of the tested lines, especially CGN18024_1, CGN18024_3, and the transgenics. Ideally, for breeding purposes there would be a disconnect between SGA production in foliage and tubers. It is unclear whether this is possible in these lines.

      These are all good questions, and it would be nice to follow up on them in future research. We explore the different routes towards a safe use of SGAs in resistance breeding in the discussion.

      It has been shown that commersonine, one of the tetraose glycoalkaloids is also present in Solanum chacoense. It would be useful to note both this fact and that the Early Blight resistance which has been noted in Solanum chacoense may additionally be from these compounds (examples below).

      o https://www.cabi.org/GARA/FullTextPDF/Pre2000/19871336643.pdf

      o https://apsjournals.apsnet.org/doi/pdf/10.1094/PHYTO-06-18-0181-R (breeding line 24-24-12 has S. chacoense parentage)

      o https://agris.fao.org/agris-search/search.do?recordID=DJ20220231195

      This is indeed an interesting observation and it is well possible that SGAs are responsible for the resistance of S. chacoense. There are additional wild Solanum species that produce similar SGAs as found in S. commersonii that could confer resistance to early blight (or CPB) and we added this to the discussion (lines 263-265).

      Reference

      1. Wolters PJ, de Vos L, Bijsterbosch G, Woudenberg JH, Visser RG, van der Linden G, et al. A rapid method to screen wild Solanum for resistance to early blight. European Journal of Plant Pathology. 2019;154:109-14.
    1. Author Response

      We thank the reviewers for their work and their thoughtfulness. However, it seems to us that much (but not all) of the critique reflects a misunderstanding of the goals and methods of computational modeling. Details are below. We are grateful for the opportunity to include our views about this in the context of our replies to the Public Critiques of our paper. The comments of the reviewers were very helpful in allowing us to see what might not be clear to our readers.

      eLife assessment

      This useful modeling study explores how the biophysical properties of interneuron subtypes in the basolateral amygdala enable them to produce nested oscillations whose interactions facilitate functions such as spike-timing-dependent plasticity. The strength of evidence is currently viewed as incomplete because the relevance to plasticity induced by fear conditioning is viewed as insufficiently grounded in existing training protocols and prior experimental results, and alternative explanations are not sufficiently considered. This work will be of interest to investigators studying circuit mechanisms of fear conditioning as well as rhythms in the basolateral amygdala.

      Most of our comments below are intended to rebut the sentence: “The strength of evidence is currently viewed as incomplete because the relevance to plasticity induced by fear conditioning is viewed as insufficiently grounded in existing training protocols and prior experimental results, and alternative explanations are not sufficiently considered”. Details are below in the answer to reviewers.

      We believe this work will be interesting to investigators interested in dynamics associated with plasticity, which goes beyond fear learning. It will also be of interest because of its emphasis on the interactions of multiple kinds of interneurons that produce dynamics used in plasticity, in the cortex (which has similar interneurons) as well as BLA.

      We note that the model has sufficiently detailed physiology to make many predictions that can be tested experimentally. In the revision, we will be more explicit about this.

      We thank Reviewer #1 for stressing our work's contribution in providing concrete hypotheses that can be tested in vivo and for highlighting the importance of examining, in future work, the synergistic role of BLA interneurons in fear learning. The weaknesses reported by the Reviewer concern deviations of the model from the experimental literature. We describe below why we think those differences are minor in the context of the aims of our model. Specifically,

      1) Some connections among neurons in the BLA reported by (Krabbe et al., 2019) have not been taken into account in the model. Some connections between cell types were excluded without adequate justification (e.g. SOM+ to PV+).

      In order to constrain our model, we focused on what is reported in (Krabbe et al., 2019) in terms of functional connectivity instead of structural connectivity. Thus, we included only those connections for which there was strong functional connectivity. For example, the SOM+ to PV+ connection is shown to be small (Supp. Fig. 4, panel t). We also omitted PV+ to SOM+, PV+ to VIP+, SOM+ to VIP+, VIP+ to excitatory projection neurons; all of these are shown in (Krabbe et al. 2019, Fig. 3 (panel l), and Supp. Fig. 4 (panels m,t)) to have weak functional connectivity, at least in the context of fear conditioning. See below for comments on modeling strategies. We will explain this better in our revision.

      2) The construction of the afferent drive to the network does not reflect the stimulus presentations that are given in fear conditioning tasks. For instance, the authors only used a single training trial, the conditioning stimulus was tonic instead of pulsed, the unconditioned stimulus duration was artificially extended in time, and its delivery overlapped with the neutral stimulus, instead of following its offset. These deviations undercut the applicability of their findings.

      Regarding the use of a single long presentation of US rather than multiple presentations (i.e., multiple trials): in early versions of this paper, we did indeed use multiple presentations. We were told by experimental colleagues that the learning could be achieved in a single trial. We note that, if there are multiple presentations in our modeling, nothing changes; once the association between CS and US is learned, the conductance of the synapse is stable. Also, our model does not need a long period of US if there are multiple presentations. This point will be made clearer in our revision.

      We agree that, in order to implement the fear conditioning paradigm in our in-silico network, we made several assumptions about the nature of the CS and US inputs affecting the neurons in the BLA and the duration of these inputs. A Poisson spike train to the BLA is a signal that contains no structure that could influence the timing of the BLA output; hence, we used this as our CS input signal. We also note that the CS input can be of many forms in general fear conditioning (e.g., tone, light, odor), and we wished to de-emphasize the specific nature of the CS. The reference mentioned in the Recommendations for authors, (Quirk, Armony, and LeDoux 1997), uses pulses 2 seconds long. At the end of fear conditioning, the response to those pulses is brief. However, in the early stages of conditioning, the response goes on for as long as the figure shows. The authors do show the number of cells responding decreases from early to late training, which perhaps reflects increasing specificity over training. This feature is not currently in our model, but we look forward to thinking about how it might be incorporated. Regarding the CS pulsed protocol used in (Krabbe et al., 2019), it has been shown that intense inputs (6 kHz and 12 kHz inputs) can lead to metabotropic effects that last much longer than the actual input (200 ms duration) (Whittington et al., Nature, 1995). Thus, the effective input to the BLA may indeed be more like Poisson.
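
      To illustrate what "no structure" means for such an input, a homogeneous Poisson spike train can be sketched as below. This is a minimal sketch and not the code used in the model; the function name `poisson_spike_train`, the 20 Hz rate, the 10 s duration, and the seed are assumptions chosen only for the example.

```python
import random


def poisson_spike_train(rate_hz, duration_s, seed=None):
    """Return sorted spike times (in seconds) of a homogeneous Poisson process.

    Inter-spike intervals are drawn independently from an exponential
    distribution with mean 1 / rate_hz, so spike timing has no memory.
    """
    rng = random.Random(seed)
    spikes = []
    t = rng.expovariate(rate_hz)  # time of the first spike
    while t < duration_s:
        spikes.append(t)
        t += rng.expovariate(rate_hz)  # add the next independent interval
    return spikes


# Hypothetical parameters, for illustration only.
spikes = poisson_spike_train(rate_hz=20.0, duration_s=10.0, seed=0)
intervals = [b - a for a, b in zip(spikes, spikes[1:])]
mean_interval_s = sum(intervals) / len(intervals)
```

      Because each interval is drawn independently, the train has no preferred phase or rhythm; any rhythmic timing in the network output would then have to come from the circuit rather than from the input.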

      Our model requires the effects of the CS and US inputs on BLA neuron activity to overlap in time in order to instantiate fear learning. Although the literature includes paradigms with both overlapping CS/US inputs (delay conditioning, where the US co-terminates with the CS (Lindquist et al., 2004) or immediately follows the CS (e.g., Krabbe et al., 2019)) and non-overlapping CS/US inputs (trace conditioning), we hypothesized that concomitant activity of CS- and US-encoding neurons should be crucial in both cases. This may be mediated by the memory effect, as suggested in the Discussion of our paper, or by metabotropic effects as suggested above, or by the contribution from other brain regions. We will emphasize in our revision that the overlap in time, however instantiated, is a hypothesis of our model. It is hard to see how plasticity can occur without some memory trace of US. This is a consequence of our larger hypothesis that fear learning uses spike-timing-dependent plasticity; such a hypothesis about plasticity is common in the modeling literature. We will discuss these points in more detail in our revision.
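
      The dependence on temporal overlap can be illustrated with a generic pair-based spike-timing-dependent plasticity rule. This is a minimal sketch under stated assumptions and not the plasticity rule implemented in the paper; the amplitudes, time constants, spike times, and the names `stdp_dw` and `total_dw` are hypothetical.

```python
import math


def stdp_dw(dt_ms, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Weight change for one spike pair, with dt_ms = t_post - t_pre.

    All parameter values are illustrative assumptions.
    """
    if dt_ms > 0:  # pre fires before post: potentiation
        return a_plus * math.exp(-dt_ms / tau_plus)
    if dt_ms < 0:  # post fires before pre: depression
        return -a_minus * math.exp(dt_ms / tau_minus)
    return 0.0


def total_dw(pre_times_ms, post_times_ms):
    """Sum the pairwise contributions over all pre/post spike pairs."""
    return sum(
        stdp_dw(t_post - t_pre)
        for t_pre in pre_times_ms
        for t_post in post_times_ms
    )


# Hypothetical spike times (ms): presynaptic spikes stand in for CS-driven
# activity, postsynaptic spikes for US-driven activity.
pre = [0.0, 50.0, 100.0]
post_overlapping = [5.0, 55.0, 105.0]      # post follows pre by 5 ms
post_separated = [1000.0, 1050.0, 1100.0]  # post arrives about 1 s later

dw_overlapping = total_dw(pre, post_overlapping)
dw_separated = total_dw(pre, post_separated)
```

      With these assumed values, the overlapping case yields a net positive weight change, while the separated case yields a change that is effectively zero because the exponential windows have decayed. Under such a rule, some sustained trace of US-related activity is needed for the two activities to interact.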

      We thank Reviewer #2 for their comments. Below, we reply to each of them:

      1) Gamma oscillations are generated locally; thus, it is appropriate to model in any cortical structure. However, the generation of theta rhythms is based on the interplay of many brain areas therefore local circuits may not be sufficient to model these oscillations. Moreover, to generate the classical theta, a laminar structure arrangement is needed (where neurons form layers like in the hippocampus and cortex) (Buzsaki, 2002), which is clearly not present in the BLA. To date, I am not aware of any study which has demonstrated that theta is generated in the BLA. All studies that recorded theta in the BLA performed the recordings referenced to a ground electrode far away from the BLA, an approach that can easily pick up volume conducted theta rhythm generated e.g., in the hippocampus or other layered cortical structure. To clarify whether theta rhythm can be generated locally, one should have conducted recordings referenced to a local channel (see Lalla et al., 2017 eNeuro). In summary, at present, there is no evidence that theta can be generated locally within the BLA. Though, there can be BLA neurons, firing of which shows theta rhythmicity, e.g., driven by hippocampal afferents at theta rhythm, this does not mean that theta rhythm per se can be generated within the BLA as the structure of the BLA does not support generation of rhythmic current dipoles. This questions the rationale of using theta as a proxy for BLA network function which does not necessarily reflect the population activity of local principal neurons in contrast to that seen in the hippocampus.

      In both modeling and experiments, a laminar structure does not seem to be needed to produce a theta rhythm. A recent experimental paper (Antonoudiou et al., 2021) suggests that the BLA can intrinsically generate theta oscillations (3-12 Hz) detectable by LFP recordings under certain conditions, such as reduced inhibitory tone. The authors draw this conclusion from ex vivo mouse slices. The currents that generate these rhythms are in the BLA, since the hippocampus was removed to eliminate hippocampal volume conduction and other nearby brain structures did not display any oscillatory activity. Also, in the modeling literature, there are multiple examples of the production of theta rhythms in small networks not involving layers; these papers explain the mechanisms producing theta from non-laminated structures (Dudman et al., 2009; Kispersky et al., 2010; Chartove et al., 2020). We are not aware of any model of the mechanisms of theta that requires layers.

      2) The authors distinguished low and high theta. This may be misleading, as the low theta they refer to is basically a respiratory-driven rhythm typically present during an attentive state (Karalis and Sirota, 2022; Bagur et al., 2021, etc.). Thus, it would be more appropriate to use breathing-driven oscillations instead of low theta. Again, this rhythm is not generated by the BLA circuits, but by volume conducted into this region. Yet, the firing of BLA neurons can still be entrained by this oscillation. I think it is important to emphasize the difference.

      Many rhythms of the nervous system can be generated in multiple parts of the brain by multiple mechanisms. We do not dispute that low theta appears in the context of respiration; however, this does not mean that other rhythms with the same frequencies are driven by respiration. Indeed, in the above answer we showed that theta can appear in the BLA without inputs from other regions. In our paper, the low theta is generated in the BLA by VIP+ neurons. Using intrinsic currents known to exist in VIP+ neurons (Porter et al., 1998), modeling has shown that such neurons can intrinsically produce a low theta rhythm. This is also shown in the current paper. This example is part of a substantial literature showing that there are multiple mechanisms for any given frequency band. We will emphasize these points in our revision; we note that, for any individual case, such as this one, the mechanism needs to be tested experimentally.

      3) The authors implemented three interneuron types in their model, ignoring a large fraction of GABAergic cells present in the BLA (Vereczki et al., 2021). Recently, the microcircuit organization of the BLA has been more thoroughly uncovered, including connectivity details for PV+ interneurons, firing features of neurochemically identified interneurons (instead of mRNA expression-based identification, Sosulina et al., 2010), synaptic properties between distinct interneuron types as well as principal cells and interneurons using paired recordings. These recent findings would be vital to incorporate into the model instead of using results obtained in the hippocampus and neocortex. I am not sure that a realistic model can be achieved by excluding many interneuron types.

      The interneurons and connectivity that we used were inspired by the functional connectivity reported in (Krabbe et al., 2019) (see above answer to Reviewer #1). As reported in (Vereczki et al., 2021), there are multiple categories and subcategories of interneurons; that paper does not report on which ones are essential for fear conditioning. We did use all the highly represented categories of the interneurons, except NPY-containing neurogliaform cells.

      The Reviewer says “I am not sure that a realistic model can be achieved by excluding many interneuron types”. We agree with the Reviewer that omitting other interneuron subtypes and a description of more specific connectivity (soma-, dendrite-, and axon-targeting connections) may limit the ability of our model to describe all the details in the BLA. However, this work represents a first effort towards a biophysically detailed description of the BLA rhythms and their function. As in any modeling approach, assumptions about what to describe and test are determined by the scientific question; details postulated to be less relevant are omitted to obtain clarity. The interneuron subtypes we modeled, especially VIP+ and PV+, have been reported to have a crucial role in fear conditioning (Krabbe et al., 2019). Other interneurons, e.g. cholecystokinin and SOM+, have been suggested as essential in fear extinction. Thus, in the follow-up of this work to explain fear extinction, we will introduce other cell types and connectivity. In the current work, we have achieved our goals of explaining the origin of the experimentally found rhythms and their roles in the production of plasticity underlying fear learning. Of course, a more detailed model may reveal flaws in this explanation, but this is science that has not yet been done.

      4) The authors set the reversal potential of GABA-A receptor-mediated currents to -80 mV. What was the rationale for choosing this value? The reversal potential of IPSCs has been found to be -54 mV in fast-spiking (i.e., parvalbumin) interneurons and around -72 mV in principal cells (Martina et al., 2001, Veres et al., 2017).

      A GABA-A reversal potential around -80 mV is common in the modeling literature (Jensen et al., 2005; Traub et al., 2005; Kumar et al., 2011; Chartove et al., 2020). Other computational works of the amygdala, e.g. (Kim et al., 2016), consider GABA-A reversal potential at -75 mV based on the cortex (Durstewitz et al., 2000). The papers cited by the reviewer have a GABA-A reversal potential of -72 mV for synapses onto pyramidal cells; this is sufficiently close to our model that it is not likely to make a difference. For synapses onto PV+ cells, the papers cited by the reviewer suggest that the GABA-A reversal potential is -54 mV; such a reversal potential would lead these synapses to be excitatory instead of inhibitory. However, it is known (Krabbe et al., 2019; Supp. Fig. 4b) that such synapses are in fact inhibitory. Thus, we wonder if the measurements of Martina and Veres were made in a condition very different from that of Krabbe. For all these reasons, we consider a GABA-A reversal potential around -80 mV in amygdala to be a reasonable assumption. We will discuss these points in our revision.
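
      The sign argument above can be made concrete with the standard ohmic synaptic current, I = g (V - E). The sketch below is illustrative only: the membrane potential of -65 mV, the 1 nS conductance, and the function name `gaba_current_pa` are assumptions, not values taken from the model or from the cited papers.

```python
def gaba_current_pa(v_mv, e_gaba_mv, g_ns=1.0):
    """Ohmic synaptic current I = g * (V - E), in pA (g in nS, V and E in mV).

    Sign convention: positive values are outward (hyperpolarizing) currents,
    negative values are inward (depolarizing) currents.
    """
    return g_ns * (v_mv - e_gaba_mv)


V_ASSUMED_MV = -65.0  # hypothetical membrane potential, for illustration only

# Reversal potentials discussed in the text: -80 mV (this model),
# -72 mV (principal cells) and -54 mV (fast-spiking interneurons).
currents = {
    e_gaba: gaba_current_pa(V_ASSUMED_MV, e_gaba)
    for e_gaba in (-80.0, -72.0, -54.0)
}
effects = {
    e_gaba: "hyperpolarizing" if i_pa > 0 else "depolarizing"
    for e_gaba, i_pa in currents.items()
}
```

      At the assumed membrane potential, reversal potentials of -80 mV and -72 mV both give an outward current that differs only in magnitude, whereas -54 mV gives an inward current. Whether an inward GABA-A current is functionally excitatory also depends on the spike threshold and on shunting, which this sketch does not capture.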

      5) Proposing neuropeptide VIP as a key factor for learning is interesting. Though, it is not clear why this peptide is more important in fear learning in comparison to SST and CCK, which are also abundant in the BLA and can effectively regulate the circuit operation in cortical areas.

      We do not think that VIP is necessarily more fundamental in fear learning, and certainly not for fear extinction. We will make this clear in the revision.

      We thank Reviewer #3 for their comments and for recognizing that we achieved our modeling aims. We reply to the criticisms below.

      Weaknesses:

      The main weakness of the approach is the lack of experimental data from the BLA to constrain the biophysical models. This forces the authors to use models based on other brain regions and leaves open the question of whether the model really faithfully represents the basolateral amygdala circuitry. Furthermore, the authors chose to use model neurons without a representation of the morphology. However, given that PV+ and SOM+ cells are known to preferentially target different parts of pyramidal cells and given that the model relies on a strong inhibition from SOM to silence pyramidal cells, the question arises whether SOM inhibition at the apical dendrite in a model representing pyramidal cell morphology would still be sufficient to provide enough inhibition to silence pyramidal firing. Lastly, the fear learning relies on the presentation of the unconditioned stimulus over a long period of time (40 seconds). The authors justify this long-lasting input as reflecting not only the stimulus itself but as a memory of the US that is present over this extended time period. However, the experimental evidence for this presented in the paper is only very weak.

      Many of these issues were addressed in the previous responses.

      1) Our neurons were constrained by electrophysiological properties in response to hyperpolarizing currents in the BLA (Sosulina et al., 2010). We chose the specific currents, known to be present in these neurons, to replicate those responses.

      2) Though a much more detailed description of BLA interneurons was given in (Vereczki et al., 2021), it is not clear that this level of detail is relevant to the questions that we were asking, especially since the experiments described were not done in the context of fear learning.

      3) It is true that we did not include the morphology, which undoubtedly makes a difference to some aspects of the circuit dynamics. As we described above, modeling requires the omission of many details to bring out the significance of other details.

      4) As described above, some form of memory or overlap in the activity of the excitatory projection neurons is necessary for spike-timing-dependent plasticity. In modeling, one must be specific about hypotheses, and describe why they are plausible, if not proved; indeed, modeling can explain known phenomena by showing how they are consequences of some (plausible) hypotheses, which themselves are open to experimental verification.

      5) The 40 seconds is not necessary if there are multiple presentations.

      Other critiques:

      1) It is correct that PV+ and SOM+ preferentially target different parts of excitatory projection neurons and that the model relies on a strong inhibition from SOM+ and PV+ to silence the excitatory projection neurons. This choice of parameters comes from using simplified models: it is standard in modeling to adjust parameters to compensate for simplifications.

      2) The SOM+ inhibition of the pyramidal cell firing can be seen as a hypothesis of our model. It is well known that VIP+ cells disinhibit pyramidal cells through inhibition of SOM+ and PV+ cells, which is all we are using in our model; hence this hypothesis is generally believed.

      The authors achieved the aim of constructing a biophysically detailed model of the BLA not only capable of fear learning but also showing spectral signatures seen in vivo. The presented results support the conclusions with the exception of a potential alternative circuit mechanism demonstrating fear learning based on a classical Hebbian (i.e. non-depression-dominated) plasticity rule, which would not require the intricate interplay between the inhibitory interneurons. This alternative circuit is mentioned but a more detailed comparison between it and the proposed circuitry is warranted.

      We agree with the reviewer that it would be good to have a more detailed comparison with the classical Hebbian rule (non-depression-dominated rule). However, we demonstrated in Supplementary Materials that the non-depression-dominated rule is less robust and only operates within a limited window of PV+ excitation. We will have a more robust discussion of plasticity in the revision.

    1. Author Response

      We would like to thank the reviewers for their careful reading of the manuscript and for the positive feedback and constructive criticism that they have provided. We intend to incorporate this feedback into an improved and updated version of the manuscript. We will address the reviewer comments point by point when we submit an updated version but for now we would like to discuss the major points that we intend to address.

      The first concern raised by the reviewers related to the specificity of the BDNF and TrkB staining. We agree that this is an important concern. We tested several antibodies and staining protocols and found that the optimal protocol involved the antibody used in this paper (abcam ab108319), in combination with a heat induced epitope retrieval (HIER) step. Together, this gave robust staining of BDNF in cerebellar tissue and the results of quantification of the staining were in agreement with a BDNF ELISA that we carried out to measure levels of BDNF in the cerebellar vermis of WT and SCA6 mice (Cook et al., 2022). We outline the epitope retrieval method briefly in the methods section of this manuscript but in a revised version we will include further details and data showing the troubleshooting and validation experiments that we have conducted.

      Another concern raised by the reviewers is that 7,8-DHF may not be acting as a TrkB agonist. There has been controversy over the mechanism of action of 7,8-DHF and we welcome the opportunity to discuss the issue further in the present manuscript. We have some evidence that 7,8-DHF is acting via TrkB in this case, as we had previously shown that 7,8-DHF administration to SCA6 mice leads to increased cerebellar TrkB levels and phosphorylation of Akt, an activation event known to be downstream of TrkB (Cook et al., 2022). This implicates TrkB in the mechanism of rescue in this case, but we have not demonstrated this directly. We acknowledge that 7,8-DHF could be acting via a different mechanism, such as anti-oxidant or anti-inflammatory effects. This would be interesting and could be followed up on in the future, potentially providing further insights into the pathophysiology of SCA6. We plan to revise the manuscript and provide additional discussion of the potential mechanism of action of 7,8-DHF. Despite this uncertainty, we believe that the finding that 7,8-DHF rescues early endosome abnormalities is a valuable addition to the paper. Whatever the mechanism of 7,8-DHF, this compound holds promise for potential treatment of SCA6.

      With further staining experiments and addition of information to the text, we feel confident that we can address the concerns of the reviewers and that an updated version will strengthen our manuscript and thereby provide valuable insight into the pathophysiology and potential treatment of SCA6.

    1. Author Response

      We thank the reviewers for their evaluation of our work and their thorough critiques, which we will address in an upcoming revised version of the manuscript. We note that work on mouse and fish CIB knockouts in our laboratories started over a decade ago and that our discoveries are contemporaneous with those recently presented by Liang et al., 2021 and Wang et al., 2023, which we acknowledge, cite, and credit as appropriate. We also note that work on fish knockouts and on fish Cib3 is completely novel.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Review:

      The authors report the first use of the bacterial Tus-Ter replication block system in human cells. A single plasmid containing two divergently oriented five-fold TerB repeats was integrated on chromosome 12 of MCF7 cells. ChIP and PLA experiments convincingly demonstrate the occupancy of Tus at the Ter sites in cells. Using an elegant Single Molecule Analysis of Replicated DNA (SMARD) assay, convincing data demonstrate the replication block at Ter sites dependent on the presence of the protein. As an orthogonal method to demonstrate fork stalling, ChIP data show the accumulation of the replicative helicase component MCM3 and the repair protein FANCM around the Ter sites. It is unclear whether the Ter sites integrated by a single copy plasmid have any effect on the replication of this region but the data show that the observed effects are dependent on expression of the Tus protein. The SMARD data do not reveal what proportion of forks are arrested at Tus/Ter, or how long the fork delay is imposed. Fork stalling led to a highly localized gammaH2AX response, as monitored by ChIP using primer pairs spread along the integrated plasmid carrying the Ter sites. This response was shown to be dependent on ATR using the ATR inhibitor VE-822. This contrasts with a single Cas9-induced DSB between the two Ter sites, which causes a more spread gammaH2AX response. While this was monitored only at a single distal site, the difference between the DSB and the Tus-induced stall is very significant. Interestingly, despite evidence for ATR activation through the gammaH2AX response, no evidence for phosphorylation of ATR-T1989, CHK1-S345, or RPA2-S33 could be found under fork stalling conditions. The global replication inhibitor hydroxyurea (HU) elicited phosphorylation of ATR-T1989, CHK1-S345, or RPA2-S33. In this context, it would have been of interest to examine if a single DSB in the Ter region leads to phosphorylation of ATR-T1989, CHK1-S345, or RPA2-S33 and cell cycle arrest. It is not shown whether the replication inhibitor HU leads to the same widely spread gamma H2AX response. Overall, this is a well written manuscript, and the data provide convincing evidence that the Tus-Ter system poses a site-specific replication fork block in MCF7 cells leading to a localized ATR-dependent DNA damage checkpoint response that is distinct from the more global response to HU or DSBs.

      Author response to public review:

      “It is unclear whether the Ter sites integrated by a single copy plasmid have any effect on the replication of this region but the data show that the observed effects are dependent on expression of the Tus protein.”

      -The lack of perturbation of fork progression by the TerB sequence alone has been studied extensively in both Willis et al., 2014 and Larsen et al., 2014. Furthermore, as the detection of the SMARD signal at the TerB sites is dependent on the 7.5 kb probe that spans the TerB sites (orange probe, Fig 2B & 2D), it would be impossible to study the effect on replication in this region with and without the integration of the single copy plasmid.

      “The SMARD data do not reveal what proportion of forks are arrested at Tus/Ter, or how long the fork delay is imposed.”

      -The percentage of fork stalling at the TerB sites, with and without Tus expression, has been quantified in Figure 2E & 2F. In total, 36% of forks stall at the TerB block when the Tus-TerB block is active: 18% stall in each of the 5’ to 3’ (orange) and 3’ to 5’ (blue) directions.

      “It is not shown whether the replication inhibitor HU leads to the same widely spread gamma H2AX response.”

      -While we have not shown gH2AX accumulation via ChIP after HU treatment, Supplementary Figure 5A & 5B clearly show increased gH2AX foci when the cells are treated with HU, suggesting a global replication stress response that is in stark contrast to the response to Tus-TerB.

      Recommendations for the authors:

      Lines 78, 95: In the experimental set-up there are two divergent 5-TerB sites in the orientation that is non-permissive for fork progression regardless of direction. This raises an obvious question: How is the intervening (~1 kb-long) DNA segment being replicated? Does it stay under-replicated and then break?

      -The reviewers pose an important question about how the intervening sequence flanked by the two TerB sites is replicated, and if this leads to formation of anaphase bridges resulting in breaks. We think this is very plausible and this very question is part of ongoing studies in the lab with the aim to understand how the cell resolves a site-specific block. Unfortunately, this falls outside the scope of the current study.

      Also, it is unclear what is meant by non-permissive orientation. This depends on the predominant replication direction. As the construct has Ter repeats in opposite orientation, any direction is non-permissive. These descriptions could be rephrased to avoid confusion.

      -The text has been edited to clarify this.

      Fig 1A: It would be helpful to annotate the map to show the position of each primer relative to the Ter array. Why is there no signal for pp52?

      -Figure 1A has the map of the locus with the annotated primer pairs and their relative positions to the TerB array.

      -pp52 is positioned beyond the TerB array so binding of the Tus-His protein there is unlikely, confirming the specificity of the Tus binding to only the TerB array and not to the adjacent chromatin.

      Figure 1B: Change Tus to Tus-His to make it easier to understand that the anti-His ChIP is targeting Tus. Provide information what normalization method was used in the ChIP experiments.

      -Figure 1B has been edited to reflect this change.

      Line 113: Willis et al. 2014 also worked with chromosomal Ter sites, which should be acknowledged here.

      -The text has been modified to indicate this. We apologize for the oversight.

      Line 126: Define pWB15 and its significance in text.

      -The text has been edited to clarify this and mentions pWB15.

      Figure 2E, F: Define legend (blue, orange boxes and arrow heads).

      -The figure legend corresponding to Figure 2 has a detailed description of the boxes and the arrows.

      Figure 3E, 4C: Add map of primers like in Figures 1 and 2.

      -The map has been added to Figures 3 & 4 and the text has been updated.

      Figure 4: Showing that the gammaH2AX response is spread like with the single DSB would bolster the conclusion about the difference between a local and global response. Fig 4A, Lane-3: A loading control for the chromatin fraction is missing.

      -Measuring gH2AX chromatin spread after global replication stress can be challenging. We have tried to address the question of global and local gH2AX response after replication stress by quantifying gH2AX foci in cells treated with and without hydroxyurea, and comparing these with cells that have a functional Tus-TerB block (Supplementary Figure 5A & 5B). A single fork block seems to elicit only a local response, while global replication stress leads to gH2AX accumulation throughout the cell.

      -Lamin A/C has been added to Fig 4A as a loading control for the chromatin fraction.

      Figure S4: Analyzing ATR, CHK1 and RPA phosphorylation as well as cell cycle profile under single DSB condition may reveal that different localized responses exist. I mention this because it was reported in yeast that a single DSB in G1 cells leads to a similarly localized Mec1 (ATR)-dependent response that does not elicit phosphorylation of Rad53 (CHK1) and other downstream targets, but leads to H2A phosphorylation as well as phosphorylation of RPA and the Rad51 paralog Rad55 (see PMCID: PMC2853130). It might be of interest to the reader to discuss this publication and the commonalities and differences between both localized checkpoint responses.

      -The reviewers raise an interesting question about the phosphorylation of ATR/CHK1/RPA and its effect on the cell cycle after a single DSB. The aim of using the Cas9 break site in this study was merely to corroborate previously published observations pertaining to the spread of gH2AX after a DSB and to contrast that with the local response seen with Tus-TerB. Thus, while this is an intriguing question, we do not think this particular experiment will help in understanding the localized checkpoint response after a single replication fork block. However, we have included the observations previously published in the yeast system (PMC2853130) in our discussion, as they help to compare and contrast fork blocks and DSBs further. It is worth noting, though, that the yeast studies examined the cellular response to a DSB in G1.

      Lines 256-260: In the discussion of ATRIP, unpublished data are discussed that show no increase in ssDNA. What is the effect of ATRIP depletion? Maybe delete this mention of unpublished data, if no new data can be provided. The authors are aware that this makes the mechanism of ATR activation at the 5-TerB site elusive.

      -This statement has been deleted and the text has been modified.

      Another possibility discussed by the authors is fork reversal. Since Tus/Ter complex block the CMG progression, fork reversal would result in a chicken foot structure with the long single-stranded 3'-overhang of an Okazaki fragment site. Such a structure should be protected by BRCA2 or RAD52 proteins from degradation. Any role for these proteins in the checkpoint activation at the TerB site?

      -The reviewers suggest an interesting scenario where the reversed fork structure induced by the Tus-TerB block could be protected by the loading of known DNA repair proteins, and this in turn could lead to a signaling mechanism and checkpoint activation. While we have not tested this hypothesis, nor studied the temporal dynamics of the formation of the reversed fork with respect to gH2AX accumulation, we think the localized gH2AX signal observed in the vicinity of the block is what initiates the downstream DDR response, promoting fork stabilization, followed either by fork reversal and restart or fork collapse. If the reversed fork were responsible for the gH2AX signaling, one would expect the signal to be more widespread, perhaps decorating the entire stretch of DNA between the block and the reversed fork. However, further studies are warranted to tease out this mechanism and the spatio-temporal dynamics.

      Lines 292-294: The authors state that "unpublished work from our laboratory has demonstrated that replication forks are cleaved at or near the TerB site..." Unless the data are shown, it might be best to eliminate discussion of unpublished work, also because the occurrence of DNA ends at Ter sites was already described in Willis et al. 2017.

      -The statement has been deleted and Willis et al. 2017 has been referenced.

      Suppl Table 1: It would help to also show representative images of stretched fibers in addition to the summary data shown.

      -Since the data is negative, the fiber images do not show any discernible differences and we do not think it adds useful information.

      Suppl Fig 4. ChIP for gamma H2AX data. It would be helpful to show the distribution of the gamma H2AX signal along the chromosome for both the DSB response and the Tus/Ter response.

      -The gH2AX ChIP signal at PP0-2 and PP10 has been included in Supplementary Fig. 4D. Though not significant for PP0-2, the data suggest that there is increased spread of gH2AX along the chromosome after a DSB, in strong contrast to the response after the Tus-TerB block. The text has been modified to include both primer pairs.

    1. Author Response:

      The following is the authors' response to the original reviews.

      Thank you for sending our manuscript for review and the positive editorial comments. On behalf of all authors, I would like to thank the reviewers for their critical reading of our manuscript and for providing insightful and valuable suggestions. We have revised the discussion section accordingly, including a new supplemental figure to show the results previously stated as “data not shown”. Please see below for detailed explanations.

      Reviewer #1 (Public Review):

      The manuscript by Zheng et al. examined the disease-causing mechanisms of two missense mutations within the homeodomain (HD) of CRX protein. Both mutations were found in humans and can produce severe dominant retinopathy. The authors investigated the two CRX HD mutants via in vitro DNA-binding assay (Spec-seq), in vivo chromatin-binding assay (ChIP-seq), in vivo expression assay of downstream target genes (RNA-seq), and retinal histological and functional assays. They concluded that p.E80A increased the transactivation activity of CRX and resulted in precocious photoreceptor differentiation, whereas p.K88N significantly changed the binding specificity of CRX and led to defects in photoreceptor differentiation and maintenance. The authors performed a significant amount of analyses. The claims are sufficiently supported by the data. The results not only uncovered the underlying disease-causing mechanisms, but also can significantly improve our understanding of the interaction between HD-TF and DNA during development.

      Thank you for summarizing the key findings and strengths of our manuscript.

      Minor concerns:

      1. The E80A, K88N and R90W (previously reported by the same group) mutations are located very close to each other in the homeodomain (Figure 1A), but had distinct effects on the activity of CRX. Has the structure of the homeodomain (of CRX) been resolved? If so, could the authors discuss this phenomenon (mutations close to each other but have distinct effects) based on the HD-DNA structure?

      In paragraphs 2, 4, 5 of the discussion section, we have added explanations on how each mutation could affect CRX HD-DNA interactions differently based on published structural studies. And we further explain how these biochemical changes relate to the molecular perturbations and cellular phenotypes seen in vivo.

      In addition, has this phenomenon been observed in other homeodomain TFs?

      Disease-associated missense mutations at residues HD50 (K88) and HD52 (R90) have also been reported in other HD TFs implicated in CNS development (see discussion paragraph 7). Distinctively, different substitutions at the CRX E80 residue have been reported in multiple CoRD cases, suggesting its essential role in HD-DNA-mediated regulation during retinal development. These new points are now included in the discussion section.

      2. The authors should briefly summarize the effects/disease-causing-mechanisms of all the reported CRX mutations in the discussion part. The readers can then have a better overview of the topic.

      We have added a concise summary of the previously proposed CRX mutation classification scheme, all characterized Crx mutant mouse models, and their pathogenic mechanisms. Please see paragraph 9 in the discussion section.

      3. CRX can also function as a pioneer factor (reported by the same group). Would these HD mutations distinctively affect chromatin accessibility (which then leads to ectopic binding on the genome)?

      Prior evidence has demonstrated that regulatory regions for many photoreceptor genes failed to stay accessible upon loss of CRX in the Crx-/- model (PMID: 30068366). It is unclear with the existing data whether CRX could initiate the chromatin remodeling (true pioneering function) of these regions, or it simply maintains the accessibility once these regions became accessible. Future studies comparing epigenomic landscape changes in mutant Crx KI models at various ages can be informative, particularly for the CRX K88N ectopic binding events. Determining how the CRX K88N mutant protein alters chromatin landscape important for photoreceptor fate and/or differentiation during development would shed light on the nature of these ectopic binding events.

      4. The discussion part can be shortened and simplified.

      We have re-written the discussion section to make it concise and to incorporate discussions on mutant CRX HD structures. Please see the revised manuscript.

      Reviewer #2 (Public Review):

      Zheng et al., investigated the molecular and functional mechanisms of two homeodomain missense mutations causing human retinal photoreceptor degeneration diseases in photoreceptor development regulated by the CRX transcription factor. They analyzed the E80A mutation associated with dominant cone-rod dystrophy (CRD) and the K88N mutation associated with dominant Leber Congenital Amaurosis (LCA). The authors found that E80A CRX binds to the same target DNA sites as WT CRX, but the binding specificity of K88N CRX is altered from that of WT in an in vitro assay. They generated Crx(E80A) and Crx(K88N) KI mice and performed ChIP assay and observed that K88N CRX binds to novel genomic regions from the WT-binding sites, while E80A binds to the WT sites. In addition, using the KI mice, they found that E80A and K88N differently affect the expression of Crx target genes. This study is well executed with proper and solid methodologies, and the manuscript is clearly written. This study gives us the insights how single missense CRX mutations lead to different types of human retinal photoreceptor degeneration diseases.

      We greatly appreciate the reviewer’s summary and positive comments.

      While the study has strengths in principle, it has a couple of weaknesses. One is that how well E80A KI mice function as a pathological model of dominant CRD, in which cones are mainly first affected, is not clearly shown in this study. More data investigating how cones are affected by performing histological, molecular, and physiological analyses will be helpful and useful. For example, in the Discussion, the authors describe the result that E80A associates with the S-cone opsin promoter as "data not shown". This data must be presented for the readers. In addition, more molecular insights as to how E80A affects cones will strengthen this study.

      The mouse retina is rod dominant and contains only a small number of cones (3% of all photoreceptors) that are born prenatally. This poses technical challenges to appropriately assess cone-specific changes during disease initiation/progression. We are in the process of developing cellular/molecular tools to investigate how cones are being affected in Crx E80A KI model, but this is beyond the scope of the current study.

      At the same time, we have added a supplemental panel showing that, based on P0 retinal immunostaining of the early cone marker RXRγ, cones were initially born, and fate specified in CrxE80A retinas (see Figure S7A). Since the E80A protein also hyper-activated S-cone opsin promoter-luciferase (Sop-luc) reporter in HEK293 cells (see Figure S7B), we predict that CRX E80A affects cone photoreceptor differentiation in a similar manner as rod photoreceptors. Furthermore, the cone transcriptional program might be more prone to perturbations by abnormal CRX activities. These possibilities require future investigations. For this manuscript, we have included all these points in the discussion section.

      Another point is that it will be very valuable if the authors could show how E80A and K88N differently affect the 3D structure of the CRX homeodomain. Even a simulation model would be valuable.

      Please see our answer to Point 1 of Reviewer #1. In short, we have added in the discussion section our explanations on how each mutation could affect CRX HD-DNA interactions differently based on structural studies. We further explain how these biochemical changes relate to the molecular perturbations and cellular phenotypes seen in vivo. Additionally, since TF-DNA interactions are diverse and dynamic across binding sites with different sequence features and genomic environments, future studies that systematically and quantitatively evaluate CRX transcriptional activity at different regulatory sequences would be important.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      As a minor comment, in page 8, second section, "Previous studies have demonstrated the CRX is activated shortly after cell cycle exit in retinal progenitor cells fated to be photoreceptor.", the authors cited refs 66 and 67, which were in 2015 and 2016. However, this was demonstrated in the paper of J. Neurosci.31(46), 16792-807, 2011, Figure 1. It would be fair for the authors to cite the JN 2011 paper.

      Thanks to the reviewer for the suggested reference, we have added it to the revised manuscript.

    2. Author Response:

      The following is the authors’ response to the previous reviews

      Thank you for sending our revised manuscript for review and the positive editorial comments. On behalf of all authors, I would like to, again, thank the reviewers for their critical reading of our revised manuscript and for providing further suggestions. We have revised the introduction and discussion sections to specifically address the comments made by Reviewer #2. Please see below for detailed explanations.

      Reviewer #2 (Public Review):

      Overall, the authors have significantly improved the manuscript, but there is still an unclarified point. In response to the inquiry in the initial review on the extent to which E80A KI mice function as a pathological model of dominant CoRD, the authors added data (Figure S7) and discussed the issue in the sixth section of the discussion. However, the authors mentioned that it is technically too challenging because of the small number of cones. The point is not clear to me, but it is possible to analyze cone differentiation and degeneration by immunostaining at multiple stages even though the cone number is small. Cone arrestin and S- and M-opsins become positive at early postnatal stages in the mouse retina. Cone arrestin seems earlier than cone opsins. Cones seem born by detecting RXRg at P0, but are cone arrestin and/or cone opsins expressed in early postnatal E80A/+ retina? If positive, how about an apoptosis marker? If negative, it seems to be a cone development phenotype rather than cone degeneration phenotype. If so, the authors should modify the expression to say that the E80A retina underlies a CoRD-like phenotype. It seems an overstatement.

      We greatly appreciate Reviewer 2’s suggestions on further investigating cone photoreceptor phenotypes in the CRX E80A KI mouse model. All the points raised deserve a comprehensive and in-depth study. However, the focus of the current manuscript is to establish a general framework for understanding different missense mutations in homeodomain TFs beyond CRX. We believe that a separate and dedicated study is more appropriate to detail the quantitative molecular and cellular mechanisms of CRX E80A dysfunction in cone and rod photoreceptors, as stated in the last sentence of discussion section paragraph 6: “… quantitative characterization of CRX E80A molecular functions in a cone dominant retina warrants further study to understand its selective effect on the cone differentiation program and help elucidate WT CRX regulatory principles in early photoreceptor development.”.

      Clinical diagnosis of cone-rod dystrophy (CoRD) is largely based on functional deficits of cones and rods. 1-month electroretinogram (ERG) (Figures 5K-M) shows no cone-mediated light responses and reduced rod functions in CrxE80A/+ mouse. These ERG deficits in the CRX E80A KI mouse model are in agreement with CoRD characteristics. Thus, it is reasonable to say that CRX E80A KI retina phenotype resembles CoRD phenotype.

      Reviewer #2 (Recommendations For The Authors):

      As a minor comment, in page 8, second section, "Previous studies have demonstrated the CRX is activated shortly after cell cycle exit in retinal progenitor cells fated to be photoreceptor.", the authors cited refs 66 and 67, which were in 2015 and 2016. However, it was demonstrated in the paper of J. Neurosci.31(46), 16792-807, 2011, Figure 1. The authors need to be scientifically fair to cite the JN 2011 paper.

      In response to this comment above, the authors cited the JN 2011 paper in a modified sentence of "Animal studies have demonstrated that Crx is first expressed in post-mitotic photoreceptor precursors and maintained throughout life (Refs.13-15)", moved from the discussion to the introduction. To my knowledge, the JN2011 paper (new Ref 15) is the first study to directly demonstrate that Crx begins to be expressed shortly after cell cycle exit of retinal progenitor cells. Refs. 13 and 14 showed Crx expression in adult stage photoreceptors but did not directly demonstrate the Crx expression in post-mitotic photoreceptor precursors. To be scientifically precise, the references should be cited as "Animal studies have demonstrated that Crx is first expressed in post-mitotic photoreceptor precursors (Ref. 15) and maintained throughout life (Refs.13 and 14)".

      Thanks to the reviewer for the precise instruction. We have adjusted the reference order as follows: “Animal studies have demonstrated that Crx is first expressed in post-mitotic photoreceptor precursors13 and maintained throughout life14,15.”, where JN2011 paper is reference 13.

    1. Author Response:

      We thank the reviewers for their thoughtful reviews. We believe that we can address these comments through revisions within the manuscript (writing/analysis) or as matters of clarification. In this preliminary response, we focus on a few aspects of the reviewer comments.

      Experimental design

      We will ensure that the rationale for our use of 10-minute analytic periods is clear. These time periods were dictated by the sampling duration required to perform accurate neurochemical analyses (and to reserve half of the sample in the event of a catastrophic failure of batch-processing samples). Since neurochemical release may display multiple temporal components (e.g., ACh) during playback stimulation, and these could differ across neurochemicals of interest, we decided to collect, analyze, and report in two periods. Our results suggest that this was appropriate, comparing values across the two stimulus periods and the pre-stimulus control. We decided not to include analyses of the post-stimulus period because this is subject to wider individual and neuromodulator-specific effects and because it weakens statistical power in addressing the core question—the change in neuromodulator release DURING vocal playback.

      We called these periods “Stim 1” and “Stim 2”, but each used the same exemplar sequences in the same order.

      For behavioral analyses, observation periods were much shorter than 10 mins, but the main purpose of behavioral analyses was to relate to the neurochemical data. As a result, we matched the temporal features of the behavioral and neurochemical analyses. We will ensure that this is clearly described in the revision. We plan a separate report, focused exclusively on a broader set of behavioral responses to playback, that may examine behaviors at a more granular level.

      One reviewer expressed concern that we did not utilize a “control” playback stimulus, suggesting white noise as the control. We gave extensive consideration to this in our design. We concluded, based on our previous work, that white noise is not a neutral stimulus and therefore the results would not clarify the responses to the two vocal stimuli. Instead, we opted to use experience as a type of control. This control shows very clearly that temporal patterns and across-group differences in neurochemical response disappear in the absence of experience.

      One reviewer comments that our p90-p180 mice are “old”. This is not the case. CBA/CaJ mice display normal hearing for at least 1 year (Ohlemiller, Dahl, and Gagnon, JARO 11: 605-623, 2010) and adult sexual and social behavior throughout our observation period. They are sexually mature adults, appropriate for this study.

      Data and statistical analyses

      Two reviewers express concerns about our normalization of neurochemical data, suggesting that it diminishes statistical power or is not transparent. We note that normalization is a very common form of data transformation that does not diminish statistical power. It is particularly useful for data forms in which the absolute value of the measurement across experiments may be uninformative. Normalization is routine in microdialysis studies, because data can be affected by probe placement and factors affecting neurochemical processing. Similar to calcium imaging or many electrophysiological recordings, the information is based on a comparison to baseline values. We will consider supplying concentration values in supplemental material.
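      As an illustration only, baseline normalization of this kind can be sketched as follows. This is a minimal sketch, assuming each animal's stimulus-period concentrations are expressed as a percentage of that animal's own pre-stimulus value; the function name, the animal labels, and all numbers are hypothetical and are not taken from the study, whose exact procedure may differ.

```python
def normalize_to_baseline(baseline, samples):
    """Express each sample as a percentage of the same animal's baseline value."""
    if baseline <= 0:
        raise ValueError("baseline concentration must be positive")
    return [100.0 * s / baseline for s in samples]


# Hypothetical concentrations (arbitrary units). The two animals differ
# four-fold in absolute level, e.g. because of probe placement, yet show
# the same relative change during the two stimulus periods.
animals = {
    "mouse_a": {"baseline": 2.0, "stim": [3.0, 2.5]},
    "mouse_b": {"baseline": 8.0, "stim": [12.0, 10.0]},
}

normalized = {
    name: normalize_to_baseline(d["baseline"], d["stim"])
    for name, d in animals.items()
}

for name, values in normalized.items():
    print(name, values)
```

      In this toy example the absolute values differ between animals, while the normalized values coincide, which is the sense in which the comparison to baseline carries the information.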

      Two reviewers comment on correlations we presented, with different perspectives. We will review our correlation analyses to determine if these are appropriate and what should be reported.

      Although Reviewer 2 raises several valid issues that we will address in our response and revision, we believe that none represent “major flaws” in the study that challenge the validity of our central conclusions. In brief, we will:

      * provide enhanced description of behaviors
      * clarify or modify box-plot representations of data
      * point to our methods that describe corrections for multiple comparisons
      * clarify sample size concerns
      * address questions of correlation between neurochemicals and behavior

      Factual Corrections

      Two reviewer comments and an associated editorial comment suggest that statistical power is lacking. The reviewer comments are incorrect. If the editorial suggestion is based on those comments, we challenge that as well.

      Reviewer 1 states that normalization “creates a baseline period with minimal variation…that could inflate statistical power.” We believe that this statement is incorrect. We will justify elsewhere the rationale for using normalized neurochemical data, but the suggestion that this very common transformation alters statistical power is unwarranted.

      Reviewer 2 states, in the 4th Recommendation for the Authors, that sample sizes are too small. The reviewer gives examples of sample sizes of 3, but that is incorrect. In revising figures, we will ensure that sample numbers appear clearly, but the reviewer’s claim that we used sample size of 3 is not correct. The minimum sample size is 5.

      If these reviewer comments are the bases for the editorial recommendation that the manuscript may require additional power, we believe the recommendation is based on incorrect comments.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      The study by Fang et al. reports a 3D MERFISH method that enables spatial transcriptomics for tissues up to 200um in thickness. MERFISH, as well as other spatial transcriptomics technologies, have been mainly used for thin (e.g, 10um) tissue slices, which limits the dimension of spatial transcriptomics technique. Therefore, expanding the capacity of MERFISH to thick tissues represents a major technical advance to enable 3D spatial transcriptomics. Here the authors provide detailed technical descriptions of the new method, troubleshooting, optimization, and application examples to demonstrate its technical capacity, accuracy, sensitivity, and utility. The method will likely have a major impact on future spatial transcriptomics studies to benefit diverse biomedical fields.

      Strengths:

      The study was well-designed, executed, and presented. Extensive protocol optimization and quality assessments were carried out and conclusions are well supported by the data. The methods were sufficiently detailed and the results are solid and compelling.

      We thank the reviewer for the positive comments on our manuscript.

      Weaknesses:

      The biological application examples were limited to cell type/subtype classification in two brain regions. Additional examples of how the data could be used to address important biological questions will enhance the impact of the study.

      We appreciate the reviewer's suggestion that demonstrating the applications of our thick-tissue 3D MERFISH method to addressing important biological questions would enhance the impact of our study. In line with this reviewer comment, we had included examples of how our method could be applied to address various biological questions in the summary (last) paragraph of our manuscript. These examples highlight the versatility and utility of our approach in addressing diverse biological questions beyond cell type classification. However, the goal of this work is to develop a new method and establish its validity. While we are interested in applying it to answer important biological questions in the future, we consider these applications beyond the scope of this current work.

      Reviewer #2 (Public Review):

      Summary:

      In their preprint, Fang et al present data on extending a spatial transcriptomics method, MERFISH, to 3D using a spinning disc confocal. MERFISH is a well-established method, first published by Zhuang's lab in 2015 with multiple follow-up papers. In the last few years, MERFISH has been used by multiple groups working on spatial transcriptomics, including approximately 12 million cell maps measured in the mouse brain atlas project. Variants of MERFISH were used to map epigenetic information complementary to gene expression and RNA abundance. However, MERFISH was always limited to thin ~10um sections to this date. The key contribution of this work by Fang et al. was to perform the optimization required to get MERFISH working in thick (100-200um) tissue sections.

      Major strengths and weaknesses:

      Overall the paper presents a technical milestone, the ability to perform highly multiplexed RNA measurements in 3D using MERFISH protocol. This is not the first spatial transcriptomics done in thick sections. Wang et al. 2018 - StarMAP used thick sections (150 um), and recently, Wang 2021 (EASI-FISH, not cited) performed serial HCR FISH on 300um sections. Data so far suggest that MERFISH has better sensitivity than in situ sequencing approaches (StarMAP) and has built-in multiplexing that EASI-FISH lacks. Therefore, while there is an innovation in the current work, i.e., it is a technically challenging task, the novelty, and overall contribution are modest compared to recently published work.

      This summary is elaborated in more detail in the following paragraphs, and we will address these detailed comments below.

      The authors could improve the writing and the manuscript text that places their work in the right context of other spatial transcriptomics work. Out of the 25 citations, 12 are for previous MERFISH work by Zhuang's lab, and only one manuscript used a spatial transcriptomics approach that is not MERFISH. Furthermore, even this paper (Wang et al, 2018) is only discussed in the context of neuroanatomy findings. The fact that Wang et al. were the first to measure thick sections is not mentioned in the manuscript. The work by Wang et al. 2021 (EASI-FISH) is not cited at all, as well as the many other multiplexed FISH papers published in recent years that are very relevant. For example, a key difference between seqFISH+ and MERFISH was the fact that only seqFISH+ used a confocal microscope, and MERFISH has always been relying on epi. As this is the first MERFISH publication to use confocal, I expect citations to previous work in seqFISH and better discussions about differences.

      We thank the reviewer for recognizing our work as a technical milestone. Since this work is aimed to build upon the strengths of MERFISH and address some of its limitations, we primarily cited previous MERFISH papers to make it clear what specific improvements have been achieved in this work. Given the rapid growth of the spatial genomics field, it has become impractical to comprehensively cite all method development or improvement papers in this area. Instead, we cited a 2021 review article in the first sentence of the manuscript and limited all discussions afterwards to MERFISH. In the revised manuscript, we will try to find and include more recent review articles to cover method developments since 2021.

      Although we presented our work as an advance in MERFISH specifically, we consider the reviewer’s suggestion of citing the 2018 STARmap paper [Wang et al., Science 361, eaat5961 (2018)] in the introduction part of our manuscript reasonable. This STARmap paper was already cited in the results part of our manuscript, and we will further emphasize this paper in the introduction of our revised manuscript, as this 2018 in situ sequencing paper was the first to demonstrate 3D spatial transcriptomic profiling in thick tissues. In addition, we thank the reviewer for bringing to our attention the EASI-FISH paper [Wang et al, Cell 184, 6361-6377 (2021)], which reported a method for thick-tissue FISH imaging and demonstrated imaging of 24 genes using multiple rounds of multi-color FISH imaging. We also recently became aware of a paper reporting 3D imaging of thick samples using PHYTOMap [Nobori et al, Nature Plants 9, 1026-1033 (2023)]. This paper, published a few days after we submitted our manuscript to eLife, demonstrated imaging of 28 genes in thick plant samples using multiple rounds of multicolor FISH and probe targeting and amplification methods previously developed for in situ sequencing. We will include these three papers in the introduction section of our revised manuscript.

      However, we do not consider our use of confocal imaging in this work an advance in MERFISH because confocal, like epi-fluorescence imaging, is a commonly used approach that could be applied to MERFISH of thin tissues directly without any alteration of the protocol. Confocal imaging has been broadly used for both DNA and RNA FISH long before any genome-scale imaging was reported. Confocal and epi-imaging geometries have their distinct advantages, and which of these imaging geometries to use is the researcher’s choice depending on instrument availability and experimental needs. Thus, we do not find it necessary to cite specific papers just for using confocal imaging in spatial transcriptomic profiling, but we will see whether it is reasonable to cite these papers in the revised manuscript. Our real advance related to confocal imaging is the use of machine-learning to increase the imaging speed. Without this improvement, 3D imaging of thick tissue using confocal would take a long time and likely degrade image quality due to photobleaching of out-of-focus fluorophores before they are imaged. We thus cited several papers that used deep learning to improve imaging quality and/or speed. Our unique contribution is the combination of machine learning with confocal imaging for 3D multiplexed FISH imaging of thick tissue samples, which had not been demonstrated previously.

      To get MERFISH working in 3D, the authors solved a few technical problems. To address reduced signal-to-noise due to thick samples, Fang et al. used non-linear filtering (i.e., deep learning) to enhance the spots before detection. To improve registrations, the authors identified an issue specific to their Z-Piezo that could be improved and replaced with a better model. Finally, the author used water immersion objectives to mitigate optical aberrations. All these optimization steps are reasonable and make sense. In some cases, I can see the general appeal (another demonstration of deep learning to reduce exposure time). Still, in other cases, the issue is not necessarily general enough (i.e., a different model of Piezo Z stage) to be of interest to a broad readership. There were a few additional optimization steps, i.e., testing four concentrations of readout and encoder probes. So while the preprint describes a technical milestone, achieving this milestone was done with overall modest innovation.

      We appreciate the reviewer's recognition of the technical challenges we have overcome in developing this 3D thick-tissue MERFISH method. To achieve high-quality thick-tissue MERFISH imaging, we had to overcome multiple different challenges. We agree with the reviewer that the solutions to some of the above challenges are intellectually more impressive than the others that required relatively more mundane efforts. However, all of these are needed to achieve the overall goal, a goal that is considered a milestone by the reviewer. We believe that the impact of a method should be evaluated based on its unique capabilities, potential applications, and its adaptability for broader adoption. In this regard, we anticipate that our reported method will be a valuable and impactful contribution to the field of spatial biology.

      Data and code sharing - the only link in the preprint related to data sharing sends readers to a deleted Dropbox folder. Similarly, the GitHub link is a 404 error. Both are unacceptable. The author should do a better job sharing their raw and processed data. Furthermore, the software shared should not be just the MERlin package used to analyze but the specific code used in that package.

      We apologize for the invalid Dropbox link. The Dropbox folder was accidentally moved, so the link provided in the manuscript no longer points to it. The valid link is now: https://www.dropbox.com/scl/fo/ribx45fnx4zw7kv12sl3w/h?rlkey=fo829wbxmb9mwl6gzivg7vqj3&dl=0. We will also upload the data to a public data repository when submitting the revised manuscript.

      The GitHub link that we provided for the MERlin package is, however, valid and will lead to the correct GitHub site. If, for some reason, clicking the link does not work on your computer, copying the URL address into a web browser should work. Following the suggestion by the reviewer, in addition to the MERlin v2.2.7 package itself, we will also share the specific code to use this package for analyzing the data taken in this work in the revised manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Comment 1. The authors used a meta-mask based on previous LC structural studies to delineate the LC on functional scans within two large public datasets (3T CamCAN and 7T HCP).

      The rostral part of the LC was characterized by connections to the posterior and anterior cingulate cortices, medial temporal lobe, hippocampus, amygdala and striatum, while the caudal part projected to the parietal cortex, occipital cortex, precentral and postcentral regions, and thalamus. Older ages were associated with less rostral-like connectivity and increased asymmetry. The gradient explained variance above the effects of age, sex and education on some emotional and cognitive measures. In particular, the old-like functional gradient (loss of rostral-like connectivity and more clustered functional organization) was associated with worse performance on emotional memory and emotion regulation tasks but not to executive functioning or self-rated sleep quality.

      Participants with higher anxiety and depression also showed less rostral-like connectivity and more asymmetry. Both the aging and the anxiety/depression asymmetry manifested as less rostral-like connectivity in the left LC than the right LC.

      A strength of this study is that it is the first to attempt a voxel-based approach to quantifying functional connectivity in the LC. The results finding differences between rostral and caudal LC connectivity patterns are broadly consistent with prior work indicating differences between rostral/caudal LC and should help advance understanding of the LC's connectivity patterns with cortical regions.

      We thank the reviewer for the thorough and positive assessment of our manuscript.

      Comment 2. A limitation of the study is the challenge of assessing activity not only from the small LC brainstem nucleus but also within it. Given the current spatial limitations of whole-brain functional imaging, the current findings are bolstered by including the 7T 1.6mm isotropic data. Spatial smoothing was applied with a 3mm FWHM isotropic kernel which may have reduced precision.

      The reviewer raises valid points. Spatial resolution is indeed a limiting factor for assessing the LC with functional MRI. The choice of including spatial smoothing in the preprocessing was necessary because connectopic mapping requires a measure of spatial smoothness for the gradient calculation (see Haak et al. 2018 NeuroImage). We included a sentence explaining this in the revised version of the manuscript and added it also as an additional limitation.

      Comment 3. Another limitation was that the authors made conclusions about clustered functional organization but it was not clear how clustering was quantified.

      We thank the reviewer for the comment. Clusterability was quantified in the following way (based on Ngo et al. 2021 NeuroImage). First, gradient maps were clustered into k=2 clusters using the k-means clustering algorithm and then the Calinski-Harabasz criterion was calculated for each individual gradient map, which was used as a measure of clusterability. Higher criterion values were significantly associated with older age (Spearman’s rho = 0.3129, p<0.0089), indicating that the gradient was more clustered in older individuals. This new analysis is now included in the revised version of the manuscript.
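      The clusterability measure described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that each gradient map is reduced to a list of scalar gradient values (one per LC voxel) and that k-means is run on those values alone; the function names and the toy maps are hypothetical.

```python
from statistics import mean


def kmeans_1d_two_clusters(values, n_iter=100):
    """Lloyd's algorithm with k=2 on scalar values; returns a 0/1 label per value."""
    centers = [min(values), max(values)]  # deterministic start at the two extremes
    labels = []
    for _ in range(n_iter):
        new_labels = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
                      for v in values]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = mean(members)
    return labels


def calinski_harabasz(values, labels):
    """CH = [B / (k - 1)] / [W / (n - k)], B: between-, W: within-cluster sum of squares."""
    n, clusters = len(values), sorted(set(labels))
    k = len(clusters)
    grand = mean(values)
    between = within = 0.0
    for c in clusters:
        members = [v for v, lab in zip(values, labels) if lab == c]
        centre = mean(members)
        between += len(members) * (centre - grand) ** 2
        within += sum((v - centre) ** 2 for v in members)
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical gradient maps: a smooth ramp versus a step-like (clustered) profile.
smooth = [i / 9 for i in range(10)]
stepped = [0.0, 0.02, 0.04, 0.06, 0.08, 0.92, 0.94, 0.96, 0.98, 1.0]

ch_smooth = calinski_harabasz(smooth, kmeans_1d_two_clusters(smooth))
ch_stepped = calinski_harabasz(stepped, kmeans_1d_two_clusters(stepped))
print(ch_smooth, ch_stepped)
```

      In this toy case the step-like map yields a much larger criterion value than the smooth ramp, which is the sense in which a higher value indicates a more clustered gradient. The sketch does not guard against degenerate maps (all values identical, or zero within-cluster variance).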

      Reviewer #1 (Recommendations For The Authors):

      Comment 1. Would it be equally accurate to state that participants with higher anxiety and depression showed more caudal-like connectivity or are the differences clearly localized to the rostral LC?

      We thank the reviewer for the question. Since the gradients are by default a dimensionless scale that was further normalized to the range of 0-1, both interpretations are possible. We hypothesized a loss of rostral-like LC connectivity based on previous literature.
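      A short sketch of why both readings are equivalent, assuming a simple min-max normalization (the helper names are hypothetical and the exact normalization used in the manuscript is not specified here): flipping a normalized gradient maps every "less rostral-like" value onto an equally sized "more caudal-like" value.

```python
def minmax(values):
    """Rescale values linearly to the range 0-1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def flip(normalised):
    """Reverse the direction of a 0-1 gradient (rostral-like <-> caudal-like)."""
    return [1.0 - v for v in normalised]


gradient = minmax([2.0, 4.0, 6.0])
print(gradient, flip(gradient))
```

      Because the orientation of the scale is arbitrary, the choice between the two descriptions rests on the prior hypothesis rather than on the numbers themselves.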

      Comment 2. These resting-state findings seem to show some interesting parallels to the structural rostral/caudal LC MRI contrast relationships with cortical thickness in Bachman et al. (2021, Neurobiology of Aging), who found that positive associations between LC contrast and structural thickness were found among older adults for rostral but not caudal LC (corresponding with the rostral regions showing the most age-related change). It is also interesting that in Bachman et al., younger adults showed negative correlations between caudal LC contrast and cortical thickness, which may relate to associations with a more caudal-like connectivity pattern (assuming this is a fair way to interpret the current results) in those with high HADS scores (i.e., rostral LC indicators may reflect stress/anxiety).

      We thank the reviewer for pointing out these interesting findings. We included the reference in the revised version of the manuscript.

      Comment 3. How was "more clustered functional organization" computed? I could not find description of this in the analysis section. If it is something that is evident from the visual depiction of the surface rendering shown in Fig. 2, please explain as it was not clear to me.

      We thank the reviewer for the comment. As mentioned in a previous answer to a comment made by the Reviewer, clusterability was quantified in the following way (based on Ngo et al. 2021 NeuroImage). Gradient maps were clustered into k=2 clusters using the k-means clustering algorithm, then the Calinski-Harabasz criterion was calculated for each individual gradient map, which was then used as a measure of clusterability. Higher criterion values were significantly associated with older age (Spearman’s rho = 0.3129, p<0.0089), indicating that the gradient was more clustered in older individuals. This analysis is now included in the revised version of the manuscript.

      Comment 4. In the connectopic mapping methods, it is stated that the analysis starts by calculating functional connectivity matrices between all voxel time series in an ROI and time series from a target mask. That statement sounds as though there would be one time series from the overall target mask. It is then stated that the target mask consisted of brain areas from a cortical and subcortical parcellation. But it is not clarified if (as I assume was the case) time series were extracted for each parcel within the mask (and how many parcels there were - 180?).

      We thank the reviewer for the helpful comment. Regarding the target mask, average time series were extracted from each parcel in the atlases separately, and then pairwise correlations were calculated with the time series from all voxels in the ROI (the LC). We used the Glasser atlas (which contains 360 parcels) as a cortical parcellation and the Tian atlas (which contains 50 parcels) as a subcortical parcellation. The corresponding section of the manuscript now includes this clarification.
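      The procedure can be sketched as below. This is an illustrative outline under stated assumptions, not the pipeline used in the study: time series are plain Python lists, the helper names are hypothetical, and the toy data contain two ROI voxels and two target parcels (with the 360 cortical and 50 subcortical parcels, each voxel would instead have 410 correlation values).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def parcel_mean_series(voxel_series):
    """Average the time series of all voxels in one parcel, time point by time point."""
    return [sum(t) / len(t) for t in zip(*voxel_series)]


def connectivity_matrix(roi_voxels, parcels):
    """One row per ROI voxel, one column per target parcel."""
    targets = [parcel_mean_series(p) for p in parcels]
    return [[pearson(v, t) for t in targets] for v in roi_voxels]


# Toy data: two ROI voxels, two parcels (the first parcel has two voxels).
roi_voxels = [[1, 2, 3, 4], [4, 3, 2, 1]]
parcels = [[[1, 2, 3, 4], [2, 3, 4, 5]], [[1, 0, 1, 0]]]
fc = connectivity_matrix(roi_voxels, parcels)
print(fc)
```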

      Comment 5. Then it is stated that "Afterwards, we obtained a similarity matrix from the functional connectivity matrices of LC ROI voxels by calculating the eta-squared measure." It would help here to explain a little more to clarify which things are being compared for similarity. Specifically, for which pairs was the eta-squared measure computed for?

      The eta-squared measure was calculated between the functional connectivity profiles (or “fingerprints”) for all pairs of voxels in the LC ROI. More specifically, one such fingerprint contains the Pearson correlation coefficients between a given LC voxel time series and the regional time series from the target mask. The similarity matrix contains the eta-squared similarity of these fingerprints, therefore one index in the similarity matrix contains the similarity between the fingerprints of two specific LC voxels. The corresponding section of the manuscript now includes this clarification.
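      As an illustration of this step, the sketch below computes eta-squared between pairs of fingerprints, assuming the definition commonly used for comparing connectivity maps (one minus the ratio of within-pair to total sum of squares). The function names and toy fingerprints are hypothetical.

```python
def eta_squared(a, b):
    """Similarity of two fingerprints: 1 - SS_within / SS_total."""
    n = len(a)
    grand = (sum(a) + sum(b)) / (2 * n)  # mean over both fingerprints
    ss_within = ss_total = 0.0
    for x, y in zip(a, b):
        m = (x + y) / 2  # mean of the pair at this target region
        ss_within += (x - m) ** 2 + (y - m) ** 2
        ss_total += (x - grand) ** 2 + (y - grand) ** 2
    return 1.0 - ss_within / ss_total


def similarity_matrix(fingerprints):
    """Entry [i][j] holds the eta-squared similarity of voxels i and j."""
    return [[eta_squared(p, q) for q in fingerprints] for p in fingerprints]


# Toy fingerprints for three voxels (correlations with three target regions).
fingerprints = [[0.1, 0.5, 0.9], [0.9, 0.5, 0.1], [0.1, 0.5, 0.9]]
sim = similarity_matrix(fingerprints)
print(sim)
```

      Identical fingerprints give a similarity of 1, whereas the mirrored fingerprint in this toy example gives 0, so each entry of the matrix summarizes how alike two LC voxels are in their connectivity profile.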

      Comment 6. In Fig. 3, I found the labeling of surface renderings confusing (i.e., did high->low apply to both rows? What about 'emotional memory? do the top and bottom row correspond with the R/L LC?).

      We thank the reviewer for the helpful comment and made some changes to Fig. 3 to clarify the labels. The upper row shows the right LC, whereas the bottom row shows the left LC. High->low and low->high applies to both rows. Regarding emotional memory, a worse performance on this task resulted in lower scores. With emotional reactivity, higher scores indicate a worse ability to regulate negative ratings on the task, which results in an inverse relationship of this score with the LC gradient features. We also extended the figure label to include this explanation.

      Response to Reviewer #2 (Public Review):

      Comment 1. One of the major strengths in the current study is the implementation of the fully data-driven, gradient-based method for mapping connectopies of the LC. This approach is especially suited for brain structures that are difficult to localise because the resulted connectopic mapping is relatively robust to ROI definition (Fig. 7 in Haak et al., 2018). However, as a very inclusive definition of the LC (the "meta atlas") was adopted in the study, to what extent the gradient approach can tolerate changes of accuracy and specificity for LC ROI definition is unknown. Some comparative analyses would be helpful to provide assessments on the specificity and stability of the reported gradient pattern.

      We thank the reviewer for the positive assessment of our manuscript. Indeed, an advantage of the connectopic mapping approach is that it is less sensitive to minor ROI inaccuracies, which is convenient for the LC. We repeated the gradient calculation using a larger LC mask from Tona et al. (2017), and included a supplementary figure (Figure S3) that shows how the gradients still retain their rostrocaudal pattern using both LC masks.

      Comment 2. Haak et al. showed distinct reproducibility within and between subjects when comparing connectopic mappings between M1 and V1. M1 connectopic mapping showed very high consistency across subjects (ICCs > 0.9) compared with V1. This is very reasonable because the functional organisation within M1 is relatively homogeneous. Regarding the reliability of the LC rostro-caudal gradient, the authors only stated that "individual gradient estimation is often not consistent", but direct measurement on the consistency across subjects for the LC gradient was missing. This is important for future LC fMRI studies as more consistent pattern might warrant the application of an atlas-based method otherwise a more individualized pipeline is needed for investigating functional dissociation in LC subregions.

      We thank the reviewer for the question. Indeed, investigating the replicability of gradients at the individual level is important. However, regarding the LC, because of the ROI size and the relative shortness of the scans in the Cam-CAN dataset, we did not calculate individual level gradients and resorted to a group-level approach as we described in the method. Therefore, the assessment of individual reliability was outside of the scope of the current study. We included this as a limitation in the Discussion of the revised manuscript.

      Comment 3. It puzzles me why a dichotomous rostral vs caudal comparison was used to demonstrate the difference in connectivity patterns along the rostro-caudal gradient, which might be an oversimplistic approach, as described by the authors themselves. In fact, it might be more interesting to include the central "core" LC, which is structurally organized in high density (Fernandes et al., 2012) and functionally distinguishable from the peri-LC "shell" region (Totah et al., 2018; Poe et al., 2022).

      We thank the reviewer for the comment. Indeed, during the analyses we tried to delineate a central core region within the LC, however, the functional connections in this region varied greatly between individuals and we failed to reliably detect a functionally distinct central core region using FC. One reason for this might unfortunately be the limited spatial resolution of functional MRI. Instead, we hypothesized that the gradient manifests in fMRI connectivity of the LC by a gradual transition of connectivity profiles between the two dominant extremes of the caudal and rostral LC and we aimed to depict these two extremes in Figure 1. Although it is a simpler approach compared to the results of histological studies, we demonstrate in the paper that it still provides valuable information about LC in aging and LC-related behavioral measures.

      Comment 4. The composition of rostral vs caudal connectivity patterns changes over ageing, where the loss of rostral-like connectivity was consistent in bilateral LC whereas the gain of caudal-like connectivity in older subjects was only evident in the left LC. Do the authors have any explanation for this left-lateralised ageing effect, which interestingly coincides with several observations: increased left LC contrast ratios were found during ageing (Betts et al., 2017) and in PD patients (Ye et al., 2022), and reduced left LC-parahippocampal gyrus connectivity was reported in aMCI patients (Jacobs et al., 2015)?

      We thank the reviewer for the question. Indeed, we observed lateralized changes in the LC gradients both in connection with aging and cognitive performance. Generally, the LC connects to several highly lateralized cortical networks, e.g. the salience and frontoparietal networks, which might result in an asymmetric plasticity in the LC. Interestingly, neurodegenerative disorders seem to affect the left LC more, e.g. more widespread loss of connectivity between the left LC and resting state networks was found in PD patients, with a correlation between left LC-executive control network connectivity and cognition (Sun et al. 2023). However, the biological basis for this is elusive, as post-mortem studies generally find the bilateral LC symmetric and mostly report pathological changes in the rostral and middle LC (Beardmore et al. 2021). In our case, a possible interpretation is that, with the loss of rostral-like connectivity, previously rostral-like areas lose their specific connections and become more similar to the caudal part in terms of connectivity. In our study, since we did not investigate the cerebellum and the spinal cord, the typical caudal connectivity profile is less specific, since some of its dominant connections are not assessed. This interpretation is now included in the revised version of the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Comment 1. Minor:

      • the preprocessing pipeline for HCP 7T data was not reported.

      We extended the details of the preprocessing pipeline for the HCP 7T dataset.

      Comment 2. A difference map would be useful to demonstrate the similarity of the LC connectivity gradient between the CamCAN and HCP datasets.

      We have now added a difference map between the CamCAN and HCP gradient in the supplementary material (Figure S2).

      Comment 3. Labels for left and right LC were missing in Fig 3.

      We corrected the labeling in Figure 3.

      Comment 4. In Statistical Analysis, CamCAN participants were divided into two groups with and without depressive and anxiety symptoms. It is unclear whether participants with high HADS scores presented with both symptoms or just one of them.

      Because of the low number of participants with high depression scores on the HADS test, we defined high HADS scores as individuals scoring above normal on either the anxiety part, the depression part, or both.
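      The grouping rule can be written as a simple predicate. In this sketch the cutoff of 7 as the upper bound of the normal range on each HADS subscale is an assumption (it is a commonly used convention, but the threshold is not stated in this response), and the function name is hypothetical.

```python
def has_high_hads(anxiety, depression, normal_max=7):
    """True if the anxiety score, the depression score, or both exceed the normal range.

    normal_max is an assumed cutoff, not a value taken from the manuscript.
    """
    return anxiety > normal_max or depression > normal_max


# Hypothetical participants as (anxiety, depression) subscale scores.
participants = [(9, 3), (3, 9), (9, 9), (5, 5)]
groups = [has_high_hads(a, d) for a, d in participants]
print(groups)
```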

    1. Author Response

      Reviewer #1 (Public Review):

      While there are many models for sequence retrieval, it has been difficult to find models that vary the speed of sequence retrieval dynamically via simple external inputs. While recent works [1,2] have proposed some mechanisms, the authors here propose a different one based on heterogeneous plasticity rules. Temporally symmetric plasticity kernels (that do not distinguish between the order of pre and post spikes, but only their time difference) are expected to give rise to attractor states, asymmetric ones to sequence transitions. The authors incorporate a rate-based, discrete-time analog of these spike-based plasticity rules to learn the connections between neurons (leading to connections similar to Hopfield networks for attractors and sequences). They use either a parametric combination of symmetric and asymmetric learning rules for connections into each neuron, or separate subpopulations having only symmetric or asymmetric learning rules on incoming connections. They find that the latter is conducive to enabling external inputs to control the speed of sequence retrieval.
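      The distinction between the two kinds of learned connectivity can be illustrated with a toy binary network. This is a simplified sketch and not the authors' rate-based model: it assumes ±1 units, synchronous sign updates, and mutually orthogonal patterns (rows of a Sylvester Hadamard matrix) so that the outcome is exact; all names are hypothetical. A symmetric rule stores each pattern as a fixed point, whereas an asymmetric rule maps each pattern onto the next one in the sequence.

```python
def hadamard(n):
    """Sylvester construction (n a power of two); rows are mutually orthogonal +/-1 vectors."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def outer_sum(posts, pres):
    """W[i][j] = (1/N) * sum over pattern pairs of post[i] * pre[j]."""
    n = len(pres[0])
    return [[sum(p[i] * q[j] for p, q in zip(posts, pres)) / n for j in range(n)]
            for i in range(n)]


def step(w, x):
    """One synchronous update x <- sign(W x), with sign(0) taken as +1."""
    return [1 if sum(wij * xj for wij, xj in zip(row, x)) >= 0 else -1 for row in w]


N = 16
patterns = hadamard(N)[1:5]                       # a sequence of four orthogonal patterns
w_sym = outer_sum(patterns, patterns)             # symmetric rule: pattern -> same pattern
w_asym = outer_sum(patterns[1:], patterns[:-1])   # asymmetric rule: pattern -> next pattern

stays = step(w_sym, patterns[0]) == patterns[0]
advances = step(w_asym, patterns[0]) == patterns[1]
print(stays, advances)
```

      In this binary toy, mixing the two matrices would simply select whichever term is stronger; the graded control of retrieval speed discussed in the review relies on the rate dynamics and external inputs of the actual model, which the sketch does not attempt to reproduce.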

      Strengths:

      The authors have expertly characterised the system dynamics using both simulations and theory. How the speed and quality of retrieval vary across phase space has been well-studied. The authors are also able to vary the external inputs to reproduce a preparatory followed by an execution phase of sequence retrieval as seen experimentally in motor control. They also propose a simple reinforcement learning scheme for learning to map the two external inputs to the desired retrieval speed.

      Weaknesses:

      1) The authors translate spike-based synaptic plasticity rules to a way to learn/set connections for rate units operating in discrete time, similar to their earlier work in [5]. The bio-plausibility issues of learning in [5] carry over here, e.g. the authors ignore any input due to the recurrent connectivity during learning and effectively fix the pre and post rates to the desired ones. While the learning itself is not fully bio-plausible, it does lend itself to writing the final connectivity matrix in a manner that is easier to analyze theoretically.

      We agree with the reviewer that learning is not `fully bio-plausible’. However, we believe that extending the results to a model in which synaptic plasticity depends on recurrent inputs is beyond the scope of this work. We will address this issue in the Discussion in a revised manuscript.

      2) While the authors learn to map the set of two external input strengths to speed of retrieval, they still hand-wire one external input to the subpopulation of neurons with temporally symmetric plasticity and the other external input to the other subpopulation with temporally asymmetric plasticity. The authors suggest that these subpopulations might arise due to differences in the parameters of Ca dynamics as in their earlier work [29]. How these two external inputs would connect to neurons differentially based on the plasticity kernel / Ca dynamics parameters of the recurrent connections is still an open question which the authors have not touched upon.

      The issue of how external inputs could self-organize to drive the network to retrieve sequences at appropriate speeds is addressed in the last part of the Results section. We believe this issue is independent from how different forms of synaptic plasticity can be achieved using different parameters that describe how calcium triggers synaptic plasticity. We will discuss these issues more clearly in the revised manuscript.

      3) The authors require that temporally symmetric and asymmetric learning rules be present in the recurrent connections between subpopulations of neurons in the same brain region, i.e. some neurons in the same brain region should have temporally symmetric kernels, while others should have temporally asymmetric ones. The evidence for this seems thin. Though, in the discussion, the authors clarify 'While this heterogeneity has been found so far across structures or across different regions in the same structure, this heterogeneity could also be present within local networks, as current experimental methods for probing plasticity only have access to a single delay between pre and post-synaptic spikes in each recorded neuron, and would therefore miss this heterogeneity'.

      We agree with the reviewer that this is currently an open question. We will describe this issue in more detail in the Discussion of a revised manuscript.

      4) An aspect which the authors have not connected to is one of the author's earlier work: Brunel, N. (2016). Is cortical connectivity optimized for storing information? Nature Neuroscience, 19(5), 749-755. https://doi.org/10.1038/nn.4286 which suggests that the experimentally observed over-representation of symmetric synapses suggests that cortical networks are optimized for attractors rather than sequences.

      We thank the reviewer for this suggestion. We will add a paragraph to the Discussion on work addressing the statistics of synaptic connectivity in optimal networks. We expect that in networks that contain two subpopulations of neurons, the degree of symmetry should be intermediate between that of a network storing fixed point attractors exclusively and that of a network storing sequences exclusively. We will also elaborate on the predictions our scenario makes for higher-order network motifs.

      Despite the above weaknesses, the work is a solid advance in proposing an alternate model for modulating speed of sequence retrieval and extends the use of well-established theoretical tools. This work is expected to spawn further works like extending to a spiking neural network with Dale's law, more realistic learning taking into account recurrent connections during learning, and experimental follow-ups. Thus, I expect this to be an important contribution to the field.

      We thank the reviewer for the insightful comments.

      Reviewer #2 (Public Review):

      Sequences of neural activity underlie most of our behavior. And as experience suggests we are (in most cases) able to flexibly change the speed for our learned behavior which essentially means that brains are able to change the speed at which the sequence is retrieved from the memory. The authors here propose a mechanism by which networks in the brain can learn a sequence of spike patterns and retrieve them at variable speed. At a conceptual level I think the authors have a very nice idea: use of symmetric and asymmetric learning rules to learn the sequences and then use different inputs to neurons with symmetric or asymmetric plasticity to control the retrieval speed. The authors have demonstrated the feasibility of the idea in a rather idealized network model. I think it is important that the idea is demonstrated in more biologically plausible settings (e.g. spiking neurons, a network with exc. and inh. neurons with ongoing activity).

      Summary

      In this manuscript the authors have addressed the problem of learning and retrieving sequential activity in neuronal networks. In particular, they have focussed on the problem of how sequence retrieval speed can be controlled.

      They have considered a model with excitatory rate-based neurons. Authors show that when sequences are learned with both temporally symmetric and asymmetric Hebbian plasticity, by modulating the external inputs to the network the sequence retrieval speed can be modulated. With the two types of Hebbian plasticity in the network, sequence learning essentially means that the network has both feedforward and recurrent connections related to the sequence. By giving different amounts of input to the feed-forward and recurrent components of the sequence, authors are able to adjust the speed.

      Strengths

      • Authors solve the problem of sequence retrieval speed control by learning the sequence in both feedforward and recurrent connectivity within a network. It is a very interesting idea for two main reasons: 1. It does not rely on delays or short-term dynamics in neurons/synapses 2. It does not require that the animal is presented with the same sequences multiple times at different speeds. Different inputs to the feedforward and recurrent populations are sufficient to alter the speed. However, the work leaves several issues unaddressed as explained below.

      Weaknesses

      • The main weakness of the paper is that it is mostly driven by a motivation to find a computational solution to the problem of sequence retrieval speed. In most cases they have not provided any arguments about the biological plausibility of the solution they have proposed e.g.:

      • Is there any experimental evidence that some neurons in the network have symmetric Hebbian plasticity and some temporally asymmetric? In the references authors have cited some references to support this. But usually the switch between temporally symmetric and asymmetric rules is dependent on spike patterns used for pairing (e.g. bursts vs single spikes). In the context of this manuscript, it would mean that in the same pattern, some neurons burst and some don't and this is the same for all the patterns in the sequence. As far as I see here authors have assumed a binary pattern of activity which is the same for all neurons that participate in the pattern.

      There is currently only weak evidence for heterogeneity of synaptic plasticity rules within a single network, though there is plenty of evidence for such heterogeneity across networks or across locations within a particular structure (see references in our Discussion). The reviewer suggests another interesting possibility, that the temporal asymmetry could depend on the firing pattern of the post-synaptic neuron. An example of such behavior can be found in a paper by Wittenberg and Wang (2006), who show that pairing single spikes of pre- and post-synaptic neurons leads to LTD at all time differences in a symmetric fashion, while pairing a pre-synaptic spike with a burst of post-synaptic spikes leads to temporally asymmetric plasticity, with an LTP window at short positive time differences. We will mention this possibility in the Discussion, but we believe that fully exploring this scenario is beyond the scope of the paper.

      • How would external inputs know that they are impinging on a symmetric or asymmetric neuron? Authors have proposed a mechanism to learn these inputs. But that makes the sequence learning problem a two stage problem -- first an animal has to learn the sequence and then it has to learn to modulate the speed of retrieval. It should be possible to find experimental evidence to support this?

      Our model does not assume that the two processes necessarily occur one after the other. Importantly, once the correct external inputs that can modulate sequence retrieval are learned, sequence retrieval modulation will automatically generalize to arbitrary new sequences that are learned by the network.

      • Authors have only considered homogeneous DC input for sequence retrieval. This kind of input is highly unnatural. It would be more plausible if the authors considered fluctuating input which is different from each neuron.

      In a revised manuscript, we will add an additional panel to Figure 1 to demonstrate that fluctuating inputs do not qualitatively affect our results.

      • All the work is demonstrated using a firing rate based model of only excitatory neurons. I think it is important that some of the key results are demonstrated in a network of both excitatory and inhibitory spiking neurons. As the authors very well know it is not always trivial to extend rate-based models to spiking neurons.

      I think at a conceptual level authors have a very nice idea but it needs to be demonstrated in a more biologically plausible setting (and by that I do not mean biophysical neurons etc.).

      We are confident that our results can be reproduced in networks of excitatory and inhibitory spiking neurons, since previous studies have shown that such networks can exhibit attractor dynamics (e.g. Amit and Brunel 1997, Brunel and Wang 2001) and sequential activity (e.g. Gillett, Pereira, and Brunel 2020). We plan to add a new section with an associated figure to the revised manuscript, demonstrating how flexible speed control can be achieved in an excitatory-inhibitory (E-I) spiking network containing two excitatory populations with distinct plasticity mechanisms.

    1. Author Response

      We thank the Editor and the Reviewers for the kind words, the helpful suggestions, and the points of critique, which have all helped us substantially strengthen the manuscript. We have made the aesthetic changes requested by Reviewer 2.

      Response to Reviewer 2

      We thank the Reviewer for their thorough feedback. We provide point by point responses below.

      Concern 1

      In paragraph 4.2, I found it unclear why the authors find it unsurprising that different experiments would correspond to different betas. I think that this point should be discussed, as beta and N appear in combination in determining the interaction strength. Otherwise, they could try to fit all distributions with the same beta, which would be more natural for me. I guess that the fits would be anyway good to the eye, though quantitatively suboptimal (which could be quantified with the distance introduced).

      The reviewer raises valid concerns since, as shown in Fig 3, the chosen values for beta, the additional fitting parameter introduced in the agent-based simulation, are: β = 0.18, 0.13, 0.12 and 0.64 respectively for N = 5, 10, 15, 20. We (RS, OM, and OP) find it intriguing that the optimum beta clusters around similar values for N = 5, 10, 15, while the optimum beta for N = 20 is significantly different. We acknowledge that we do not have an explanation for why the fitted parameter values are what they are, but note that the fitting curve is flat, implying that several beta values could possibly achieve a satisfactory fit. While further agent-based simulations could explore these findings more systematically, we believe that investigating this matter is outside the scope of this paper. Instead, we have acknowledged these points explicitly in the revised Discussion.

      Portion added to discussions: “As shown in Fig. 3, the chosen values for beta, the additional fitting parameter introduced in the agent-based simulation, are: β = 0.18, 0.13, 0.12 and 0.64 respectively for N = 5, 10, 15, 20. Perhaps it is intriguing that the optimum beta clusters around similar values for N = 5, 10, 15, while the optimum beta for N = 20 is significantly different. While we do not currently have an explanation for why the fitted parameter values are what they are, we note that the fitting curve is flat, implying that several beta values could possibly achieve a satisfactory fit. Further agent-based simulations could explore these findings more systematically, and provide useful insights.”

      Concern 2

      Citation of previous work on dynamical quorum sensing (lines 51 & 52) I think misses two important points: first these works (and others following them) deal with the appearance of collective oscillations at high density (therefore, the same general problem addressed here); second, Taylor et al. studied also a transition where the oscillators involved did not oscillate at low density, whereas above a density threshold, they display coherent collective oscillations whose period decreases with density - similar to what observed here. I do not think this takes anything away from the originality of this work, which refers to a different system, and models it with different equations, but the parallelism between integrate-and-fire dynamics with quenched noise and excitable dynamics in the presence of noise should in my opinion not be overlooked.

      We have explicitly mentioned this in the revised text.

      Concern 3

      As the authors stress in lines 105 and 132, the analytical model shows that all that really matters in this phenomenon is the fastest frequency of the system. This could be used as an argument to say that the actual frequency distribution of individual fireflies is not all that important, as long as their fastest frequency is comparable. The assumption that they are identical would then sound less radical. Ideally, one could use the numerical simulations to check this, as well as the fact that the phenomenon does not break down when the shortest individual interburst interval Tbmin is narrowly distributed (which could also explain why having a few individuals who can flash at a higher frequency does not affect the outcome).

      We thank the reviewer for these observations.

      Concern 4

      I still feel that the agreement between the model and observations is a bit overstated (line 120). At least, I think the authors may stress that whereas the model predicts that the frequency of the 7-14 minutes oscillations should increase a lot with N, this is not observed in the data. Maybe this mismatch would be reduced if inter-individual variability was added.

      Please see the last three paragraphs of the discussion section. In reality, as the swarm size increases, we expect that swarms will no longer be all-to-all connected, and the dynamics of the system will depend upon the speed of propagation of information across the swarm. Precisely how this happens is outside of the scope of the current experimental work and theoretical description presented here.

    1. Author Response

      Thank you for taking the time to manage the reviews of this manuscript. Many helpful suggestions were presented by the two reviewers that will certainly strengthen the revised version of the manuscript. We would like to take the opportunity to provide a provisional response to address concerns and factual errors in the eLife assessment and public reviews. Please see below.

      Response to eLife assessment:

      The assessment does not appear to reflect reviews entirely accurately. While reviewer 1 was unsatisfied by the “lack of thorough analysis of the experimental outcomes”, the criticism of a lack of sufficient support of our claims was not present in the reviews. Thus, the sentence, “The evidence supporting the claims are interesting although incomplete in some areas”, seems to us excessively negative. Furthermore, while we agree that this work inspires new studies to determine how UBC circuits function in the intact brain and how they promote behaviors, and that “substantial work remains to be conducted” to explore these new avenues, the way the sentence is constructed, and placed directly after “incomplete in some areas”, makes it read as a negative related to the current manuscript, whereas opening doors to new lines of research is certainly positive for the field.

      Response to Reviewer 1:

      • One of the main criticisms appears to be a lack of quantification of our electrophysiological data and clear explanation of how the model reproduces the behavior of the cells reported here and in previous work. We are thankful for the identification of these omissions. Our extensive work in UBC electrophysiology instructed the development of these models and they reproduce the essential features of ON and OFF UBC spiking responses and mGluR2 and AMPAR conductances accurately, although we agree that we did not present sufficient evidence for this in the manuscript.

      • Another major criticism was a lack of consideration of feedback and feedforward inhibition. The goal of Figure 1 was to determine the cell types of labeled UBCs in transgenic mouse lines, which is determined entirely by their synaptic responses to glutamate (Borges-Merjane & Trussell, 2015). Thus, blocking inhibition was essential to produce clear results. Feedback and feedforward inhibition from Golgi cells, which is certainly important in the intact circuit, is not possible to produce in a physiologically realistic way in acute brain slices, because electrical stimulation produces synchronous excitation and inhibition (by directly exciting Golgi cells, rather than their synaptic inputs). The main inhibition that UBCs receive is through mGluR2, which lasts for 100-1000s of milliseconds, and the main excitation that UBCs receive is through mGluR1 and AMPA, which also both last 100-1000s of milliseconds. Thus, these large conductances are unlikely to be significantly shaped by 1-10 ms IPSCs from feedforward and feedback inhibition. For these reasons, it was not our intention to explore GABAergic/glycinergic feedforward and feedback inhibition in the present study.

      Factual errors in public reviews:

      Reviewer 1, specific point 4:

      A) The reviewer accurately points out that the model did not incorporate a change in the amount of glutamate released across release events during trains of presynaptic spikes. We did not find this to be necessary to reproduce the AMPA and mGluR2 currents accurately, because the majority of the response occurs after the last presynaptic stimulus. Short term plasticity during the stimulus train would be expected to change the total amount of glutamate released, but not the time course of the slow current response. We previously showed that the predominant synaptic plasticity that occurs at this synapse during the train is short-term depression that is due in large part to postsynaptic desensitization of AMPA receptors, rather than a change in presynaptic release.

      B) The reviewer states that the model does not include desensitization of AMPA receptors. Although there is not a variable that defines desensitization explicitly, the detailed kinetic AMPA receptor model used here accounts for desensitization, which, in fact, mediates slow ON UBC current and is the focus of our previous work. This AMPA receptor model (developed in Balmer et al., 2021 using UBC data from Lu et al., 2017) is a 13-state model, including 4 open states with 1-4 glutamates bound, 4 closed states with 1-4 glutamates bound, 4 desensitized states with 1-4 glutamates bound, and 5 closed states with 0-4 glutamates bound. The transition rates between different states in the model were fit to AMPA receptor currents recorded from dissociated UBCs and they approximate well the ON UBC currents evoked by synaptic stimulation (Balmer et al., 2021).

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript is very-well written. Although the study is well-conducted the authors should be more convincing on how bacteria residing in tissues do not induce death. The association with IL-10 cytokine production appears weak and more experiments are needed to make it more robust.

      Thank you very much for your thoughtful and constructive feedback on our manuscript. We appreciate your positive assessment of the writing quality and the acknowledgment of the well-conducted nature of the study.

      In regard to the reviewer's comment that "The association with IL-10 cytokine production appears weak," we would like to provide a comprehensive response based on the findings and insights presented in our study (Fig 5). We would like to emphasize several key points to further elucidate this association:

      The established knowledge underscores IL-10's capacity to hinder the activation and proliferation of macrophages, thereby safeguarding against an overly aggressive immune-inflammatory reaction (as referenced). In our earlier investigations, we demonstrated that NAD+ orchestrates a systemic generation of IL-10, which assumes a pivotal function in curtailing proinflammatory responses across various conditions, such as autoimmune diseases (as referenced), alloimmunity (as referenced), and bacterial infections (as referenced). In our latest research, we show that the introduction of NAD+ leads to an elevated occurrence of IL-10-producing CD4+ T cells, CD8+ T cells, and macrophages, although not dendritic cells (depicted in Figure 5B and C). Furthermore, our comprehensive analyses have substantiated that NAD+ administration thwarts pyroptosis by specifically targeting the non-canonical inflammasome pathway. Intriguingly, our in vitro outcomes suggest that the neutralization of the autocrine IL-10 signaling pathway through a neutralizing antibody and an IL-10 receptor antagonist partially reverses the NAD+-mediated blockage of pyroptosis. These in vitro results imply that NAD+ induces the production of IL-10 cytokines by macrophages, contributing to the suppression of pyroptosis. To corroborate our in vitro conclusions, we employed IL-10 knockout mice and wild-type mice, both treated with either NAD+ or a placebo solution. The wild-type mice treated with NAD+ displayed a survival rate exceeding 80%, whereas the IL-10 knockout mice exhibited a survival rate of "only" 40%. These in vivo findings align with our in vitro discoveries, underscoring the crucial role of NAD+-mediated IL-10 cytokine production in impeding pyroptosis and shielding against septic shock. Drawing from our prior and current investigations, we respectfully disagree with the reviewer's characterization of this association as "weak."

      Reviewer #2 (Public Review):

      Iske et al. provide experimental data that NAD+ lessens disease severity in bacterial sepsis without impacting on the host pathogen load. They show that in macrophages, NAD+ prevents Il1b secretion potentially mediated by Caspase11.

      Thank you for taking the time to review our manuscript. We appreciate your insightful comments and valuable feedback regarding our study on the protective role and underlying mechanisms of NAD+ in septic shock.

      While the in vivo and in vitro data is interesting and hints towards a crucial role of NAD+ to promote metabolic adaptation in sepsis, the manuscript has shortcomings and would profit from several changes and additional experiments that support the claims.

      We would like to point out that our current study does not underscore a metabolic adaptation in sepsis but more an immune regulation and a specific blockade of the non-canonical inflammasome signaling machinery.

      Conceptually, the definition of sepsis is outdated. Sepsis is not SIRS, as in sepsis-2. Sepsis-3 defines sepsis as infection-associated organ dysfunction. This concept needs to be taken into account for the introduction and when describing the potential effects of NAD+ in sepsis. Also, LPS application cannot be considered a sepsis model, since it only recapitulates the consequence of TLR-4 activation. It is a model of endotoxemia. Also, the LPS data does not allow to draw conclusions about bacterial clearance (L135).

      Our study uses highly lethal doses of E. coli or LPS. These doses have been shown to result in multiple organ failure (1, 2). For many decades, numerous studies have used LPS as a model of sepsis (3, 4, 5). We used the LPS animal model based on a study published in 2013 by Kayagaki et al. (1), in which the authors reported a novel TLR4-independent mechanism mediated via activated caspase-11. We used the same animal model to demonstrate the specific role of NAD+ in targeting this TLR4-independent, caspase-11-mediated mechanism and to underscore NAD+’s mode of protection.

      Moreover, we have used not only LPS but also bacterial infection with E. coli. We have also previously published a research article demonstrating the protective effect against Listeria monocytogenes (6). The only model we did not use in the current study is the cecal ligation and puncture (CLP) model, which is another common animal model of sepsis.

      Our conclusions regarding bacterial clearance are based not only on the LPS results but also on survival and on bacterial load measured in different tissues (kidney and liver) following administration of E. coli rather than LPS (Figure 1B&C).

      The authors state that protective effects by NAD were independent of the host pathogen load. This clearly indicates that NAD confers protection via enhancing a disease tolerance mechanism, potentially via reducing immunopathology. This aspect is not considered by the authors. The authors should incorporate the concept of disease tolerance in their work, cite the relevant literature on the topic and discuss their findings in light of the published evidence for metabolic alterations and adaptations in sepsis.

      We respectfully disagree with the reviewer’s comment and do not believe that NAD+ enhances disease tolerance. We have supporting data indicating that NAD+ mediates protection via a specific blockade of the non-canonical inflammasome pathway, which prevents an over-zealous immune response that results in organ damage and multiple organ failure (MOF). Moreover, we demonstrate that NAD+ not only mediates protection via this specific blockade but also prevents septic shock-induced death through additional immunosuppression mediated by the systemic production of IL-10.

      Both the caspase-11 and IL-10 pathways are crucial in NAD+-mediated protection against lethal doses of E. coli and LPS. Figure 5A indicates that caspase-11-/- mice treated with PBS have a modest survival rate (~40% survival) when compared to the group of mice treated with NAD+ (>80% survival). These data indicate that NAD+ promotes survival via a caspase-11-independent mechanism. Similarly, wild type mice subjected to NAD+ administration exhibited >80% survival, while NAD+ administration to IL-10-/- mice resulted in only a 40% survival rate. Based on these findings, we believe that NAD+ mediated protection against septic shock via a blockade of caspase-11 and via IL-10 cytokine production that dampened the overzealous immune response, rather than via disease tolerance.

      For the in vitro data, the manuscript would benefit from additional experiments using in vitro infection models.

      In the current study we used two in vivo models, one using LPS and one using E. coli, a gram-negative bacterium. We have also previously reported the protective role of NAD+ in the context of Listeria monocytogenes (6), a gram-positive bacterium. In the current study, our aim was to demonstrate the inhibitory role of NAD+ on the non-canonical pathway specifically. We believe that additional in vitro experiments are outside the scope of this study.

      In the merged manuscript, the authors provide two different versions of the figures. In one, bar plots are shown without individual data and in the other with scatter plots. All bar plots need to be provided as scatter plots showing individual values.

      As requested by reviewer #2 all bar plots are now provided as scatter plots showing individual values.

      The authors should show further serology data for kidney and liver failure etc. as well as further cytokine data such as IL-6 and TNF to better characterize their models.

      We did not perform further serology analysis, but we did measure IL-6 and TNFα in mice treated with NAD+ or PBS. Mice treated with NAD+ had reduced systemic levels of both cytokines, IL-6 and TNFα. We have now added these data (Figure 1F). In addition, we performed a long-term survival analysis: all mice treated with NAD+ recovered fully after 10 days and survived for over a year after infection, eventually dying of old age.

      Careful revision of the entire manuscript, the figure legends and figures is required. The figure legend should not repeat the methods and materials section. The nomenclature for mouse protein and genes needs to be thoroughly revised.

      A careful revision of the entire manuscript has been performed.

      L350. The authors write that they dissect the capacity of NAD+ to dampen auto- and alloimmunity. In this work, no data that supports this statement is shown and experiments with autoantigens or alloantigens are not performed.

      We thank the reviewer for this comment. We have now rephrased the last sentence of the discussion and included references to our previous work. We now state: “We have previously reported that NAD+ administration can block auto- (7) and allo-immunity (8) via IL-10 cytokine production. Here, we unveiled the capacity of NAD+ to protect against sepsis-induced death via a specific blockade of the non-canonical inflammasome pathway and a robust immunosuppression mediated by IL-10 cytokine production.”

      L163 The authors describe pyroptosis but in the figure legend call it apoptosis. Specific markers for each cell death should be measured and determined which cell death mechanisms is involved.

      We thank the reviewer for this comment. We have focused on pyroptosis-mediated cell death and not apoptosis. We have now replaced the term “apoptosis” with “pyroptosis-mediated cell death”.

      Animal data comes from an infection model and LPS application. The RNAseq data is obtained from cells primed with Pam3CSK4 and subsequently subjected to LPS. It is unclear how the cell culture model reflects the animal model. As such the link between IFN signaling and the bacterial infection/LPS model are not convincing and need to be further elaborated.

      Our findings, depicted in Figure 3, pertain exclusively to in vitro investigations rather than in vivo examinations. Our research has demonstrated the selective inhibition of the non-canonical inflammasome pathway by NAD+, with a primary focus on unraveling the specific signaling pathway influenced by NAD+. Our in vitro outcomes indicate that the introduction of recombinant IFN-β counteracted the inhibitory effect of NAD+ on the non-canonical pathway. However, it is important to note that we have not evaluated the IFN-β pathway within our E. coli and LPS in vivo models. Our intention was solely to decipher the roles of IFN-β and NAD+ in the inhibition of the non-canonical inflammasome, without extending our investigation to the broader in vivo scenarios.

      Figure 5: It is unclear how many independent survival experiments were done, how many mice per group were used and whether the difference between groups was statistical significant. This information should be added.

      We have now included the number of experiments, p values and number of animals used in Figure 5.

      Further experiments with primary cells from Il10 k.o. and Caspase11 k.o. animals should be provided that support the findings in macrophages.

      We concur with the reviewer's suggestion regarding the need for further experiments involving primary cells from IL-10-/- and Caspase-11-/- mice. However, we are uncertain about the potential contribution of these experiments in generating novel or supplementary findings to the existing study.

    1. Author Response

      Reviewer #1 (Public Review):

      The goal of this study is to understand the allosteric mechanism of overall activity regulation in an anaerobic ribonucleotide reductase (RNR) that contains an ATP-cone domain. Through cryo-EM structural analysis of various nucleotide-bound states of the RNR, the mechanism of dATP inhibition is found to involve order-disorder transitions in the active site. These effects appear to prevent substrate binding and a radical transfer needed to initiate the reaction.

      Strengths of the manuscript include the comprehensive nature of the work - including numerous structures of different forms of the RNR and detailed characterization of enzyme activity to establish the parameters of dATP inhibition. The manuscript could be improved, however, by performing additional experiments to establish that the mechanism of inhibition can be observed in other contexts and it is not an artifact of the structural approach. Additionally, some of the presentations of biochemical data could be improved to comply with standard best practices.

      The work is impactful because it reports initial observations about a potentially new mode of allosteric inhibition in this enzyme class. It also sets the stage for future work to understand the molecular basis for this phenomenon in more detail.

      We thank the editor and reviewers for their positive evaluation of the potential impact of our work. We completely agree that hypotheses based on structural data require orthogonal experimental verification. However, the number and consistency of the cryo-EM structures speak in favour of the data being representative of conditions in solution. We feel that in particular cryo-EM data should be relatively free of artefacts, e.g. biased or incorrect relative domain orientations or artificially reduced mobility, compared to crystallography, where crystal packing effects can affect these parameters. As we write in response to Reviewer #2, it has been difficult to propose a direct structural mechanism for transmission of the allosteric signal from the a-site in the ATP-cone to the active site and GRD given that the ATP-cones and linker are disordered in the dATP-bound dimers and only partly ordered in the dATP-bound tetramers. Further verification experiments will be performed in future but are outside the scope of the present article.

      We will improve the presentation of the biochemical data in a revised version.

      General comments:

      1) It would be ideal to perform an additional experiment of some type to confirm the order-disorder phenomena observed in the cryo-EM structures to rule out the possibility that it is an artifact of the structure determination approach. Circular dichroism might be a possibility?

      Circular dichroism reports only on the approximate relative proportions of helix, sheet and loop structure in a protein; thus we believe that it would not be a sensitive enough tool to distinguish between ordered and disordered states of the GRD. We are considering what alternative methods might be appropriate.

      2) Does the disordering phenomenon of one subunit in the ATP-bound structures have any significance - could it be related to half-of-sites activity? Does this RNR exhibit half-of-sites activity?

      Half-of-sites activity has not been biochemically proven in any ribonucleotide reductase although it was first suggested in 1987 (PMID: 3298261). However, a strong structural indication was recently published in the form of the holo-complex of the class Ia ribonucleotide reductase from Escherichia coli, which is highly asymmetrical and in which productive contacts forming an intact proton-coupled electron transfer pathway are only formed between one of two pairs of monomers (PMID: 32217749). We have not been able to prove half-of-sites activity for PcNrdD due to low overall radical content, but the structural results are indeed consistent with such an activity.

      3) Does the disordering of the GRD with dATP bound have any long-term impact on the stability of the Gly radical? I realize that the authors tested the ability to form the Gly radical in the presence of dATP in Fig. 4 of the manuscript. But it looks like they only analyzed the samples after 20 min of incubation. Were longer time points analyzed?

      Radical content was measured after 5 min and 20 min incubation; 5 min incubations (not included in the manuscript) consistently gave higher radical content compared to 20 min incubation. Longer time points were not analysed, as we assumed that the radical content would be even lower after 20 min.

      4) Did the authors establish whether the effect of dATP inhibition on substrate binding is reversible? If dATP is removed, can substrates rebind?

      This is an interesting question. We measured KDs for dATP in the micromolar range and are hence confident that dATP binding is reversible. Our measurements do not, however, directly prove that inhibition of the enzyme is reversible. Nevertheless, it is worth noting that the protein as purified contained significant amounts of dATP and purification conditions had to be optimised to remove dATP. This is evidence that PcNrdD that has “seen” dATP can subsequently bind substrate in the presence of ATP. We will describe the purification more clearly in a revision.

      5) In some figures (Fig. 6e, for example), the cryo-EM density map for the nucleotide component of the model is not continuous over the entire molecule. Can the authors comment on the significance of this phenomenon? Were the ligands validated in any way to ensure that the assignments were made correctly?

      Indeed, we sometimes saw discontinuous density for the nucleotides, both in the active site and in the specificity site. However, the break was almost always near the C5’ carbon atom, which is common to all nucleotides. While we cannot readily explain this phenomenon, the nucleotides refined well with full occupancy, giving B-factors similar to those of the surrounding protein atoms. The identity of the nucleotide could always be inferred from a) the size of the base (purine or pyrimidine); b) the known nucleotide combinations added to the protein before grid preparation; c) prior knowledge on the combinations of effector and substrate that have been found valid for all RNRs since the first studies of allosteric specificity regulation.

      Reviewer #2 (Public Review):

      This manuscript describes the functional and structural characterization of an anaerobic (Class III) ribonucleotide reductase (RNR) with an ATP cone domain from Prevotella copri (PcNrdD). Most significantly, the cryo-EM structural characterization revealed the presence of a flap domain that connects the ATP cone domain and the active site and provides structural insights about how nucleotides and deoxynucleotides bind to this enzyme. The authors also demonstrated the catalytic functions and the oligomeric states. However, many of the biochemical characterizations are incomplete, and it is difficult to make mechanistic conclusions from the reported structures. The reported nucleotide-binding constants may not be accurate because of the design of the assays, which complicates the interpretation of the effects of ATP and dATP on PcNrdD oligomeric states. Importantly, statistical information was missing in most of the biochemical data. Also, while the authors concluded that the dATP binding makes the GRD flexible based on the absence of cryo-EM density for GRD in the dATP-bound PcNrdD, no other supports were provided. There was also a concern about the relevance of the proposed GRD flexibility and the stability of Gly radical. Overall, the manuscript provides structural insights about Class III RNR with ATP cone domain and how it binds ATP and dATP allosteric effectors. However, ambiguity remains about the molecular mechanism by which the dATP binding to the ATP cone domain inhibits the Class III RNR activity.

      Strengths:

      1) The manuscript reports the first near-atomic resolution of the structures of Class III RNR with ATP domain in complex with ATP and dATP. These structures revealed the NxN flap domain proposed to form an interaction network between the substrate, the linker to the ATP cone domain, the GRD, and loop 2 important for substrate specificity. The structures also provided insights into how ATP and dATP bind to the ATP cone domain of Class III RNR. Also, the structures suggested that the ATP cone domain is directly involved in the tetramer formation by forming an interaction with the core domain in the presence of dATP. These observations serve as an important basis for future study on the mechanism of Allosteric regulation of Class III RNR.

      2) The authors used a wide range of methodologies including activity assays, nucleotide binding assays, oligomeric state determination, and cryo-EM structural characterization, which were impressive and necessary to understand the complex allosteric regulation of RNR.

      3) The activity assays demonstrated the catalytic function of PcNrdD and its ability to be activated by ATP and low-concentration dATP and inhibited by high-concentration dATP.

      4) ITC and MST were used to show the ability of PcNrdD to bind NTP and dATP.

      5) GEMMA was used successfully to determine the oligomeric state of PcNrdD, which suggested that PcNrdD exists in dimeric and tetrameric forms, whose ratio is affected by ATP and/or dATP.

      Weaknesses:

      1) Activity assays.

      The activity assays were performed under conditions that may not represent the nucleotide reduction activity. The authors initiated the Gly radical formation and nucleotide reduction simultaneously. The authors also showed that the amount of Gly radical formation was different in the presence of ATP vs dATP. Therefore, it is possible that the observed Vmax is affected by the amount of Gly radical. In fact, some of the data fit poorly into the kinetic model. Also, the number of biological and technical replicates was not described, and no statistical information was provided for the curve fitting.

      The highest turnover activity of PcNrdD measured in presence of ATP was 1.3 s-1 (470 nmol/min/mg), a kcat comparable to recently reported values for anaerobic and aerobic RNRs from Neisseria bacilliformis, Leeuwenhoekiella blandensis, Facklamia ignava, Thermus virus P74-23, and Aquifex aeolicus (PMID: 25157154, PMID: 29388911, PMID: 30166338, PMID: 34314684, PMID: 34941255). The general trend illustrated in Figure 1 is that ATP has an activating effect, whereas high concentrations of dATP have an inactivating effect, which cannot be explained by suboptimal assay conditions since our EPR results consistently show that more radical is formed in incubations with dATP compared to incubations with ATP. Curve fitting methods used are listed in Materials and Methods (as specified in the Figure 1 legend), and standard errors for all specified curve fitting results (from triplicate experiments) are shown in Figure 1.
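      The two activity figures quoted in the response above (1.3 s-1 and 470 nmol/min/mg) are related through the molecular mass of the catalytic unit, which is not stated here. As a rough, purely illustrative consistency check, the sketch below converts a specific activity into a turnover number. The molar mass of about 166 kDa is an assumption chosen only because it makes the two quoted numbers agree; it is not a value taken from the manuscript, and the function name is hypothetical.

```python
def specific_activity_to_kcat(nmol_per_min_per_mg: float,
                              molar_mass_g_per_mol: float) -> float:
    """Convert a specific activity (nmol/min/mg) to a turnover number (s^-1)."""
    # nmol -> mol (1e-9), per min -> per s (/60), per mg -> per g (*1e3)
    mol_per_s_per_g = nmol_per_min_per_mg * 1e-9 / 60.0 * 1e3
    # mol product per second per mol enzyme
    return mol_per_s_per_g * molar_mass_g_per_mol


# ASSUMPTION: ~166 kDa per catalytic unit (illustrative, not from the manuscript).
ASSUMED_MOLAR_MASS = 166_000.0

kcat = specific_activity_to_kcat(470.0, ASSUMED_MOLAR_MASS)
print(f"kcat ~ {kcat:.2f} s^-1")
```

      Under a different assumed molar mass the same specific activity would correspond to a proportionally different kcat, so the comparison with other RNRs depends on how the catalytic unit is defined.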

      2) Binding assays.

      The interpretation of the binding assays is complicated by the fact that dATP binds both a- and s-sites and ATP binds a- and active sites. dATP may also bind the active site as the product. It is unknown if ATP binds s-site in PcNrdD. Despite this complexity, the binding assays were performed under the condition that all the binding sites were available. Therefore, it is not clear which event these assays are reporting.

      Both ITC and MST experiments involving ATP and dATP binding to the a-site were performed in the presence of at least 1 mM GTP substrate (5 mM in MST) to fill the active site, and 1 mM dTTP effector to fill the s-site (specified in the legend to Figure 2). These conditions enable binding of ATP or dATP only to the a-site in the ATP-cone.

      3) Oligomeric states.

      Due to the ambiguity in the kinetic parameters and the binding constants determined above, the effects of ATP and dATP on the oligomeric states are difficult to interpret. The concentrations of ATP used in these experiments (50 and 100 uM) were significantly lower than KL determined by the activity assays (780 uM), while it is close to the Kd values determined by ITC or MST (~25 uM). Since it is unclear what binding events ITC and MST are reporting, the data in Figure 3 does not provide support for the claimed effects of ATP binding. For the effects of dATP, the authors did not observe a significant difference in oligomeric states between 50 or 100 uM dATP alone vs 50 uM dATP and 100 uM CTP. The former condition has dATP ~ 2x higher than the Kd and KL (Figure 1b) and therefore could be considered as "inhibited". On the other hand, NrdD should be fully active under the latter condition. Therefore, these observations show no correlation between the oligomeric state and the catalytic activity.

      The results in Figure 3 show that in the presence of 100 µM ATP plus 100 µM CTP the oligomeric equilibrium is 64% dimers plus 36% tetramers, and in the presence of 50-100 µM dATP the oligomeric equilibrium is 32% dimers and 68% tetramers. We agree that there is no clear and strong correlation between oligomeric state and inhibition, and we will try to make this clearer in a revised version. Meanwhile, to add some further clarity, SEC experiments at higher nucleotide concentrations will be included in the revision.
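      To put the concentrations discussed in this point in perspective, the sketch below estimates how much of a binding site would be occupied at 50 and 100 µM ATP for the two constants mentioned by the reviewer (Kd of about 25 µM from ITC/MST and KL of 780 µM from the activity assays). It assumes the simplest possible model, a single independent site with free ligand approximately equal to total ligand; this is an illustrative assumption and not the analysis used in the manuscript, and the function name is hypothetical.

```python
def fractional_occupancy(ligand_uM: float, kd_uM: float) -> float:
    """Fraction of sites occupied for simple one-site binding: L / (L + Kd)."""
    if ligand_uM < 0 or kd_uM <= 0:
        raise ValueError("ligand concentration must be >= 0 and Kd must be > 0")
    return ligand_uM / (ligand_uM + kd_uM)


# Constants as quoted in the review; the one-site model itself is an assumption.
constants = (("Kd ~25 uM (ITC/MST)", 25.0), ("KL 780 uM (activity)", 780.0))

for label, k in constants:
    for conc in (50.0, 100.0):
        theta = fractional_occupancy(conc, k)
        print(f"{label}: {conc:.0f} uM ATP -> {theta:.2f} of sites occupied")
```

      Under this simplification the choice of constant changes the expected occupancy several-fold at the same ATP concentration, which is why it matters which binding event each assay reports.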

      4) Effects of dATP binding on GRD structure

      One of the key conclusions of this manuscript is that dATP binding induces the dissociation of GRD from the active site. However, the structures did not provide an explanation for how the dATP binding affects the conformation of GRD or whether the dissociation of GRD is a direct consequence of dATP binding or it is due to the absence of nucleotide substrate. Also, Gly radical is unlikely to be stable when it is not protected from the bulk solvent. Therefore, it is unlikely that the GRD dissociates from the active site unless the inhibition by dATP is irreversible. Further evidence is needed to support the proposed mechanism of inhibition by dATP.

      We admit that it has been difficult to propose a direct structural mechanism for transmission of the allosteric signal from the a-site in the ATP-cone to the active site and GRD given that the ATP-cones and linker are disordered in the dATP-bound dimers and that the linker can only be partly modelled in the dATP-bound tetramers. Most likely dATP binding causes a change in the dynamics of the linker region and NxN flap that directly affects substrate binding and simultaneously causes disorder of the GRD, given that all are part of a connected system (described as “nexus” in the manuscript). The structures determined in the presence of dATP and CTP show that CTP cannot bind in the absence of an ordered NxN flap.

      In any case a major conclusion of the work is that dATP does not inhibit the anaerobic RNR by prevention of glycyl radical formation but by prevention of its subsequent transfer. We agree that further evidence is required to support the proposed mechanism but, given the extent of the data already presented in the manuscript, we feel that such studies should be the subject of a future publication.

      5) Functional support for the observed structures.

      Evidence for connecting structural observations and mechanistic conclusions is largely missing. For example, the authors proposed that the interactions between the ATP cone domain and the core domain are responsible for tetramer formation. However, no biochemical evidence was provided to support this proposal. Similarly, the functional significance of the interaction through the NxN flap domain was not proved by mutagenesis experiments.

      We did actually make mutants to verify the observed interactions in the tetramer, but several of them did not behave well in our hands, e.g. with regard to protein stability. Since we have no evidence that oligomerisation is coupled to inhibition, and since we did not observe any conservation between protein sequences in the interaction area, we chose not to pursue this point further. The main merit of the tetramer structures is that they allowed a high-resolution view of dATP binding to the ATP-cone and a comparison to previously observed ATP-cones. Nevertheless, mutation experiments, also including the NxN flap, could be the subject of future work.

      Reviewer #3 (Public Review):

      The manuscript by Bimai et al. describes a structural and functional characterization of an anaerobic ribonucleotide reductase (RNR) enzyme from the human microbe, P. copri. More specifically, the authors aimed to characterize the mechanism by which (d)ATP modulates nucleotide reduction in this anaerobic RNR, using a combination of enzyme kinetics, binding thermodynamics, and cryo-EM structural determination. One of the principal findings of this paper is the ordering of a NxN 'flap' in the presence of ATP that promotes RNR catalysis and the disordering of both this flap and the glycyl radical domain (GRD) when the inhibitory effector, dATP, binds. The latter is correlated with a loss of substrate binding, which is the likely mechanism for dATP inhibition. It is important to note that the GRD is remote (>30 Ang) from the binding site of the dATP molecule, suggesting long-range communication of the structural (dis)ordering. The authors also present evidence for a shift in oligomerization in the presence of dATP. The work does provide evidence for new insights/views into the subtle differences of nucleotide modulation (allostery) of RNR through long-range interactions.

      The strengths of the work are the impressive, in-depth structural analysis of the various regulated forms of PcRNR by (d)ATP using cryo-EM. The authors present seven different models in total, with striking differences in oligomerization and (dis)ordering of select structural features, including the GRD that is integral to catalysis. The authors present several, complementary biochemical experiments (ITC, MST, EPR, kinetics) aimed at resolving the binding and regulatory mechanism of the enzyme by various nucleotides. The authors present a good breadth of the literature in which the focus of allosteric regulation of RNRs has been on the aerobic orthologues.

      Given the resolution of some of the structures in the remote regions that appear to be of importance, the rigor of the work could have been improved by complementing these experimental studies with molecular dynamics (MD) simulations to reveal the dynamics of the GRD and loops/flaps at the active site.

      We will discuss this option with expert colleagues.

      The biochemical data supporting the loss of substrate binding with dATP association is compelling, but the binding studies of the (d)ATP regulatory molecules are not; the authors noted less-than-unity binding stoichiometries for the effectors.

      Most of the methods used measure only binding strength, not the number of binding sites (N), whereas ITC also measures number of sites. N is dependent on the integrity of the protein, i.e. the number of protein molecules in a preparation that are involved in binding, and quite often gives lower values than the theoretical number of binding sites.

      Also, the work would benefit from additional support for oligomerization changes using an additional biochemical/biophysical approach.

      SEC (chromatography), GEMMA (mass spectrometry) and cryo-EM were used to study oligomerization. Since each method has restrictions on the nucleotide and protein concentrations that can be used, the results are not directly comparable, but all three methods indicate nucleotide-dependent oligomerization changes. The SEC results will be included in a revised version.

      Overall, the authors have mostly achieved their overall aims of the manuscript. With focused modifications, including additional control experiments, the manuscript should be a welcomed addition to the RNR field.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Review 1

      Public Review

      The authors set out to develop an organoid model of the junction between early telencephalic and ocular tissues to model RGC development and pathfinding in a human model. The authors have succeeded in developing a robust model of optic stalk (OS) and optic disc (OD) tissue with innervating retinal ganglion cells. The OS and OD have a robust pattern with distinct developmental and functional borders that allow for a distinct pathway for pathfinding RGC neurites.

      This study falls short on a thorough analysis of their single cell transcriptomics (scRNAseq). From the scRNAseq it is unclear the quality and quantity of the targeted cell types that exist in the model. A comparative analysis of the scRNAseq profiles of their cell-types with existing organoid protocols, to determine a technical improvement, or with fetal tissue, to determine fidelity to target cells, would greatly improve the description of this model and determine its utility. This is especially necessary for the RGCs developed in this protocol as they recommend this as an improved model to study RGCs.

      Future work targeting RGC neurite outgrowth mechanisms will be exciting.

      We are grateful to Reviewer 1 for these constructive comments. We added plots for quality control in supp. Fig. S5 and quantification of cell clusters in Tab. 1. We compared the transcriptomes between CONCEPT organoids, Gabriel et al.’s brain/optic organoids (Gabriel et al., 2021; PMID: 34407456), and human fetal retinas HGW9 (Lu et al., 2020; PMID: 32386599), which strongly support our findings (Figs. 5, 6; see responses below for details). Besides FGFs/FGFR signaling, scRNA-seq identified additional candidate molecules that may provide axon guidance functions, and these candidate molecules are the focus of our future study.

      Recommendations For The Authors

      This study falls short on a thorough analysis of their single cell transcriptomics (scRNAseq).

      The scRNAseq figure needs to be better presented to allow for an adequate assessment of the model. As written, the classification of the different clusters is hard to follow. A representative labeling of the suspected identity of the clusters in an infographic would aid the figure. Since it is hard to follow, it is difficult to determine how well clusters correlate with designated cell types. PAX2 expression designating optic stalk seems to correlate well with group 2 and the designation of the optic disc; however, PAX2 expression for the optic stalk is half in group 4 and half in group 9. What are groups 4 and 9? It is also not clear how the thresholding for the given clusters was reached.

      To present the scRNA-seq dataset in a clearer way, we added dotted red lines in Fig. 4C to delineate eye (mostly retinal), telencephalic, and mixed cell populations. In Tab. 1, we showed assigned cell types, counts, and percentage for each cluster.

      PAX2+ VSX2- optic stalk cells were at edges of clusters 4, 8, 9 that had dorsal telencephalic identities. Clusters 4, 8, 9 were largely segregated along cell cycle phases (Fig. 4A, B, F), and these clusters differentially expressed gene markers SOX3, FGFR2, PRRX1, EDNRB, and FOXG1 (supp. Fig. S7A-S7D; Fig. 4C). In E14.5 mouse embryos, mouse orthologs of SOX3, FGFR2, PRRX1, and EDNRB were specifically expressed in dorsal telencephalon (Fig. S8A-S8E); Foxg1 was specifically expressed in both dorsal and ventral telencephalon. Therefore, clusters 4, 8, and 9 have dorsal telencephalic identities, and PAX2+ VSX2- optic stalk cells are at edges of these telencephalic clusters. Lines 259-261; 297-298.

      Thresholding of cell clusters was determined by the cell clustering parameters, which are described in Materials and Methods: FindVariableFeatures (selection.method = "vst", nfeatures = 2000), ScaleData, RunPCA, ElbowPlot, FindNeighbors (dims = 1:17), FindClusters (resolution = 0.5), and RunUMAP (dims = 1:17). Lines 717-721.

      The authors should make an attempt to calculate which different cell types are present and in what proportions. They should also discuss groups that are confounding. Since this is the first description of this technique it is critical to know how much of the model represents mature well-defined cells of interest.

      We assigned cell types to clusters and calculated cell counts and proportions of each cluster (Tab. 1). The only undetermined cell cluster was cluster 13, which was the smallest one. We described top DEGs of cluster 13 and discussed the cluster. Lines 266-268.

      Concerning the focus on RGC isolation. It is interesting that CNTN2 can be used for an effective isolation however, there are many protocols for generating RGCs. Is CNTN2 expression unique to this protocol? If the authors claim that this protocol could be used for studying glaucoma, how does this protocol improve on the quality of RGCs compared to other protocols?

      RGC-specific CNTN2 expression was not unique to CONCEPT organoids. We isolated RGCs via CNTN2 from both CONCEPT organoids and 3-D retinal organoids in suspension. Indeed, isolated RGCs shown in the manuscript were from 3-D retinal organoids (see Materials and Methods for details). Importantly, our single cell RNA sequencing analysis demonstrated that CNTN2 was also differentially expressed in early RGCs from human fetal retinas (Fig. 5L, 5M). Therefore, isolation of human early RGCs via CNTN2 should be applicable widely.

      In CONCEPT organoids, RGC differentiation and directional axon growth were very efficient. Our study supports a model in which FGFs from optic disc cells efficiently induce RGC differentiation and directional axon growth in adjacent retinal progenitor cells, as FGFR inhibition drastically decreased the number of RGC somas and directional axon growth (Fig. 9). Therefore, CONCEPT organoids are useful for studying axon guidance cues in humans, knowledge that is much needed for axon regrowth from RGCs that are damaged in glaucoma. Notably, the juvenile glaucoma gene CYP1B1 was found in assigned optic disc cells in both CONCEPT organoids and human fetal retinas (Fig. 4I, 5D), making CONCEPT organoids a testable model for studying the functions of CYP1B1 in human cells.

      A comparative analysis of the scRNAseq profiles of their model with existing organoid protocols, to determine a technical improvement, or with fetal tissue, to determine fidelity to target cells, would greatly improve the description of this model and determine its utility.

      In the revised manuscript, we compared the transcriptomes between CONCEPT organoids, Gabriel et al.’s brain/optic organoids (Gabriel et al., 2021; PMID: 34407456), and human fetal retinas HGW9 (Lu et al., 2020; PMID: 32386599). Gabriel et al. (2021) report “axon-like” projections in their “optic vesicle-containing brain organoids”. We found that PAX2+ optic disc, PAX2+ optic stalk, FOXG1+ telencephalic, and VSX2+ neuroretinal cell clusters that were found in CONCEPT organoids did not exist in Gabriel et al.’s organoids (supp. Fig. S12), indicating striking differences between Gabriel et al.’s organoids and our CONCEPT telencephalon-eye organoids.

      On the other hand, CONCEPT organoids and human fetal retinas HGW9 had similar expression signatures (Fig. 5). First, we identified a PAX2+ cell cluster in the human retinas HGW9. 64/113 DEGs in the PAX2+ cluster from human fetal retinas HGW9 were also DEGs of cluster 2 (assigned PAX2+ optic disc cells) from CONCEPT organoids. Second, CNTN2 was also differentially expressed in early RGCs of human fetal retinas. Third, when cells in cluster 18 and retinal progenitor clusters from the HGW9 dataset were combined with cells in clusters 2, 4, 5, 7 from the CONCEPT dataset for Seurat anchor-based clustering, cells in cluster 18 from HGW9 (H18) were grouped with cluster 2 from CONCEPT organoids (C2, assigned optic disc; N), and these cells expressed both PAX2 and VSX2 (arrowheads in Fig. 5N-5R). A small portion of H18 cells were grouped with cluster 4 from CONCEPT organoids (C4, assigned optic stalk; N), and these cells expressed PAX2 but not VSX2 (arrows in Fig. 5N-5R). Fourth, CONCEPT organoids and human fetal retinas shared many enriched GO terms in DEGs of assigned optic disc cells (Fig. 6).

      Collectively, transcriptomic comparisons support that our CONCEPT organoids are innovative and similar to human fetal retinas. Lines 325-392.
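
      The DEG overlap reported above (64 of 113) is a set intersection between two gene lists. The following is a minimal Python sketch of that calculation, assuming each DEG list is available as a collection of gene symbols. The function name deg_overlap and the toy lists are hypothetical; the placeholders GENE_X and GENE_Y are not real DEGs, and the lists do not reproduce the actual data.

```python
def deg_overlap(reference_degs, query_degs):
    """Return (shared genes, fraction of reference DEGs also found in query DEGs)."""
    reference = set(reference_degs)
    query = set(query_degs)
    if not reference:
        raise ValueError("reference DEG list is empty")
    shared = reference & query
    return shared, len(shared) / len(reference)


# Toy lists only (hypothetical): a few markers named in this response plus placeholders.
fetal_cluster18 = ["PAX2", "COL9A3", "CYP1B1", "SEMA5A", "FGF9", "GENE_X"]
concept_cluster2 = ["PAX2", "COL9A3", "CYP1B1", "SEMA5A", "FGF9", "GENE_Y"]

shared, fraction = deg_overlap(fetal_cluster18, concept_cluster2)
print(sorted(shared), round(fraction, 3))
```

      With the counts reported here, 64/113 corresponds to a fraction of roughly 0.57 of the cluster 18 DEGs.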

      Not clear what reporting on Lens cells in Figure 3 adds to the focus of the manuscript. The figure seems out of place with the flow of the manuscript.

      Lens cells were obvious in CONCEPT organoids. The presence of lens cells indicates that cysts have the developmental potential for both neural and non-neural anterior ectodermal cells. For a better flow, we added a transitional sentence at the beginning of the lens section. Lines 207-208.

      Reviewer #2

      Public Review

      The study by Liu et al. reports on the establishment and characterization of telencephalon eye structures that spontaneously form from human pluripotent stem cells. The reported structures are generated from embryonic cysts that self-form concentric zones (centroids) of telencephalic-like cells surrounded by ocular cell types. Interestingly, the cells in the outer zone of these concentric structures give rise to retinal ganglion cells (RGCs) based on the expression of several markers, and their neuronal morphology and electrophysiological activity. Single-cell analysis of these brain-eye centroids provides detailed transcriptomic information on the different cell types within them. The single-cell analysis led to the identification of a unique cell-surface marker (CNTN2) for the human ganglion cells. Use of this marker allowed the team to isolate the stem cell-derived RGCs.

      Overall, the manuscript describes a method for generating self-forming structures of brain-eye lineages that mimic some of the early patterning events, possibly including the guidance cues that direct axonal growth of the RGCs. There are previous reports on brain-eye organoids with optic nerve-like connectivity; thus, the novel aspect of this study is the self-formation capacity of the centroids, including neurons with some RGC features. Notably, the manuscript further reports on cell-surface markers and an approach to generating and isolating human RGCs.

      Recommendations For The Authors

      The following significant issues, however, need to be addressed:

      The authors show RGC-like cells that grow axons toward the Pax2+ cells, suggesting that this is a model for RGC axon pathfinding. Is there support from transcriptomic data on the expression of guidance molecules? In addition, the authors need to characterize Pax2+ cells further. Do some give rise to astrocyte-like cells?

      We assessed the expression of known axon guidance genes in CONCEPT organoids. FGF8 and FGF9 trigger axon outgrowth in motor neuron column explants (Shirasaki et al., 2006). In CONCEPT organoids, FGF8 and FGF9 were differentially expressed in assigned optic disc cells; FGFR inhibition drastically decreased the number of RGC somas and directional axon growth (Fig. 9). In addition, SEMA5A and EFNB1 were expressed in both assigned optic disc and stalk cells, EFNB2 was highly expressed in assigned optic disc cells, and NTN1 was mostly expressed in assigned optic cells (supp. Fig. S12). Lines 307-310.

      We compared the transcriptomes between CONCEPT organoids, Gabriel et al.’s brain/optic organoids (Gabriel et al., 2021; PMID: 34407456), and human fetal retinas HGW9 (Lu et al., 2020; PMID: 32386599). Gabriel et al. (2021) report “axon-like” projections in their “optic vesicle-containing brain organoids”. We found that PAX2+ optic disc, PAX2+ optic stalk, FOXG1+ telencephalic, and VSX2+ neuroretinal cell clusters that were found in CONCEPT organoids did not exist in Gabriel et al.’s organoids (supp. Fig. S12), indicating striking differences between Gabriel et al.’s organoids and our CONCEPT telencephalon-eye organoids. Lines 327-345.

      To authenticate PAX2+ cells in CONCEPT organoids, we analyzed a single-cell RNA-seq dataset of human fetal retinas HGW9 and identified a similar PAX2+ cell population, cluster 18 (Fig. 5). Expression signatures of PAX2+ cells between CONCEPT organoids and human fetal retinas HGW9 were similar. Notably, cluster 18 differentially expressed PAX2, COL9A3, CYP1B1, SEMA5A, and FGF9 (Fig. 5B-5F), which were top DEGs of cluster 2 in CONCEPT organoids (Fig. 4F, 4G, 4I, 4K; SEMA5A was shown in supp. Fig. S12A). Overall, 64/113 DEGs of cluster 18 in human fetal retinas HGW9 were also DEGs of cluster 2 in CONCEPT organoids. In both HGW9 and CONCEPT organoids, expression of OLIG2, CD44, and GFAP was undetectable (supp. Fig. S14), indicating that astrocytes had not been generated yet at these stages.

      When cells in cluster 18 and retinal progenitor clusters from the HGW9 dataset were combined with cells in clusters 2, 4, 5, 7 from the CONCEPT dataset for Seurat anchor-based clustering, cells in cluster 18 from HGW9 (H18) were grouped with cluster 2 from CONCEPT organoids (C2, assigned optic disc; N), and these cells expressed both PAX2 and VSX2 (arrowheads in Fig. 5N-5R). A small portion of H18 cells were grouped with cluster 4 from CONCEPT organoids (C4, assigned optic stalk; N), and these cells expressed PAX2 but not VSX2 (arrows in Fig. 5N-5R).

      We then compared functional annotations of DEGs (top 200 genes) of cluster 2 in CONCEPT organoids and DEGs (113 genes) of cluster 18 in human fetal retinas HGW9. Top GO terms in GO:MF, GO:CC, and GO:BP are shown (Fig. 6). For DEGs of cluster 2 in CONCEPT organoids, top enriched GO terms in GO:MF, GO:CC, and GO:BP were extracellular matrix structural constituent, collagen-containing extracellular matrix, and system development, respectively. Additional interesting GO:BP terms included axon development, astrocyte development, eye development, response to growth factor, cell adhesion, cell motility, neuron projection development, glial cell differentiation, and signal transduction. For DEGs of cluster 18 in human fetal retinas HGW9, top enriched GO terms in GO:MF, GO:CC, and GO:BP were cell adhesion molecule binding, extracellular space, and developmental process, respectively. Many GO terms were enriched in both samples, further indicating transcriptomic similarities in PAX2+ optic disc cells between CONCEPT organoids and human fetal retinas. Notably, GO terms astrocyte differentiation, neuron projection development, and glial cell differentiation were enriched in the DEGs of assigned optic disc cells for both CONCEPT organoids and human fetal retinas, consistent with expectations.

      Transcriptomic comparisons between CONCEPT organoids and human fetal retinas are found in lines 346-392.

      The Vsx2+Pax2+ population is not typically detected in vivo in the developing mouse eye. The authors claim that they detected them in vivo, but the data supporting this statement are lacking.

      We demonstrate that assigned optic disc cells expressed both VSX2 and PAX2, and this statement holds true for both CONCEPT organoids and human fetal retinas HGW9 (Fig. 5N-5R). Please see the underlined sentence in the response to the comment above.

      Do the RGCs express subtype-specific markers? Do they detect markers of other retinal neurons typically born early in development-cones, amacrine cells, horizontal cells? The authors need to compare the transcriptome of different clusters to the published datasets from human and mouse retinae.

      The CONCEPT organoids used for scRNA-seq were at an early stage. In this dataset, subtypes of RGCs were undetectable. RGCs isolated via CNTN2 were at more advanced stages. Distinct expression of POU4F2, ISL1, RBPMS, and SNCG indicates multiple subtypes of RGCs (Fig. 7L-7P).

      We did find other early retinal neurons in the scRNA-seq dataset: photoreceptor cells, amacrine/horizontal cells in CONCEPT organoids (Fig. 4U-4X), and these cells were also in cluster 11 in which RGCs were found.

      We performed transcriptomic comparisons between CONCEPT organoids, brain/optic organoids, and human fetal retinas. We found that PAX2+ optic disc, PAX2+ optic stalk, FOXG1+ telencephalic, and VSX2+ neuroretinal cell clusters that were found in CONCEPT organoids did not exist in Gabriel et al.’s organoids, indicating striking differences between Gabriel et al.’s organoids and our CONCEPT telencephalon-eye organoids (supp. Fig. S13). On the other hand, we found that expression signatures of CONCEPT organoids and human fetal retinas are similar (Figs. 5, 6).

      Transcriptomic comparisons are found in lines 325-392.

      Fig. 3: where are the "lens like" cells located? The structures in panels B and D look very different. Are these lens-cells toward the periphery or scattered throughout?

      Lens cells were dispersed in the zone in which neural retinal cells are located, which is shown in a low-magnification image (Fig. 3K). Panels B and D in Figure 3 show different stages. At early stages, lens clusters were small (Fig. 3B). At later stages, lens clusters became bigger (Fig. 3D).

      Fig. 3K and L, TEM images: how do the authors know that these are lens cells?

      Western blot of these transparent cell clusters demonstrated that they were lens cells (Fig. 3L).

      Fig. 5: The authors claim that a reduced number of Pax2+ cells is associated with entry of the axons. It is not clear if this is just due to physical barriers or to active axon guidance.

      We believe that Reviewer 2 referred to the gap region of PAX2 expression in Fig. 7A, 7F. RGC axons grew toward and along adjacent PAX2+ VSX2+ cells. Since PAX2+ VSX2+ cells grossly formed a circular shape, RGC axons followed this circular shape. In a gap region of PAX2 expression, RGC axons exited the circle. The association of RGC axon growth with PAX2+ VSX2+ cells was very robust. Besides PAX2+ cell populations, we did not find any other cell populations that directed RGC axon growth.

      Fig. 5K: The authors refer to ALDH1A3 expression in the optic disk, but the presented section does not include the optic disk. In addition, ALDH1A3 is expressed in other regions of the developing retina (Fig. 5K, ref 71).

      We are sorry we did not make it clear. We referred to Li et al.’s (2000) paper (Mech Dev 95, 283-289) for Aldh1a3 expression in the optic stalk. Figure 7K was used to show Aldh1a3 expression in peripheral retinas on sections.

      Line 263, Reference 68: The authors claim that col13A1 is specific to the human optic disk. However, col13A1 is expressed in many additional eye lineages (PMID: 10865988).

      We are sorry we did not make it clear. We meant that Col13A1 is prominently expressed in the optic disc, which is clearly shown in the referred paper (Figure 3D in the paper PMID: 10865988).

      The authors show that inhibiting FgfR results in fewer RGCs and loss of directed axonal growth. The number of cells is drastically reduced; thus, the relevance of the finding directly to axon guidance is not resolved.

      FGFR inhibition drastically decreased the number of RGC somas (Fig. 9F-9K). Additionally, the remaining RGCs grew almost no directional axons (arrowheads in Fig. 9K), and the few remaining axons wandered around (arrow in Fig. 9K), indicating a role for FGF/FGFR signaling in both RGC differentiation and directional axon growth.

      Fig. 1H and J: Vsx2 is outside the centroid in panels H and I, but inside the centroid in panels J and K. It is not clear what part of the centroid is shown. This needs to be clarified by adding a scheme.

      We are sorry we did not make it clear. We added separate-channel images showing VSX2 and PAX6 expression (supp. Figs. S1, S2) and a new diagram (left panel in Fig. 1B). Overall, FOXG1, VSX2, and PAX6 expression at days 15-17 formed three concentric zones spanning from the center to the periphery. At days 22-26, VSX2 expression expanded peripherally, largely overlapping PAX6 expression (supp. Figs. S1, S2).

      Pax6 should be in all cells, also on day 17. Show the separate channels, including DAPI.

      We added separate-channel images (supp. Figs. S1, S2). In cysts, PAX6 was expressed in all cells. After cysts attached to the culture surface and grew as colonies, distinct levels of PAX6 expression emerged in concentric zones. At days 17 and 26, PAX6 expression in the central zone (whose cells expressed FOXG1) became lower, which is obvious in separate-channel images (supp. Figs. S1, S2). Consistently, PAX6 expression was low in FOXG1+ telencephalic cells in the scRNA-seq (Fig. 4C, 4D).

      Lines 27-30: this is a long and complex sentence which needs to be clarified.

      We broke it into a few sentences to make it clearer.

      Line 43: fix "Retina" to "Retinal"

      We fixed it.

      Lines 376-377: repeated "mechanisms of".

      We fixed it.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are grateful for the comments from the reviewers, which helped us to strengthen our analyses and communicate more effectively the details of our findings and their significance. To address their criticisms, we have performed new analyses and revised the text and figures. We believe the manuscript was significantly improved. We provide the line number of important parts of the text that were changed, here in this letter. Below, we address the specific comments from the reviewers in detail.

      Reviewer #1 (Public Review):

      Gehr and colleagues used an elegant method, using neuropixels probes, to study retinal input integration by mouse superior collicular cells in vivo. Compared to a previous report of the same group, they opto-tagged inhibitory neurons and defined the differential integration onto each group. Through these experiments, the author concluded that overall, there is no clear difference between the retina connectivity to excitatory and inhibitory superior colliculus neurons. The exception to that rule is that excitatory neurons might be driven slightly stronger than inhibitory ones. Technically, this work is performed at a high level, and the plots are beautifully conceived, but I have doubts if the interpretation given by the authors is solid. I will elaborate below.

      Some thoughts about the interpretation of the results.

      My main concern is the "survivor bias" of this work, which can lead to skewed conclusions. From the data set acquired, 305 connections were measured, 1/3 inhibitory and 2/3 excitatory. These connections arise from 83 RGC onto 124 RGC (I'm interpreting the axis of Fig.2 C). Here it is worth mentioning that different RGC types have different axonal diameters (Perge et al., 2009). Here the diameter is also related to the way cells relay information (max frequencies, for example). It is possible that thicker axons are easier to measure, given the larger potential changes would likely occur, and thus, selectively being picked up by the neuropixels probe. If this is the case, we would have a clear case of "survival bias", which should be tested and discussed. One way to determine if the response properties of axonal termini are from an unbiased sample is to make a rough functional characterization as generally performed (see Baden et al. 2006). This is fundamental since all other conclusions are based on unbiased sampling.

      First of all, we want to thank the reviewer for the detailed and constructive comments based on which we refined the analysis and updated the figures. We hope that our changes adequately address the concerns of reviewer #1.

      We would like to clarify that Fig. 2C represents an example from a single experiment. In total, we recorded 326 RGCs and 680 SC neurons, with 161 individual RGCs making connections onto 183 individual SC neurons. Moreover, we thank the reviewer for bringing up the important point about the potential “survivor bias”. To address this concern, we would like to provide some clarifications (see below). In addition, we have now added the point that different RGCs can have different axonal diameters, as requested by the reviewer (line 605).

      It is important to note that our approach does not capture the total pool of retinal inputs. Moreover, we did not want to convey the impression that our approach equally captures all retinal inputs to a given SC neuron, as this is not the case. Likewise, it is important to note that our current method does not allow for the measurement of axonal diameters. To obtain an estimate of axonal thickness, complementary techniques such as imaging/staining or electron microscopy would be needed. Our study aimed at characterizing connected RGC-SC pairs and how excitatory and inhibitory neurons in the SC integrate retinal inputs, providing valuable insight on their wiring principles.

      We greatly appreciate the reviewer for highlighting this limitation and we now address these points in the discussion of the revised version of our manuscript (line 603).

      Regarding the suggested “rough functional characterization” of the RGCs, we considered this analysis; unfortunately, we did not present the necessary stimuli (e.g. chirp) in all experiments to be able to perform it. Moreover, the dataset represented in this work contains only 326 RGC neurons, with 161 identified RGCs making connections to SC neurons. Thus, it is unlikely that our dataset uniformly covers all ~30 RGC types in the mouse. However, given that our dataset is the first measurement of RGC inputs to SC INs and SC EXNs in vivo, we believe it provides a first step and a foundation for future studies focusing on specific RGC types to refine our understanding of the RGC-SC circuitry. We discuss this point now in the revised manuscript (line 586).

      One aspect that is not clear to me is the measure of connectivity strength in Figure 2. Here it seems that connectivity strength is directly correlated with the baseline firing rate of the SC neuron (see example plots). If this is the general case, the synaptic strength can be assumed but would only differ in strength due to the excitability of the postsynaptic cell. This should be tested by plotting the correlation coefficient analysis against the baseline firing rate.

      We appreciate the reviewer for bringing up this important point. From the analysis perspective, we would like to clarify that the efficacy measure is independent of the baseline firing rate. It quantifies the probability of adding spikes on top of the baseline rate by subtracting the baseline firing rate before measuring the area of the peak (Usrey et al., 1999).
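
      To illustrate why a baseline-subtracted measure does not by construction scale with firing rate, the following is a minimal Python sketch of an efficacy calculation in the spirit of the description above. It is not the authors' analysis code. The bin indices for the peak and baseline windows, the normalisation by the number of presynaptic spikes, and the toy correlograms are all assumptions made for illustration.

```python
def efficacy(ccg_counts, peak_bins, baseline_bins, n_pre_spikes):
    """Baseline-subtracted area of the cross-correlogram peak per presynaptic spike.

    ccg_counts    : postsynaptic spike counts per time-lag bin
    peak_bins     : indices of the bins forming the short-latency peak (assumed)
    baseline_bins : indices of the bins used to estimate the baseline (assumed)
    n_pre_spikes  : number of presynaptic (RGC) spikes used to build the correlogram
    """
    if n_pre_spikes <= 0:
        raise ValueError("n_pre_spikes must be positive")
    if not baseline_bins:
        raise ValueError("baseline_bins must not be empty")
    # Mean count per bin expected from the postsynaptic baseline firing alone.
    baseline = sum(ccg_counts[i] for i in baseline_bins) / len(baseline_bins)
    # Only the counts above baseline contribute to the peak area.
    peak_area = sum(ccg_counts[i] - baseline for i in peak_bins)
    return peak_area / n_pre_spikes


# Toy correlogram: flat baseline of 10 counts per bin with a two-bin peak.
ccg_low = [10, 10, 10, 10, 40, 30, 10, 10]
# Same added spikes on top of a higher baseline of 25 counts per bin.
ccg_high = [25, 25, 25, 25, 55, 45, 25, 25]

low_rate = efficacy(ccg_low, [4, 5], [0, 1, 2, 3], n_pre_spikes=1000)
high_rate = efficacy(ccg_high, [4, 5], [0, 1, 2, 3], n_pre_spikes=1000)
print(low_rate, high_rate)
```

      In this toy case both correlograms yield the same efficacy, because only the counts above baseline enter the peak area. Any correlation between efficacy and firing rate in real data is therefore an empirical finding rather than a consequence of the formula.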

      Furthermore, we acknowledge the reviewer’s interesting and valuable observation about the relationship of the firing rate and the excitability of the SC neuron in the example plots. To test whether the efficacy is directly related to the mean firing rate, we conducted additional analyses to show the efficacy measure as a function of the mean firing rate (Author response image 1 and Figure 2G). To that end, we utilized two different measures of firing rate: the mean firing rate during spontaneous activity (gray screen) over a duration of 10 sec (across 30 trials), which was interleaved with the natural movie presentations, and the overall firing rate throughout the entire recording session. Our findings indeed reveal a positive correlation, as predicted by the reviewer (Author response image 1; gray screen: EXC: r = 0.22721, p < 0.00081; INH: r = 0.34677, p = 0.00076; entire recording: EXC: r = 0.42685, p < 0.0005; INH: r = 0.43543, p = 0.00002).

      Author response image 1.

      Efficacy measure of connected RGC-SC pairs as a function of the mean firing rate during different stimulus conditions: during spontaneous activity (gray screen, left) and throughout the entire recording session (right).

      However, it is important to note that although we observe a correlation on the population level, the relationship between postsynaptic firing and efficacy is diverse. We identify pairs with strong connections despite the firing rate of the postsynaptic SC cell being low. Likewise, we also find pairs with weak connections despite the firing rate of the SC neuron being high (Author response image 2). These observations suggest that factors beyond the postsynaptic firing contribute to the efficacy of the connection. This is exemplified by the fact that SC neurons can receive both strong and weak connections from their convergent presynaptic RGC pool.

      Author response image 2.

      RGC-SC connectivity. Cross-correlograms showing 4 connected RGC-SC pairs (top) with two RGCs connecting onto the same SC neuron. Raster plots of SC neuron spiking activity in response to firing of the presynaptically connected RGC. The same SC neuron can receive both strong and weak RGC inputs.

      In summary, we thank the reviewer for bringing up this important question, and we believe that our additional analyses shed light on the relationship between firing rate and efficacy. This result is very interesting, and we include these findings in the updated Figure 2 in the revised manuscript (panel 2G) in exchange with the panel of the peak latency. Moreover, we also address this point now in the results and discussion section of the revised manuscript (line 280 and line 525).

      My third concern is the assessment of functional similarity in Fig. 3. It is not clear to me why the similarity value was taken by the arithmetic mean. For example, even if the responses are identical for one connected pair that exclusively responds either to the ON or OFF sparse noise, the maximal value can only be 0.67. Perhaps I misunderstood something.

      We thank the reviewer for raising this point and apologize for any confusion caused by our description of the similarity index calculation. To clarify, the similarity index was calculated specifically between the responses of the RGC and the responses of the postsynaptic SC neuron, rather than between the neurons and the visual stimulus. As a result, the similarity index reflects the degree of similarity in the responses of the connected pairs. Therefore, if the responses of the RGC and the connected postsynaptic neuron are identical, regardless of whether they respond exclusively to ON, only to OFF, or to a mixture of ON-OFF, the similarity index will be one. We have updated the relevant part in the methods section to make this point clearer to the reader (line 917).

      Secondly, correlations in natural(istic) movies can differ dramatically depending on the frame rate that the movie was acquired and the way it is displayed to the animal. What looks natural to us will elicit several artifacts at a retinal level, e.g., due to big jumps between frames (no direction-selective response) or overall little modulation (large spatial correlations). I would rather opt for uniform stimuli, as suggested previously. Of course, these are also approximations but can be easily reproduced by different labs and are not subjected to the intricacies of the detailed naturalistic stimulus used.

      We agree with the reviewer that spatiotemporal correlations of naturalistic stimuli are complex. To address this point, we added two stimuli with little spatiotemporal correlations to the similarity analysis. The first stimulus we added is a phase scrambled version of the natural movie (PSM, also taken from Froudarakis et al. (2014)). The second is a binary white noise checkerboard stimulus. These stimuli were presented randomly interleaved with the natural movie, for 30 trials each. The similarity index analysis revealed that even with uniform stimuli included, the average similarity index is correlated to the efficacy. We show this data now in Figure 3.

      Fourth. It is important to control the proportion of inhibitory cells activated optogenetically across the recording probe. Currently, it is not possible to assess if there are false negatives. One way of controlling for this would be to show that the number of inhibitory interneurons doesn't vary across the probe.

      We thank the reviewer for highlighting this important aspect of the experiment and analysis. We are aware of this point and therefore took extra care to minimize the biases that could be introduced by our recording and stimulation method. Our approach to include recorded excitatory and inhibitory neurons was conservative, briefly:

      1. We included only excitatory and inhibitory neurons that were within the SC, defined by visually driven activity and continuous retinotopy (see method).

      2. We further restricted the included neurons to neurons that were located within the boundaries of the LED evoked responses, i.e. the recording channels with optogenetic evoked MUA responses within the SC (Figure 1 – figure supplement 1).

      3. Both excitatory and inhibitory SC neurons were selected in this way.

      These inclusion criteria were specifically designed to avoid sampling excitatory neurons from regions on the Neuropixels probe that lacked optogenetically evoked responses and thus to minimize the number of falsely labeled excitatory neurons.

      To illustrate these inclusion criteria and the resulting spatial distribution of the selected excitatory and inhibitory SC neurons along the 384 channels of the Neuropixels probe, we now added a supplementary figure (Figure 1 – figure supplement 1). This figure shows the multi-unit activity in response to optogenetic stimulation and the distribution of inhibitory and excitatory single units within the range of channels that are activated via LED stimulation for 3/11 selected experiments. This highlights that we employed stringent criteria for determining the boundaries and selecting which neurons to include in our study. The distribution of excitatory and inhibitory SC neurons is not significantly different for 9/11 experiments (Wilcoxon rank-sum test, p values = 0.307, 0.0115, 0.755, 0.834, 5.01 × 10⁻⁶, 0.79, 0.80, 0.26, 0.33, 0.08, 0.13). Moreover, in the two significantly different experiments only 2 RGC-SC EXC pairs were located in the region without identified SC INs, and thus will not affect the results. We now address this point in the methods section (line 859).

      Fifth. In Fig. 4, the ISI had a minimal bound of 5 ms. Why? This would cap the firing rate at 200Hz, but we know that RGC in explants can fire at higher frequencies for evoked responses. I would set a lower bound since it should come naturally from the after-depolarization block.

      The chosen 5 ms minimal bound was in the range used in previous literature, e.g. 4-30 ms in Usrey et al. (1998). To address the reviewer's question, we re-analyzed the data with a lower bound of 2 ms (2-30 ms) to include RGCs that fire at frequencies higher than 200 Hz. However, we did not observe a clear difference between the 2-30 and 5-30 ms groups for inhibitory connections (SC IN: p = 0.604). Only the excitatory connections show a statistically significant difference (p = 0.011); however, the effect size is small (Cohen’s d: EXC = 0.063, INH = 0.030). Nonetheless, we updated a panel in Figure 4 to represent the 2-30 ms group (Figure 4F).
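
      For readers less familiar with the effect size quoted above, here is a minimal sketch of Cohen's d using the pooled standard deviation. Which variant of d was used for the manuscript is not stated here, so the pooled-variance form (and the function name) is an assumption made for illustration; the input values are invented.

```python
import math
from statistics import mean, variance

def cohens_d(a, b):
    """Cohen's d with pooled sample variance (one common variant)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)

if __name__ == "__main__":
    # Invented PSR-like values for two ISI groups, for illustration only.
    group_a = [2.0, 4.0, 6.0]
    group_b = [1.0, 3.0, 5.0]
    print(cohens_d(group_a, group_b))
```

      By the usual convention, values of d around 0.2 are considered small, so values such as 0.063 and 0.030 indicate a very small shift between the groups even where the p value is below 0.05.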

      Another aspect that remains unclear is to what extent the paired-spike ratio depends on the baseline firing rate. This would change the interpretation from the particular synaptic connection to the intrinsic properties of the cell and is plausible since the baseline firing rate varies tremendously.

      To address how the paired-spike ratio depends on the baseline firing rate we plotted the change of PSR depending on ISI as suggested by the reviewer.

      One related analysis would be to plot the change of PSR depending on the ISI. It would be intuitive to make a scatter plot for all paired spikes of all recorded neurons (separated into inhibitory and excitatory) of ISI vs. PSR.

      We appreciate the valuable suggestion from the reviewer. We have now separated the ISIs into distinct groups spanning 5 ms intervals represented in Author response image 3, right. These intervals range from 5-10 ms up to 25-30 ms. Notably, we observe a difference between the excitatory and inhibitory populations. The excitatory population exhibits a monotonic decrease in mean PSR across the intervals, while the inhibitory population shows a peak around 10/15 ms.
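
      The grouping of ISIs into 5 ms intervals described above can be sketched as follows. This is an illustrative example only: it assumes that each entry is one (ISI, PSR) value, with PSR defined as efficacy of the 2nd spike divided by efficacy of the 1st spike, and the function name and input layout are hypothetical.

```python
import math
from statistics import mean, stdev

def psr_by_isi(pairs, edges=(5, 10, 15, 20, 25, 30)):
    """Group (isi_ms, psr) entries into ISI bins; return mean, SEM and n per bin.

    Bins are half-open intervals [lo, hi) in ms. Empty bins are skipped and
    bins with a single entry get SEM = NaN.
    """
    out = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        vals = [psr for isi, psr in pairs if lo <= isi < hi]
        if not vals:
            continue
        sem = stdev(vals) / math.sqrt(len(vals)) if len(vals) > 1 else float("nan")
        out[(lo, hi)] = (mean(vals), sem, len(vals))
    return out

if __name__ == "__main__":
    # Invented values for illustration only.
    example = [(6, 2.0), (8, 1.0), (12, 1.5), (40, 9.0)]
    for interval, (m, sem, n) in psr_by_isi(example).items():
        print(interval, m, sem, n)
```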

      Author response image 3.

      Change of mean paired-spike ratio (PSR) depending on ISI. Left) Comparison of PSR between two groups of different ISIs. The 2-30 ms group ensures that high-firing RGCs are included (excitatory pairs 2-30 vs 5-30 ms p = 0.011; inhibitory pairs 2-30 vs 5-30 ms p = 0.604, Wilcoxon signed-rank). Right) PSR for groups of different ISI intervals. Mean PSR ± SEM for excitatory groups: 2.0 ± 0.09, 1.75 ± 0.09, 1.51 ± 0.05, 1.31 ± 0.05, 1.2 ± 0.05; inhibitory groups: 1.35 ± 0.06, 1.51 ± 0.09, 1.5 ± 0.1, 1.22 ± 0.06, 1.21 ± 0.07. p E vs I (within group): 1.55 × 10⁻⁵, 9.55 × 10⁻², 4.21 × 10⁻¹, 3.74 × 10⁻¹, 6.22 × 10⁻¹, Wilcoxon rank-sum test.

      Panel 4E is confusing to me. Here what is plotted is efficacy 1st against PSR (which is efficacy 2nd/efficacy 1st). Given that you have a linear relation between efficacy 1st and efficacy 2nd (panel 4C), you are essentially re-plotting the same information, which should necessarily have a hyperbolic relationship: [ f(x) = y/x ]. Thus, fitting this with a linear function makes no sense and it has to be decaying if efficacy 2nd > efficacy1st as shown in 4C.

      We thank the reviewer for raising this question, which helped us to improve the representation and description of the results shown in Figure 4. Panel 4E is intended to investigate whether there is a correlation between the efficacy strength (eff 1st) and the amount of facilitation (PSR). From panel 4C it is already evident that the data points for high efficacies lie closer to the unity line, as compared to the data points for low efficacies. This suggests that the PSR is stronger for connections with smaller efficacies 1st. To quantify this relationship, we have plotted the efficacy 1st vs the PSR in panel 4E, which thus adds new information to the figure. Importantly, this panel is shown on log-log scales, and therefore the decaying relationship is not evident. If we had shown the data on a linear-linear scale, the decaying function would have been evident (Author response image 4). And indeed, as the reviewer pointed out, we cannot fit a hyperbolic relationship with a linear function. This is exactly the reason why we show the data on a log-log scale and also estimate the Pearson correlation from the logs of the efficacies and PSRs.

      In Author response image 4 we show the relationship plotted on linear scale using an approach to fit the hyperbolic relationship employing a hyperbolic cosecant function 𝑎/𝑠𝑖𝑛ℎ(𝑏 ∗ 𝑥) + 𝑐.
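
      The log-log argument above can be illustrated with a small sketch. If PSR falls off as a power of the 1st efficacy, PSR = k · eff^m, then log PSR = log k + m · log eff, which is a straight line in log-log coordinates even though the relationship is strongly curved on linear axes. The data, the exponent and the function names below are invented for illustration and do not reproduce the manuscript's values.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def loglog_pearson(eff_first, psr):
    """Pearson correlation computed on log10-transformed values."""
    return pearson([math.log10(e) for e in eff_first],
                   [math.log10(p) for p in psr])

if __name__ == "__main__":
    eff_first = [0.01, 0.02, 0.05, 0.1]            # invented efficacies
    psr = [0.5 * e ** -0.5 for e in eff_first]     # invented power-law decay
    print(pearson(eff_first, psr))                 # curved on linear axes
    print(loglog_pearson(eff_first, psr))          # straight line in log-log
```

      For this synthetic power law the correlation of the logs is exactly linear, whereas the correlation of the raw values is weaker because a straight line is a poor description of a decaying curve.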

      Author response image 4.

      Relationship between efficacy to 1st RGC and PSR visualized on linear scale using a hyperbolic fitting approach 𝑎/𝑠𝑖𝑛ℎ(𝑏 ∗ 𝑥) + 𝑐.

      Finally, in Figure 5, the perspective is inverted, and the spike correlations are seen from the perspective of SC neurons. Here it would also be good to plot the cumulative histograms and not look at the averages.

      We added the cumulative histogram to Figure 5 (panel B), in addition to the raw data points and the mean.

      Regarding the similarity index and use of natural stats, please see my previous comments. Also, would it be possible to plot the contribution v/s the firing rate with the baseline firing rate with no stimulation or full-field stimulation? This is important since naturalistic movies have too many correlations and dependencies that make this plot difficult to interpret.

      We now show the contribution vs firing rates for different stimulus conditions in a new figure supplement (Figure 5 – figure supplement 1). We added the correlations to the different stimuli for baseline firing rate with no stimulation (gray background), full-field stimulation (checkerboard) and phase scrambled natural movie.

      Overall, the paper only speaks of excitatory and inhibitory differences in the introduction and results. However, it is known that there are three clear morphologically distinct classes of excitatory neurons (wide-field, narrow-field, and stellate). This topic is touched in the discussion but not directly in the context of these results. Smaller cells might likely be driven much stronger. Wide-field cells would likely not be driven by one RGC input only and will probably integrate from many more cells than 6.

      We thank the reviewer for this comment. We agree with the reviewer that addressing how the different excitatory and inhibitory cell-types integrate RGC input is important to understand the visual processing mechanisms in the SC. The presented study aimed at comparing the excitatory and inhibitory population in general using the VGAT-ChR2 mouse line. Understanding how specific genetically defined cell-types integrate RGC inputs is clearly very interesting and should be done. Unfortunately, the mouse lines that would allow targeting genetically identified inhibitory cell-types are still limited and therefore we can only use functional measurements to assess different types of neurons in the SC. We now address this point about distinct SC cell-types in the discussion (line 643).

      One possible functional measurement is the size of the receptive field, which, to some degree, could be used as a proxy for different morphologies, i.e. small receptive fields could hint towards compact morphology while large receptive fields could indicate a wider morphology. It is known, for example, that narrow-field and stellate cells have small RF sizes, while wide-field cells have large RFs. We studied the relationship between the RF size and spike waveform duration but did not find a significant correlation (Author response image 6). Moreover, the spike waveform duration, as discussed in the manuscript, is not a valid criterion to separate EXNs and INs in the SC, unlike in the cortex, where this is common practice. We now also looked into whether the connectivity strength is related to the RF size. Interestingly, while in the current dataset we do not find a significant correlation between the efficacy and the receptive field size for either EXNs or INs (Author response image 5, left), we do find a significant negative correlation between contribution and receptive field size for the excitatory neurons (Author response image 5, right). This result indicates that SC excitatory neurons with small receptive fields are more strongly coupled to the RGC input as compared to neurons with larger receptive fields.

      Author response image 5.

      Relationship between RF size and connectivity measures (efficacy and contribution) for RGC-SC EXN and RGC-SC IN pairs (two-sided Wilcoxon rank-sum test).

      Reviewer #2 (Public Review):

      This study follows up on a previous study by the group (Sibille et al Nature Communications 2022) in which high density Neuropixel probes were inserted tangentially through the superficial layers of the superior colliculus (SC) to record the activity of retinocollicular axons and postsynaptic collicular neurons in anesthetized mice. By correlating spike patterns, connected pairs could be identified which allowed the authors to demonstrate that functionally similar retinal axon-SC neuron pairs were strongly connected.

      In the current study, the authors use similar techniques in vGAT-ChR2 mice and add a fiber optic to identify light-activated GABAergic and non-light-activated nonGABAergic neurons. Using their previously verified techniques to identify connected pairs, within regions of optogenetic activation they identified 214 connected pairs of retinal axons and nonGABAergic neurons and 91 pairs of connected retinal axons and GABAergic neurons. The main conclusion is that retinal activity contributed more to the activity of postsynaptic nonGABAergic SC neurons than to the activity of postsynaptic GABAergic SC neurons.

      The study is very well done. The figures are well laid out and clearly establish the conclusions. My main comments are related to the comparison to other circuits and further questions that might be addressed in the SC.

      It is stated several times that the superior colliculus and the visual cortex are the two major brain areas for visual processing and these areas are compared throughout the manuscript. However, since both the dorsal lateral geniculate nucleus (dLGN) and SC include similar synaptic motifs, including triadic arrangements of retinal boutons with GABAergic and nonGABAergic neurons, it might be more relevant to compare and contrast retinal convergence and other features in these structures.

      Thank you for raising this crucial point. Indeed, the comparison to the thalamus is a valid argument, as both the SC and LGN are primary targets of RGC axon terminals. During the preparation of the manuscript, we extensively discussed whether it is more appropriate to compare our new SC dataset with existing literature on the LGN or on the primary visual cortex (V1). Ultimately, we decided to use the visual cortex as the main comparison for the following reasons:

      1. The SC is widely recognized as an evolutionary conserved circuit for visual computation and visually guided behaviors, while the dLGN is generally regarded as a relay station for RGC information to the visual cortex (Steriade, McCormick, 1997). Thus, we believe it is more relevant to compare the evolutionary older visual circuit (SC) to the evolutionary newer visual circuit (visual cortex).

      2. In the mouse, the dLGN contains only a limited number of inhibitory interneurons, which represent only approximately 6% of the total dLGN neuronal population (Butler, 2008; Evangelio et al., 2018). It has been suggested that the rodent somatosensory thalamus even lacks interneurons (Arcelli et al., 1997). Consequently, directly comparing inhibitory interneurons in the SC to those in the dLGN would pose challenges.

      3. Along the same line, the density and also the diversity of inhibitory neurons in the SC is high and likely more comparable to the density and diversity of inhibitory neurons in the visual cortex, than to the dLGN circuit. In the dLGN, TC projection neurons far outnumber inhibitory neurons (Arcelli et al., 1997; Evangelio et al., 2018) and the dLGN is inhabited by just 1-2 classes of GABAergic retinorecipient interneurons (Arcelli et al., 1997; Jaubert-Miazza et al., 2005; Krahe et al., 2011; Ling et al., 2012). Classification approaches (e.g. 3D reconstruction) so far have not revealed any subclasses except for distinctions in intrinsic membrane properties (Leist et al., 2016), suggesting low interneuron diversity in the dLGN. This is in contrast to the vLGN, where a recent study found a diversity of GABAergic neurons (Sabbagh et al., 2021).

      4. In the thalamo-cortical circuit, there exists a notable difference in how cortical excitatory and cortical inhibitory neurons are driven by their thalamic input (Alonso and Swadlow, 2005; Cruikshank et al., 2007). This discrepancy forms the basis for several models of visual processing in the visual cortex (Kremkow et al., 2016; Taylor et al., 2021). This is why we wanted to assess whether the SC follows similar or different rules.

      That said, the reviewer is correct that the dLGN and the SC share certain wiring motifs, such as the triadic arrangements of retinal boutons. Unfortunately, the VGAT-ChR2 mouse line used in our study does not specifically label SC inhibitory neurons that are involved in the formation of triadic arrangements. Therefore, we are unable to draw specific conclusions regarding this point. To further investigate this aspect, the use of GAD67 mice, which have been shown to selectively label intrinsic interneurons that receive RGC input and contact non-GABAergic dendrites (Whyland et al., 2020), would be necessary. Nonetheless, we acknowledge the question raised by the reviewer and, in response, have now provided a more in-depth comparison to the dLGN in the discussion section of the revised manuscript (line 565).

      The GABAergic and nonGABAergic neurons showed a wide range of firing rates. It might be interesting to sort the cells by firing rates to see if they exhibit different properties. For example, since the SC contains both GABAergic interneurons and projection neurons it would be interesting to examine whether GABAergic neurons with higher firing rates exhibit narrower spikes, similar to cortical fast spiking interneurons. Similarly, it might be of interest to sort the neurons by their receptive field sizes since this is associated with different SC neuron types.

      We thank the reviewer for the interesting suggestions on classifying SC neurons into different categories. The relationship between connectivity measures and RF size has been addressed in Author response image 5. We have now studied the relationship between spike waveforms and several measures, such as firing rate and RF size, in more detail (Author response image 6).

      As the baseline firing is generally low in SC and our experiments are performed under anesthetized conditions, we used the evoked firing rates to sort the cells by firing rates or RF sizes. We have added an analysis showing the mean firing rate (calculated over the full recording duration) as a function of the spike width (peak-to-trough duration). We observe no significant relationship for either group of cell types. The same holds if we sort the SC neurons by their RF size. RF sizes were calculated from PSTHs and summed RF for SL and SD. We do not see a relationship between neuron type and firing rate or RF size.

      Author response image 6.

      Mean firing rate (left) and RF size (right) as a function of peak-to-trough (PT) duration for excitatory and inhibitory SC neurons. Neither measure is correlated with the PT duration (Pearson correlation coefficient, two-sided Wilcoxon rank-sum test).

      The recording techniques allowed for the identification of the distance between connected retinocollicular fibers and postsynaptic neurons. It might also be interesting to compare the properties of connected pairs recorded at dorsal versus ventral locations since neurons with different genetic identities and response properties are located in different dorsal/ventral locations (e.g. Liu et al. Neuron 2023). Also, regarding the strength of connections, previous electron microscopy studies have shown that the retinocollicular terminals differ in density and size in the dorsal/ventral dimension (e.g Carter et al JCN 1991).

      We thank the reviewer for raising this interesting and relevant point to compare the properties of the connected pairs across the dorsal and ventral location. Unfortunately, our tangential recording approach is not ideally suited for comparing the properties of neurons across the different SC depths. For comparing dorsal versus ventral located neurons in the SC, as done in Liu et al., Neuron 2023, vertical recordings would be more appropriate. We now provide a discussion on this aspect (line 589).

      Was optogenetic activation of GABAergic neurons ever paired with visual activation? It would be interesting to examine the receptive fields of the nonGABAergic neurons before and after activation of the GABAergic neurons (as in Gale and Murphy J Neurosci 2016).

      This is an important point, and indeed we have paired activation of GABAergic neurons with visual stimulation (checkerboard stimulus) to assess the impact of the GABAergic neurons on the firing of the excitatory neurons. We observed a diversity of effects, with some EXNs being strongly suppressed and others being only weakly suppressed. Thus, we predict that the receptive fields of those EXNs that are suppressed by optogenetically evoked IN firing should be affected in some way. However, the checkerboard stimulus was only presented for a short duration (1 s) and for only a few trials (n = 30). Therefore, estimating the receptive fields of EXNs before and after optogenetic activation of GABAergic neurons is unfortunately not possible with the existing dataset. We now mention this point in the discussion (line 668).

      Reviewer #3 (Public Review):

      This study performs in vivo recordings of neurons in the mouse superior colliculus and their afferents from the retina, retinal ganglion cells (RGCs). Building on a preparation they previously published, this study adds the use of optogenetic identification of inhibitory neurons (aka optotagging) to compare RGC connectivity to excitatory and inhibitory neurons in SC. Using this approach, the authors characterize connection probability, strength, and response correlation between RGCs and their target neurons in SC, finding several differences from what is observed in the retina-thalamus-visual cortex pathway. As such, this may be a useful dataset for efforts to understand retinocollicular connectivity and computations.

      Recommendations:

      Reviewer #1 (Recommendations For The Authors):

      Some minor points.

      Fig.1G shows a difference in mean firing rates between inhibitory and excitatory cells. Please plot the cumulative distribution of firing rates to be able to scrutinize the data better.

      We have addressed this issue and updated panel G in Figure 1.

      Fig. 2C. The background color of this plot is black; it is not possible to decipher much. Please change it to white.

      We have now changed panel C in Figure 2 to a white background.

      Fig. 4D would be better represented as a histogram since most points overlap.

      We now represent panel D in Figure 4 as a histogram.

      Citations. I would cite some of the foundational work, in some instances, e.g., in the first sentence (SC receives input from the retina)

      We have now addressed this issue and cited more foundational studies (e.g. line 68).

      The discussion is a bit long; the last paragraph can be removed, mainly because the previous section conflates superficial SC with the entire SC, which is confusing (e.g., Ayupe et al.). In this way, there is more space to discuss the direct implication of the study within the context of known cell types.

      We now shortened the discussion and provide more background about different SC cell types in the discussion (line 643).

      Reviewer #2 (Recommendations For The Authors):

      Minor correction: Whyland et al 2020 did not identify V1 input to horizontal cells. A more appropriate reference is Zingg et al Neuron 2017.

      We thank the reviewer for this important point and have now corrected the citation in line 613 in the discussion to Zingg et al 2017.

      Reviewer #3 (Recommendations For The Authors):

      Regarding the degree of convergence from RGC to SC, the Crair lab (Furman 2013) performed a quantal analysis in slice that is worth citing.

      We included this citation in the revised version of the manuscript (line 501).

      I have lost track at this point, but many labs (Heimel, Meister, Farrow, Cang, Isa, maybe others?) have observed that neighboring SC neurons have similar tuning for direction/orientation, but the circuit mechanisms are not well understood. Given the relatively weak correlation between response tuning of RGC axons and their SC target neurons, a useful comparison might be that of SC neurons and their neighbors, and whether SC neurons that show weaker correlation to their RGC axons show stronger correlations with their SC neighbors, which could implicate local connectivity within SC.

      We thank the reviewer for providing this interesting comment. With our recording approach we could study locally connected SC neurons. However, the focus of our study was to first characterize the retinocollicular connectivity, and therefore investigating the intracollicular connectivity is beyond the scope of the current study. We thank the reviewer for the valuable suggestion and will consider tackling this aspect in a separate study in the future.

      Is it possible any of these measurements are biased by laminar targeting of their probe within superficial SC? Their schematic seems to suggest they targeted the deeper part of superficial SC. Do they know whether they recorded throughout superficial SC or targeted the deeper layers closer to stratum opticum?

      Our recordings lie between the deeper and upper visual SC layers, depending on the recording site on the Neuropixels probe, as we use an angled insertion approach. Besides DiI staining (Author response image 7), we can estimate the location of the probe using functional measurements, i.e. visually driven channels and retinotopic locations of the recording sites. If the Neuropixels probe is inserted too superficially, the number of recording sites with visually driven activity is low. If the Neuropixels probe is inserted too deep in the visual layers, we see two separated regions on the probe with visually driven activity in which the retinotopy is non-continuous (please refer to Figure 2 in Sibille et al., 2022). In the recordings included in this study, the number of visually driven channels was generally high and the retinotopy continuous, suggesting that we covered a region within the deeper and upper visual layers.

      Author response image 7.

      Functional estimation of probe location. DiI staining of Neuropixels probe (middle) and multi-unit activity across channels in response to visual stimulation (bottom). The white dashed lines in the middle and bottom panels mark the rough boundaries of the visual SC layers.

      In Fig. 4, the authors argue that firing in inhibitory neurons is less correlated with RGC input. Does their metric for contribution of retinal input control for the fact that inhibitory neurons have higher firing rates overall and, e.g., may be more depolarized at rest and likelier to fire spontaneous spikes but no less likely to be driven by retina? Or is the argument that their visual responses are more likely to be driven by V1 or local connections?

      We thank the reviewer for bringing up that point. The contribution measure estimates the fraction of SC spikes that were preceded by an RGC spike and it is thus, in theory, independent of the firing rate of the SC neuron. In practice, however, we agree that high firing SC neurons may be more likely to have a lower contribution value simply because a larger fraction of their spikes is not preceded by the activity of the presynaptic RGC. But this is exactly what we aimed at characterizing with this analysis. Where these non-RGC driven SC spikes originate from, whether from a more depolarized state of the neuron or by other sources such as V1 or local connections, we can only speculate about. That said, please note that despite SC INs having higher firing rates, not all of them show low contribution. Likewise, we also see SC neurons with low firing rates and low contribution values (new Supp Fig. 3).
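
      To make the contribution measure described above concrete, here is a minimal sketch that computes the fraction of SC spikes preceded by an RGC spike. It is an illustration under stated assumptions rather than the manuscript's analysis code: the lag window (0.5 to 3.5 ms), the absence of any baseline correction, and the function name are all hypothetical choices.

```python
from bisect import bisect_left, bisect_right

def contribution(rgc, sc, lag=(0.5, 3.5)):
    """Fraction of SC spikes preceded by at least one RGC spike.

    An SC spike at time s counts as RGC-preceded if an RGC spike occurred
    between s - lag[1] and s - lag[0] (inclusive), with times in ms.
    """
    rgc_sorted = sorted(rgc)
    hits = 0
    for s in sc:
        lo = bisect_left(rgc_sorted, s - lag[1])
        hi = bisect_right(rgc_sorted, s - lag[0])
        if hi > lo:
            hits += 1
    return hits / len(sc)

if __name__ == "__main__":
    rgc = [100.0, 200.0, 300.0]
    sc = [102.0, 202.0, 250.0, 400.0]        # two spikes follow RGC spikes
    print(contribution(rgc, sc))
    # Extra SC spikes from other sources dilute the contribution.
    print(contribution(rgc, sc + [500.0, 600.0, 700.0, 800.0]))
```

      The second call shows the effect discussed above: the number of RGC-preceded spikes is unchanged, but the contribution drops because more of the SC neuron's spikes come from other sources.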

      Minor point: The optotagging in the example cell doesn't cause the cell to fire for ~50 ms? That is odd. Typically, cells classified as optotagged fire within 5-10 ms of light onset. Is that a strange example cell or is there something different about the optotagging approach?

      Unfortunately, transient LED light onsets and offsets can induce light artifacts on Neuropixels probes (Jun et al., 2017; Steinmetz et al., 2021), and it is therefore challenging to use brief LED pulses for optotagging with Neuropixels probes. To avoid this overlap of artifacts and LED-evoked spikes, we opted for a longer stimulus duration of 100 ms to activate VGAT neurons (Bennett et al., 2019; Siegle et al., 2019). Moreover, instead of a square pulse, we used a slow ramp for light onsets and offsets to minimize the magnitude of induced artifacts. In Author response image 8 we present examples of individual activated VGAT neurons responding to a 100 ms blue light pulse.

      Author response image 8.

      Optotagging approach. Example traces of a single stimulation pulse and protocol used for optogenetic stimulation. Evoked activity in response to LED stimulation (100 ms, 100 trials) for six example SC INs.

      References

      Alonso J-M, Swadlow HA. 2005. Thalamocortical specificity and the synthesis of sensory cortical receptive fields. J Neurophysiol 94:26–32. doi:10.1152/jn.01281.2004

      Arcelli P, Frassoni C, Regondi MC, De Biasi S, Spreafico R. 1997. GABAergic neurons in mammalian thalamus: a marker of thalamic complexity? Brain Res Bull 42:27–37. doi:10.1016/s0361-9230(96)00107-4

      Bennett C, Gale SD, Garrett ME, Newton ML, Callaway EM, Murphy GJ, Olsen SR. 2019. Higher-Order Thalamic Circuits Channel Parallel Streams of Visual Information in Mice. Neuron 102:477-492.e5. doi:10.1016/j.neuron.2019.02.010

      Butler AB. 2008. Evolution of the thalamus: a morphological and functional review. Thalamus & Related Systems 4:35–58. doi:10.1017/S1472928808000356

      Cruikshank SJ, Lewis TJ, Connors BW. 2007. Synaptic basis for intense thalamocortical activation of feedforward inhibitory cells in neocortex. Nat Neurosci 10:462–468. doi:10.1038/nn1861

      Evangelio M, García-Amado M, Clascá F. 2018. Thalamocortical Projection Neuron and Interneuron Numbers in the Visual Thalamic Nuclei of the Adult C57BL/6 Mouse. Frontiers in Neuroanatomy 12.

      Froudarakis E, Berens P, Ecker AS, Cotton RJ, Sinz FH, Yatsenko D, Saggau P, Bethge M, Tolias AS. 2014. Population code in mouse V1 facilitates readout of natural scenes through increased sparseness. Nat Neurosci 17:851–857. doi:10.1038/nn.3707

      Jaubert-Miazza L, Green E, Lo F-S, Bui K, Mills J, Guido W. 2005. Structural and functional composition of the developing retinogeniculate pathway in the mouse. Vis Neurosci 22:661–676. doi:10.1017/S0952523805225154

      Jun JJ, Steinmetz NA, Siegle JH, Denman DJ, Bauza M, Barbarits B, Lee AK, Anastassiou CA, Andrei A, Aydın Ç, Barbic M, Blanche TJ, Bonin V, Couto J, Dutta B, Gratiy SL, Gutnisky DA, Häusser M, Karsh B, Ledochowitsch P, Lopez CM, Mitelut C, Musa S, Okun M, Pachitariu M, Putzeys J, Rich PD, Rossant C, Sun W, Svoboda K, Carandini M, Harris KD, Koch C, O’Keefe J, Harris TD. 2017. Fully integrated silicon probes for high-density recording of neural activity. Nature 551:232–236. doi:10.1038/nature24636

      Krahe TE, El-Danaf RN, Dilger EK, Henderson SC, Guido W. 2011. Morphologically Distinct Classes of Relay Cells Exhibit Regional Preferences in the Dorsal Lateral Geniculate Nucleus of the Mouse. J Neurosci 31:17437–17448. doi:10.1523/JNEUROSCI.4370-11.2011

      Kremkow J, Perrinet LU, Monier C, Alonso J-M, Aertsen A, Frégnac Y, Masson GS. 2016. Push-Pull Receptive Field Organization and Synaptic Depression: Mechanisms for Reliably Encoding Naturalistic Stimuli in V1. Frontiers in Neural Circuits 10.

      Leist M, Datunashvilli M, Kanyshkova T, Zobeiri M, Aissaoui A, Cerina M, Romanelli MN, Pape H-C, Budde T. 2016. Two types of interneurons in the mouse lateral geniculate nucleus are characterized by different h-current density. Sci Rep 6:24904. doi:10.1038/srep24904

      Ling C, Hendrickson ML, Kalil RE. 2012. Morphology, Classification, and Distribution of the Projection Neurons in the Dorsal Lateral Geniculate Nucleus of the Rat. PLOS ONE 7:e49161. doi:10.1371/journal.pone.0049161

      Sabbagh U, Govindaiah G, Somaiya RD, Ha RV, Wei JC, Guido W, Fox MA. 2021. Diverse GABAergic neurons organize into subtype-specific sublaminae in the ventral lateral geniculate nucleus. J Neurochem 159:479–497. doi:10.1111/jnc.15101

      Sibille J, Gehr C, Teh KL, Kremkow J. 2022. Tangential high-density electrode insertions allow to simultaneously measure neuronal activity across an extended region of the visual field in mouse superior colliculus. J Neurosci Methods 376:109622. doi:10.1016/j.jneumeth.2022.109622

      Siegle JH, Jia X, Durand S, Gale S, Bennett C, Graddis N, Heller G, Ramirez TK, Choi H, Luviano JA, Groblewski PA, Ahmed R, Arkhipov A, Bernard A, Billeh YN, Brown D, Buice MA, Cain N, Caldejon S, Casal L, Cho A, Chvilicek M, Cox TC, Dai K, Denman DJ, de Vries SEJ, Dietzman R, Esposito L, Farrell C, Feng D, Galbraith J, Garrett M, Gelfand EC, Hancock N, Harris JA, Howard R, Hu B, Hytnen R, Iyer R, Jessett E, Johnson K, Kato I, Kiggins J, Lambert S, Lecoq J, Ledochowitsch P, Lee JH, Leon A, Li Y, Liang E, Long F, Mace K, Melchior J, Millman D, Mollenkopf T, Nayan C, Ng L, Ngo K, Nguyen T, Nicovich PR, North K, Ocker GK, Ollerenshaw D, Oliver M, Pachitariu M, Perkins J, Reding M, Reid D, Robertson M, Ronellenfitch K, Seid S, Slaughterbeck C, Stoecklin M, Sullivan D, Sutton B, Swapp J, Thompson C, Turner K, Wakeman W, Whitesell JD, Williams D, Williford A, Young R, Zeng H, Naylor S, Phillips JW, Reid RC, Mihalas S, Olsen SR, Koch C. 2019. A survey of spiking activity reveals a functional hierarchy of mouse corticothalamic visual areas (preprint). Neuroscience. doi:10.1101/805010

      Steinmetz NA, Aydin C, Lebedeva A, Okun M, Pachitariu M, Bauza M, Beau M, Bhagat J, Böhm C, Broux M, Chen S, Colonell J, Gardner RJ, Karsh B, Kloosterman F, Kostadinov D, Mora-Lopez C, O’Callaghan J, Park J, Putzeys J, Sauerbrei B, van Daal RJJ, Vollan AZ, Wang S, Welkenhuysen M, Ye Z, Dudman JT, Dutta B, Hantman AW, Harris KD, Lee AK, Moser EI, O’Keefe J, Renart A, Svoboda K, Häusser M, Haesler S, Carandini M, Harris TD. 2021. Neuropixels 2.0: A miniaturized high-density probe for stable, long-term brain recordings. Science 372:eabf4588. doi:10.1126/science.abf4588

      Taylor MM, Contreras D, Destexhe A, Frégnac Y, Antolik J. 2021. An Anatomically Constrained Model of V1 Simple Cells Predicts the Coexistence of Push–Pull and Broad Inhibition. J Neurosci 41:7797–7812. doi:10.1523/JNEUROSCI.0928-20.2021

      Usrey WM, Reppas JB, Reid RC. 1999. Specificity and Strength of Retinogeniculate Connections. Journal of Neurophysiology 82:3527–3540. doi:10.1152/jn.1999.82.6.3527

      Usrey WM, Reppas JB, Reid RC. 1998. Paired-spike interactions and synaptic efficacy of retinal inputs to the thalamus. Nature 395:384–387. doi:10.1038/26487

      Whyland KL, Slusarczyk AS, Bickford ME. 2020. GABAergic cell types in the superficial layers of the mouse superior colliculus. J Comp Neurol 528:308–320. doi:10.1002/cne.24754

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank the Reviewers for their careful reading and the many thoughtful suggestions to improve our manuscript, as well as both the Editors and Reviewers for the generally positive evaluations and encouraging statements.

      Editorial assessment:

      This important work presents an interesting perspective for the generation and interpretation of phase precession in the hippocampal formation. Through numerical simulations and comparison to experiments, the study provides solid evidence for the role of the DG-CA3 loop in generating theta-time scale correlations and sequences, which would be reinforced through the clarification of the concepts introduced in the study, in particular the notion of intrinsic and extrinsic sequences. This study will be of interest for the hippocampus and neural coding fields.

      We appreciate that our work has been considered important. In our revision we made a considerable effort to improve the presentation of our results and the justification of our model assumptions. In particular, we aimed to clarify the meaning of intrinsic and extrinsic sequences by adding figure panels and by fleshing out their definition via spike-timing correlations being independent of or dependent on the direction of the running trajectory, respectively. To address all the requests, we added 3 new Figures and multiple new Figure panels, and simulated a new model variant.

      Reviewer #1, in their public review, assessed: “The manuscript has the potential to contribute to the way we interpret hippocampal temporal coding for navigation and memory.”

      They criticized

      • The findings generally relate to network models of phase precession (reviewed in e.g., Maurer and McNaughton, 2007, Jaramillo and Kempter, 2017). An important drawback of these models with respect to explaining specific experimentally observed features of phase precession, is that they cannot straightforwardly explain phase precession upon first exposure onto a novel track. This is because, specific connectivity in network models may require experience-dependent plasticity, which would not be possible upon first exposure. This is essential, given that the manuscript addresses the possible origin of phase precession in terms of network models and at minimum, this weakness should be discussed.

      We agree with Reviewer #1 (and also with Reviewer #2, who brought up a similar point) that models based on recurrence struggle to explain how the recurrent connectivity matrix should come about. While we feel that a full model of how the 2-d topology in the recurrent weights can be learned goes far beyond the scope of this paper (and to our knowledge has not been solved so far in any existing model), we added a new model variant (new Figure 6 and Supplementary Figure 1), which explains the basic phenomenology of extrinsic and intrinsic sequences without the need for recurrent connections, using only feed-forward synaptic facilitation. Thus, assuming recurrent connections is not necessary for our main findings. However, we would like to point out that this does not exclude the possibility that recurrent connections, if set up in an appropriate way, also contribute to phase precession and theta sequences.

      • An important and perhaps essential component of the manuscript, is the distinction between extrinsic and intrinsic models. However, the main concepts on which this hinges, namely extrinsic and intrinsic sequences (and the related extrinsicity and intrinsicity) could be better explained and illustrated. Along these lines, the result suggested by the title, namely, hippocampal theta correlations, may be important yet incidental in light of the new concepts (e.g., extrinsicity, intrinsicity) and computational models (e.g., DG-CA3 recurrent loop) that are put forward.

      We have added substantial new explanatory material to the figures, captions and text to more didactically introduce the concepts of intrinsicity and extrinsicity. We have also completely rewritten the abstract and added a subtitle: “extrinsic and intrinsic sequences”.

      • The study seems to put forward novel computational ideas related to neural coding. However, assessing novelty is challenging as this manuscript builds on previous work from the authors, including published (Leibold, 2020, Yiu et al., 2022) and unpublished (Ahmadi et al., 2022. bioRxiv) work. For example, the interpretation of intrinsic sequences in terms of landmarks had been introduced in Leibold, 2020.

      We agree with the reviewer that this paper touches on many related ideas from previous papers (not only of our lab) and is intended to tie up loose ends. Thus, the novel contribution is a biologically plausible mechanistic model of how intrinsic sequences and 2-d place maps interact on the level of interconnected spiking neurons. Such a level of explanation has not yet been available in previous work. We have considerably extended the Discussion section in our revision, detailing the bigger picture underlying this theory. Our addition of the non-recurrent model variant (see above) also adds considerable novelty, since it provides an account of phase precession and preplay in novel environments.

      • The significance of the readout tempotron neuron could be expanded on. In particular, there is room for interpretation of the output signal of that neuron (e.g., what is the significance of other neurons downstream? What is the rationale for this output being theta-modulated?)

      We have added an additional Figure 8 to better illustrate the inner workings of the tempotron. We also extended the discussion to better explain the potential use of the tempotron output (see above). In short, we consider the tempotron to signal a unique behaviorally important context that is independent of remapping induced by changes of sensory cues, which is a new prediction of the model. Since the context signal results from DG loops, it requires a stable code to also exist in the DG. Evidence for such long-term stability in DG has been found in Hainmüller & Bartos (2018).

      Reviewer #2, in their public review, finds “this research topic to be both important and interesting” and appreciates “the clarity of the paper”, commending our “efforts to integrate previous theories into their model and conduct a systematic comparison”.

      We are very happy about these positive remarks and sincerely would like to thank the reviewer!

      Reviewer #1 made the following specific recommendations for changes:

      The abstract is somewhat difficult to parse. I have identified some words and/or sections that could be improved.

      • ’ ....inherently 1 dimensional’. This statement seems to be related to an a priori interpretation of the authors. On the other hand, if offline sequences are trivially 1 dimensional because they are sequences (i.e., they constitute a vector), then online sequences would be 1-dimensional as well. What is the key difference between offline and online? Is it the omnidirectional place fields in two dimensions? Perhaps more importantly, how relevant is this fact with respect to the main results of the manuscript, which concern extrinsic and intrinsic sequences?

      We indeed meant that the sequences are trivially 1-dimensional. The main challenge that we would like to address in this paper is how a 2-d topology of place cells (and direction dependent theta sequences) and a 1-d sequence topology of intrinsic theta correlations and during (p)replay can be reconciled. We hope this has become clearer in the rewritten abstract.

      • The language in lines 36-38 is overly technical. I suggest modifying the language; it was less technical and more understandable in the body of the manuscript, which should also be reflected in the Abstract.

      We would like to apologize for making the abstract too technical. Also in response to Reviewer #2, we decided to rewrite the abstract entirely.

      The authors use a mixture of conductance based models and Izhikevich neurons, presumably for the spiking generating mechanism. The conductance component can be readily interpreted in terms of the underlying biophysics. The Izhikevich neuron model, however, is phenomenological. I suggest you address 1) the rationale for using Izhikevich model, 2) its biophysical interpretation, 3) and its combination with conductance-based currents.

      The reviewer is correct that spike generation is modelled using Izhikevich’s model whereas synaptic integration is included in a conductance-based manner. As suggested by the reviewer, we have added further explanation in the Methods part, explaining that the Izhikevich approach allows burst firing properties to be adjusted with only a few parameters by efficiently emulating the bifurcation structure of spike generation in the full biophysical model (1 & 2) and otherwise has no effect on the integration of conductance-based synaptic currents in a subthreshold regime (3).
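
      To make the combination concrete, the sketch below drives a single Izhikevich neuron with a conductance-based excitatory current. It is illustrative only: the parameters are the textbook regular-spiking values (a = 0.02, b = 0.2, c = -65, d = 8), and the reversal potential, time step and conductance amplitude are assumptions, not the values used in the manuscript.

```python
import numpy as np


def simulate_izhikevich(g_exc, dt=0.1, a=0.02, b=0.2, c=-65.0, d=8.0, e_exc=0.0):
    """Euler integration of one Izhikevich neuron.

    g_exc : excitatory conductance per time step (arbitrary units).
    dt    : time step in ms (assumed value).
    Returns the spike times in ms.
    """
    v, u = c, b * c
    spike_times = []
    for k, g in enumerate(g_exc):
        # Conductance-based synaptic current: scales with the driving force.
        i_syn = g * (e_exc - v)
        # Izhikevich spike-generation dynamics.
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + i_syn)
        u += dt * a * (b * v - u)
        if v >= 30.0:  # spike cut-off, then reset
            spike_times.append(k * dt)
            v = c
            u += d
    return np.array(spike_times)


n_steps = 2000  # 200 ms at dt = 0.1 ms
silent = simulate_izhikevich(np.zeros(n_steps))
driven = simulate_izhikevich(np.full(n_steps, 0.5))
```

      Only the four parameters a, b, c and d shape the firing pattern, while the synaptic term enters as an ordinary conductance times driving force.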

      Line 126: when you say preferred angle, do you mean preferred (heading) direction? If so, please maintain consistency throughout.

      We thank the reviewer for pointing out the inconsistency. We have added the word “heading” throughout the manuscript wherever appropriate. To further improve consistency, we have clarified the meanings of “best” (or “worst”) direction and reserved their use solely for cases in which the trajectory direction is compared with the preferred heading direction, namely, “best” (“worst”) direction when the trajectory is along (opposite to) the preferred heading direction.

      Line 174: When discussing cross-correlation, sometimes you mean a cross-correlation function between two place fields and sometimes the histogram of all such correlations? Please clarify.

      We used histograms to empirically estimate the underlying cross-correlation function. For clarity, we have specified that it is a cross-correlation histogram in the revised manuscript whenever we refer to the empirical estimate.
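
      For readers unfamiliar with the empirical estimate, a cross-correlation histogram between two spike trains can be sketched as follows. The function name, the maximum lag (150 ms) and the bin width (5 ms) are assumptions chosen for illustration and do not reproduce the manuscript's analysis settings.

```python
import numpy as np


def cross_correlogram(spikes_a, spikes_b, max_lag=0.15, bin_width=0.005):
    """Histogram of all pairwise lags (b minus a) within +/- max_lag seconds."""
    a = np.asarray(spikes_a, dtype=float)
    b = np.asarray(spikes_b, dtype=float)
    # All pairwise time differences; positive lag means b fires after a.
    lags = (b[None, :] - a[:, None]).ravel()
    lags = lags[np.abs(lags) <= max_lag]
    n_bins = int(round(2 * max_lag / bin_width))
    edges = np.linspace(-max_lag, max_lag, n_bins + 1)
    counts, _ = np.histogram(lags, bins=edges)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, counts


# Toy example: cell B fires 22 ms after cell A on every theta cycle (125 ms).
train_a = np.array([0.0, 0.125, 0.25])
train_b = train_a + 0.022
centers, counts = cross_correlogram(train_a, train_b)
peak_lag = centers[np.argmax(counts)]
```

      The lag of the histogram peak is the quantity whose sign is compared across running directions when pairs are labelled extrinsic or intrinsic.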

      Figure 3:

      Understanding the difference between extrinsic and intrinsic sequences is fundamental for the manuscript. I suggest that in the section that refers to Figure 3 (or Figure 3 itself), you kindly provide an example depicting how extrinsic and intrinsic sequences can

      1) coexist yet be distinctly identified

      2) depend on trajectory

      3) depend on DG input

      By coexistence, we meant the heterogeneous population of extrinsic and intrinsic cell pairs and, hence, the extrinsic and intrinsic theta correlations, as shown in Figure 3J. To improve clarity, we added the following sentence in the section that refers to Figure 3: “In our simulation, extrinsically and intrinsically driven cell pairs are both present in the population (Figure 3J), indicating a coexistence of extrinsic and intrinsic sequences.” To illustrate how extrinsic and intrinsic sequences depend on both trajectory and DG recurrence, we have also added annotations in Figure 3F to mark the extrinsic and intrinsic parts of the sequence.

      Moreover, the caption of Figure 3 refers to the directionality of the theta sequences. How does this again relate to the extrinsic/intrinsic distinction?

      We hope the highlighting in panel F of Figure 3 has resolved this problem.

      Figure 5:

      • This is a crucial figure that should illustrate the differences between extrinsic and intrinsic sequences, as the figure caption suggests. Surprisingly, it is not at all clear where (i.e., in which panel) and how (i.e., methodologically) should one distinguish one type of sequence from another. I suggest that at least one such panel is dedicated to illustrating the difference and/or detection of these sequences in time and/or from phase precession plots. Moreover, there is significant visual crowding that makes the interpretation challenging (e.g., insert a space between G and E)

      We would like to apologize that the previous version of the manuscript gave the impression that the difference between intrinsic and extrinsic sequences should mainly be illustrated in Figure 5. We hope that our revisions of Figures 1 and 3 have made this point sufficiently clear. The main purpose of Figure 5 was (and is) to illustrate how intrinsic sequences can lead to out-of-field firing. We have modified the figure caption (and text) accordingly. To address the visual crowding problem in Figure 5, we have inserted a space between panels and also removed repeated labels.

      Tempotron neuron and Figure 6:

      From the reviewer’s questions on Figure 6, we feel that our presentation caused considerable confusion about the motivation and interpretation of the tempotron simulations. We therefore rewrote parts of the associated text and Figure caption. We hope that the revised presentation clarifies the issues. We therefore only briefly respond to the reviewer’s points here, because we think they largely resulted from misunderstandings.

      • Intuitively, and as the manuscript results suggest, late phases are associated to extrinsic mechanisms while early phases are associated to intrinsic. Why not construct a simpler classifier readout based on this fact? How does it compare to a tempotron?

      Contrary to the reviewer’s comment, extrinsic mechanisms are visible at early phases (late in the field), intrinsic mechanisms at late phases (early in the field). In fact, what the tempotron does is learn to identify the intrinsic (late phase) part and to disregard the extrinsic (early phase) part.
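
      For orientation, a generic tempotron readout can be sketched as below. This is the standard tempotron rule from the literature, not the specific implementation in the manuscript: the kernel time constants, threshold, learning rate and the two-input toy pattern are all assumptions for illustration.

```python
import numpy as np


def psp_kernel(t, tau=15.0, tau_s=3.75):
    """Double-exponential PSP kernel (ms), zero for t < 0, peak normalised to 1."""
    t = np.asarray(t, dtype=float)
    t_peak = tau * tau_s / (tau - tau_s) * np.log(tau / tau_s)
    v0 = 1.0 / (np.exp(-t_peak / tau) - np.exp(-t_peak / tau_s))
    tp = np.clip(t, 0.0, None)
    k = v0 * (np.exp(-tp / tau) - np.exp(-tp / tau_s))
    return np.where(t >= 0, k, 0.0)


def voltage(weights, spike_times, t_grid):
    """Weighted sum of PSPs; spike_times holds one list of times per input."""
    v = np.zeros_like(t_grid)
    for w, times in zip(weights, spike_times):
        for ts in times:
            v += w * psp_kernel(t_grid - ts)
    return v


def train_step(weights, spike_times, label, t_grid, theta=1.0, lr=0.05):
    """One tempotron update: change weights only if the decision was wrong."""
    v = voltage(weights, spike_times, t_grid)
    fired = bool(v.max() >= theta)
    if fired == bool(label):
        return weights
    t_max = t_grid[np.argmax(v)]
    sign = 1.0 if label else -1.0
    # Each input is credited by its PSP contribution at the voltage maximum.
    grad = np.array([psp_kernel(t_max - np.asarray(times, dtype=float)).sum()
                     for times in spike_times])
    return weights + sign * lr * grad


t_grid = np.arange(0.0, 100.0, 0.5)
pattern = [[5.0], [10.0]]        # hypothetical "intrinsic" spike pattern
w = np.array([0.1, 0.1])
for _ in range(200):
    w = train_step(w, pattern, label=1, t_grid=t_grid)
```

      After training, the readout crosses threshold for the learned spike pattern, and weights stop changing once the decision is correct.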

      • What is the significance of theta-modulated output of the tempotron (readout) neuron?

      The theta modulation of the tempotron output is a trivial result of the theta-modulation of the input, i.e., the detection of the intrinsic sequence pattern is done once every cycle.

      Suggestion for Figure 6 related to Tempotron readout: Focus on ’with DG loop condition’, as the challenge and most important point here is to identify extrinsic and intrinsic sequences. The No-loop condition could be left as a supplementary figure or side panel.

      The no-loop condition is the essential control showing that the tempotron only responds to the previously learned intrinsic pattern and cannot identify spatial location based on the extrinsic pattern.

      Further work/predictions.

      Lines 196-198. “Since intrinsic sequences can also propagate outside the trajectory (Figure 5) and activate place cells non-locally, our model predicts direction-dependent expansion of place fields.” If remote activation is ’sufficiently’ remote, wouldn’t this predict two separate place fields instead of an expansion?

      The reviewer is completely correct. Out-of-field spiking can also affect remote locations if the intrinsic sequences link to remote place fields. This would lead to double fields; however, the intrinsic part would only be active at late theta phases. For simplicity, we have not added such a case to our paper, but we would like to thank the reviewer for this comment, since it leads to a nice prediction of the model that can be experimentally tested and was therefore included in the Discussion.

      Lines 556-558. ”In our model, firing rate is determined by both low-phase spiking from sensory input and high-phase spike arrivals of DG-CA3 loops, both producing opposing effects on the phase distribution.” Is it possible to make a differential prediction based on lesions here, e.g., along the lines of reduced range phase precession, for either high phases or for low phases?

      We thank the reviewer for this great suggestion. Lesion of DG in the model does indeed reduce the phase range and mean spike phase. This further corroborates the effect of DG-loop on theta compression and high-phase spiking. We have included a new panel D in Figure 4 and a corresponding mention in the result section.

      Line 570. “We speculate that the functional roles of intrinsic sequences may not be limited to spatial memories.” Is there any relationship to replay and/or sleep-dependent memory consolidation? Some speculation in the Discussion section would be welcome and appropriate.

      We have added some further speculative ideas to the last section of the Discussion. We propose that replay and preplay reflect the intrinsic sequences that express the current expectation of the animal. We have not yet thought well enough about their relation to memory consolidation to phrase this in the manuscript, but would suggest that they could serve to signal multimodal context information to the neocortex, where it can evoke retrieval of unimodal memory traces.

      The description of the results, as stated in the public review, can be improved. A key component is the definition and identification of extrinsic and intrinsic sequences.

      Some comments:

      • I think that the words ’extrinsic’ and ’intrinsic’ are problematic as both types of sequences/models rely on external (spatial) input, hence both are in some sense ’extrinsic’. On the other hand, both are network mechanisms, thus in some sense ’intrinsic’, where the asymmetry is either programmed directly onto the weights or due to synaptic depression. To add to the confusion, ’intrinsic’ mechanisms very often refer to cellular mechanisms in neurophysiology. I kindly ask you to, ideally, reconsider the terminology, or at the very least, be very thorough and precise when describing the mechanisms. For example, sometimes extrinsic (intrinsic) ’models’ are referred to, sometimes ’sequences’, sometimes ’factors’, sometimes ’pairs’, etc.

      We understand and appreciate the reviewer’s argument, but would like to stick to the terminology, since it was already used in our prior publication. We have made considerable effort to improve the explanation and illustration of extrinsic vs. intrinsic pairs in the main text and in Figures 1 and 3 to highlight our definition, which is based on pair correlations: extrinsic pairs flip the correlation lag with reversal of running direction, intrinsic pairs don’t. This is simply a functional definition and should not be confused with potential microscopic mechanisms. One of those (DG-loops) is suggested in our paper.

      • As discussed in the public review, network mechanisms may require experience-dependent plasticity and hence cannot easily explain phase precession on the first pass. Please discuss why and/or how your model fits with this observation.

      We agree that the two models under consideration both require the recurrent network to be set up appropriately and that there is no theory so far that would explain how. We chose these two models because they are well known in the community and relatively similar. We reasoned that a comparison between an intrinsic model and an extrinsic model would make most sense if the two are as similar as possible. Nevertheless, we extended the manuscript by a new set of simulations in which we do not use recurrent CA3 connections and obtain phase precession solely by feed-forward synaptic facilitation (new Figure 6 and Supplementary Figure S1). The new simulations show that the basic phenomenology can also be obtained without using recurrent CA3 connections; however, as expected when removing one mechanism of phase precession, the phase range is somewhat reduced compared to the full model.
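
      The functional definition above can be expressed as a short sketch. The function name and the handling of zero lags are assumptions made for illustration; the manuscript's actual quantification of extrinsicity and intrinsicity may differ.

```python
def classify_pair(lag_forward, lag_reverse):
    """Label a field pair from its correlation lags in two opposite running directions.

    lag_forward / lag_reverse: peak lag of the pair's cross-correlation
    histogram (any time unit) for runs A->B and B->A, respectively.
    """
    if lag_forward == 0 or lag_reverse == 0:
        return "undetermined"  # no clear firing order in one direction
    sign_flips = (lag_forward > 0) != (lag_reverse > 0)
    # Extrinsic: firing order follows the trajectory, so the lag flips sign.
    # Intrinsic: firing order is fixed regardless of running direction.
    return "extrinsic" if sign_flips else "intrinsic"


labels = [
    classify_pair(0.020, -0.018),  # order reverses with direction
    classify_pair(0.020, 0.015),   # order preserved
]
```

      The label depends only on the sign of the lag in each direction, which is why it makes no statement about the underlying microscopic mechanism.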

      Along a similar vein, phase precession in Figure 1E only has a range of pi/2, which is about half of the typical range of phase precession for single runs. This should be characterized as a weakness of the intrinsic model.

      The precession range in spiking models is highly sensitive to a large number of parameters, such that it is hard to make such definite claims (see also the response above). In the original Tsodyks et al. (1996) paper, the phase range went up to 270 degrees with an implementation that differs slightly from ours in terms of current- vs. conductance-based synapses, an exponential instead of a Gaussian recurrent weight function, and 1-d (original) vs. 2-d (ours). We chose conductance-based synapses and a Gaussian weight profile for better comparison with the Romani and Tsodyks (2015) model. In the original non-spiking implementation by Romani and Tsodyks (2015), the phase range was hardly 70 degrees. Our implementation of the Romani and Tsodyks (2015) model fits the experimentally reported phase ranges of about 70 to 180 degrees in CA3 (Harris et al., 2001).

      Lines 282-284: ”...since phase precession properties change in relation to running directions, nor are they solely intrinsic since reversal of correlation is still observed in most of the sequences (Huxter et al., 2008; Yiu et al., 2022).”. To which extent is this a consequence of the phase precession model (extrinsic vs intrinsic) or the fact that place fields are sometimes directional?

      The reversal of sequences with reversed running direction is how we define extrinsic correlation. We hope our changes in relation to Figure 1 have clarified this point.

      Figure 2: Is it i) directional input or ii) short-term facilitation that gives rise to lower phase? (or perhaps both?) Please clarify.

      It’s both. This is now clarified in the revised version of the Results sections related to Figure 2: higher depolarization always yields earlier phases in spiking models, however, pair correlations are not affected by either of the two mechanisms.

      Line 320. ”...onset of phase precession”. Do you mean in CA3/CA1/DG?

      Thank you for pointing this out. We have clarified that this statement refers to CA3.

      Line 323. ”....at a different location”. Please add rationale why it has to be at a different location and a reference to the appropriate equation.

      The sequence rationale as well as the equation number have been added.

      Line 384. “... predicting that loss of DG inputs is compensated for by the increase of release probability in the spared afferent synapses from the MEC.” It wasn’t clear whether this was a ’homeostasis prediction’ or an implementation in the model. Please clarify.

      Since the model explained the experimental observations by implementing an increased probability of release, the model predicts that in animals with DG lesion the probability of release should be enhanced. We have modified the wording to avoid confusion.

      Line 428 ”...and near future locations) is obvious, the potential role of the lesser expressed intrinsic sequence contributions is not straightforward.”. Similar to my comments above regarding terminology, please clarify what are both contributions and why are intrinsic sequences ’lesser expressed’.

      We have rewritten this passage to avoid unclear wording.

      Line 474. ”...we showed that the trajectory-independent sequences”. Do you mean ’intrinsic sequences’?

      We thank the reviewer for the careful reading! We have changed the wording to “intrinsic sequences” in the revision.

      Line 482. ”...field pairs being extrinsic”. Please clarify, as the usage of extrinsic now refers to field pairs.

      Thank you for pointing this out. We went through the whole manuscript and clarified the terms.

      Line 245 (heading). Consider rewriting as ’Dependence of theta sequences on heading directions’. Extrinsic and Intrinsic models have not yet been introduced.

      Since the main purpose of the first Results section is to explain the difference between extrinsic and intrinsic sequences, we kept these terms in the heading but modified it to “Dependence of theta sequences on heading directions: Extrinsic and intrinsic sequences”. Additionally, we have put more emphasis on introducing the terms “extrinsic” and “intrinsic” in this section.

      Figure 1.

      • I suggest using the same font - C and D, and F and G are too close to each other, consider adding space. For example, the exponent, 10-2 makes reading cumbersome. Line 300. Phase tail means offset phase? Phase tail may be too informal. Line 325: DG loop. Do you mean CA3-DG projection?

      We thank the reviewer for the suggestions. In the revised manuscript, we have ensured that the same font is used in all of the figures. To improve the readability of Figure 1, we have added space between panels as suggested, removed repeated axis labels and downsized the text “10-2”. Furthermore, we have rewritten the referenced line without using the word “tail”, and also clarified the meaning of DG loop as the short form of CA3-DG projection.

      Figure 4 caption: “DG lesion reduces temporal correlations...”. It is more precise to say that the lesion reduces the slope of the fitted lag vs distance. And how is this related to sequence compression?

      In the paragraph referring to Figure 4, we have elaborated on the meaning of theta compression and its relation to the lag-distance plot. However, we argue that “reduces the slope of the fitted curve” is not comprehensive enough to express our summarized conclusion in a caption title. We have modified the wording to “DG lesion reduces theta compression”.

      In addition, we have changed the slope unit to be radians per cm rather than radians per maximum pair distance, in conformity to unit standards.
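
      The unit change can be illustrated with a short sketch that fits a line to phase lag versus pair distance and converts a slope expressed per maximum pair distance into radians per cm. The ordinary least-squares fit, the function names and the toy numbers are assumptions for this example; the manuscript's fitting procedure may differ.

```python
import numpy as np


def compression_slope(pair_distance_cm, phase_lag_rad):
    """Least-squares slope of phase lag (rad) against field distance (cm)."""
    slope, _intercept = np.polyfit(pair_distance_cm, phase_lag_rad, deg=1)
    return float(slope)


def per_maxdist_to_per_cm(slope_per_maxdist, max_pair_distance_cm):
    """Convert radians per maximum pair distance into radians per cm."""
    return slope_per_maxdist / max_pair_distance_cm


# Toy data: lag grows by 0.1 rad for every cm between field centres.
distance_cm = np.array([0.0, 5.0, 10.0, 20.0])
lag_rad = 0.1 * distance_cm
slope = compression_slope(distance_cm, lag_rad)
```

      Expressing the slope in radians per cm makes values comparable across environments with different maximum pair distances.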

      General comment about terminology with regards to tuning and connectivity: it is not formally correct to compare connectivity with trajectories (e.g., lines 388-395, caption of Figure 5A, etc). Perhaps compare tuning to particular directions/preference or receptive field?

      We have corrected the wording such that the direction of DG- loop projection is compared to the direction of trajectory.

      Line 470. ’...fixed recursive loop.” Sentence is not clear, do you mean recurrent loops?

      The reviewer is correct. We corrected the wording

      Reviewer #2 had the following recommendations.

      M1. The abstract focuses on the differences between online and offline hippocampal replays. However, the replay topic is not touched upon in the rest of the manuscript. I found this very confusing when I first read the paper. I suggest the authors reconsider the best way to approach the opening or at least discuss if and how their model would incorporate replay phenomena.

      Also in response to Reviewer #1, we have rewritten the abstract, focusing on the problem of how to generate 2-d topology from 1-d sequences. In addition, also in response to Reviewer #1, we added a paragraph in the discussion detailing a hypothesis on how we think replay and intrinsic sequences work together.

      m2. On lines 89-91, the authors provide the selection of neuronal parameters for excitatory pyramidal cells and inhibitory cells in the Izhikevich model. While the choice of model is reasonable, it would be helpful to clarify the source of these neuronal parameters, especially for readers who are not familiar with the model.

      Again, also in response to Reviewer #1, we have added more motivation for the Izhikevich model.

      M3. On lines 94-98, the model considers a 2D sheet of CA3 neurons. One of the most significant assumptions is that each 2x2 tile of place cells is considered a unit with four directional angles. What is the basis for this assumption? Is there any experimental result supporting this, or is it a completely artificial design for the model? This is important since the organization of CA3 cells also affects the network architecture discussed later and impacts the realism of the model.

      This comment is related to Reviewer #1’s concern on experience-dependent plasticity: How is this connectivity pattern established? We fully agree that this is an open problem for the Tsodyks et al.-type networks. The main reason for choosing them (as argued in our response to Reviewer #1) is to have two published models, representing one type of sequence each, that are similar enough for comparison. In addition, we added new simulations (new Figure 6 and Supplementary Figure S1), showing that the basic phenomenology can also be obtained in a model without recurrent connections (see also response to Reviewer #1).

      m4. Similarly, on lines 111 and 140, the model uses 500 ms for the timescales of short facilitation and short-term synaptic depression. The choices of these two timescales are vital for producing directionality in extrinsic and intrinsic sequences, yet their experimental sources are not clarified.

      In the Methods section of the revised manuscript, we have included the sources of previous experimental data and modelling work to support our choice of the time constants.

      M5. On line 126, the authors assume that the synaptic strengths between CA3 cells, Wij, are given by the distances between neurons and the similarity between their directional preferences. While this assumption seems reasonable in the sensory cortex, I am unsure if this is also the case in the hippocampus, and the authors should clarify the basis for this assumption.

      The distance dependence simply reflects the original Romani and Tsodyks 2015 model (see response to M3) and we share the concern of the reviewers. The increased connectivity for neurons with the same directional preference was necessary to recover the direction-dependent phase precession properties (Figure 2) in the realm of the Romani and Tsodyks 2015 model. Please also see our new Figure 6 showing simulations without the recurrent matrix.

      More importantly, the existing connections within CA3 and DG cells completely determine the ”intrinsic” sequences. But wouldn’t this be fragile when place cells undergo global remapping, which can take place within only a few seconds? The author should comment on this in the discussion.

      We would like to thank the reviewer for bringing up this interesting point. In our thinking, the DG-CA3 connectivity is fixed (multiple 1-d trajectories, not necessarily requiring 2-d topology), i.e., the same intrinsic sequence should show up in multiple environments (and should not remap), although it may just not be active in some environments. This is a prediction of our model and we have added it to the Discussion.

      M6. I found the setup of DG place cells unreasonable. DG place cells are found to be granule cells rather than pyramidal cells. Moreover, the model does not consider recurrent connections between DG cells (These setups are closer to CA1 place cells).

      We agree with the reviewer, DG granule cells should rather be modelled as high-input resistance EIF neurons. However, the feedback loop via the dentate is not a direct one. It involves hilar mossy cells plus multiple hierarchies of feedback inhibition (this is probably what the reviewer means with recurrent connections between DG neurons, because granule cells are not recurrently connected in the non-pathological state). To our knowledge a biologically realistic model of the hilar-DG network does not exist and it would be far beyond the scope of this paper to develop one. We therefore see our DG feedback model rather as phenomenological. The discussion paragraph on the anatomy of the dentate gyrus touches on these points.

      Therefore, a significant concern is: Why should it be the DG feedback projection to CA3 responsible for the “intrinsic” sequences instead of projections from other brain areas?

      The reviewer is generally correct: any brain structure which implements fixed sequences via a loop would do. The reason why we suggest the DG to be the best candidate is purely empirical, referring to papers with dentate lesions: Sasaki et al. 2018 and Ahmadi et al. 2022. We have added a similar argument to the discussion.

      m7. On line 166, the authors claim that there are no connections between inhibitory cells at all. While I understand that this is for simplification of the model, the lack of recurrent inhibition between interneurons may have limited the model’s ability to produce gamma-band dynamics (referring to PING and ING mechanisms), which are robust rhythms produced in CA3. I am very curious if the model can incorporate theta-gamma coupling by introducing connections between CA3 inhibitory cells.

      We have omitted gamma oscillations for simplicity, because we do not have a hypothesis for a functional role in the context of distinguishing extrinsic from intrinsic sequences (Occam’s razor) and, as the reviewer correctly anticipates, they unavoidably show up when inhibitory interneurons connect to each other (e.g. Thurley et al. 2013). Of course, one could envision situations in which gamma for intrinsic sequences may have a different frequency than for extrinsic ones, by differentially manipulating the CA3 and DG basket cell networks, but, as long as there is no experimental data, it would be pure speculation and thus we have not included it in the model.

      m8. The authors should clarify the source of parameters in Table 1, especially the synaptic strengths. These values are vital for extrinsic and intrinsic theta sequences.

      The weight values have been chosen to allow for large theta phase precession range, coexistence of extrinsic and intrinsic sequences, and stability of the network activity. A similar statement has been added to the manuscript.

      M9. I have another concern regarding the measurements of “extrinsicity” and “intrinsicity” defined on lines 185-196. Are they the best measures? To distinguish the cause of spike correlations, the “extrinsicity” and “intrinsicity” of a pair of spikes should not be high at the same time. However, this is clearly not the case in the model, according to Figs 3 and 5. Moreover, in the data analysis carried out later, spike pairs are considered extrinsic or intrinsic merely by comparing the two measurements. I suggest the authors consider counterfactual methods in causal inference. For example, would a spike pair (cell1, cell2) still exist if we change the sensorimotor inputs or the DG-CA3 projections? If this is difficult to implement, the authors should at least discuss how different choices of measurements would impact the conclusions of the paper.

      The problem the reviewer has identified arises from the fundamental symmetry of theta phase quantification: if spikes of a pair of place fields have a phase difference of 180° one cannot say which cell leads and which cell follows, hence, the phase difference is both intrinsic (because the peak doesn’t flip) and extrinsic (because the peak flips and ends up at the same phase). The fact that in some cases extrinsicity as well as intrinsicity are high simply means that the field pair has a correlation peak lag close to 180°. Since in the experimental data set in (Yiu et al. 2022) only field pairs were available, we have not been able to use a different quantification then and decided to apply the same quantification in our model for comparison. Moreover, Figure 5F nicely shows that the measures are able to retrieve the ground-truth intrinsic DG-loop structure when considered on the population level.

      In our model, though, we can go beyond 2nd order statistics and derive sequence similarity measures including multiple cells, e.g., Chenani et al. 2019. However, since we already know the ground truth by construction, we decided to not use these methods. We added a paragraph in the discussion elaborating on beyond 2nd order sequence quantification.

      m10. The authors begin discussing ”intrinsic sequences” from line 316. However, it is not defined before that (and in the rest of the paper as well), causing confusion when reading the paper. The exact definitions of extrinsic and intrinsic sequences should come earlier.

      We hope that our changes to the beginning of the results section (Figure 1), also asked for by Reviewer #1, clarify the confusion.

      m11. On lines 345-347, the authors claim that ”the intrinsic sequences are played out backward as determined by the direction of fixed recurrence (Figure 3F),” which is vague. If such sequences are present in that panel, it should be more explicitly indicated graphically.

      Also in response to Reviewer #1, we have graphically highlighted the two types of sequences.

      M12. On lines 309, 356, 484, 495, 515, and possibly other instances, the authors repeatedly claim that the model simulations are in “quantitative agreement” with their previous experimental paper. However, no experimental data or comparison with the simulations are presented in this paper. The authors should at least create one figure to demonstrate the degree of consistency between them, instead of merely asking the reader to refer back to their previous paper.

      We agree with the reviewer that the experimental data of our previous paper should be presented in the manuscript. However, creating more panels or figures is likely to clutter the already crowded visuals and obscure our main message. We therefore decided to give numerical comparisons with the previous findings in the main text whenever appropriate, specifically, in the sections referring to Figures 2, 3 and in the Discussion.

    1. Author Response

      We thank Dr. Carlos Isales and Dr. Jenny Tung as well as the peer Reviewers for their critiques and comments concerning this manuscript and respond here to their key concerns. Some of the Reviewers’ questions raised fascinating points about naked mole-rat biology and social habits, which we are also curious about, but which are too far afield from the central themes of the manuscript to warrant new work or revision. The Reviewers also raised some concerns about our methodological assessments and data interpretation which may warrant further discussion and explanation. We address those comments below. In no case do we feel that the concerns raised undermine our conclusions, so we have not undertaken new analyses nor revised the manuscript.

      Median survival and power.

      A recurring theme in these reviews is that our conclusion that naked mole-rats do not experience actuarial senescence is spurious, as it is “incomplete for younger animals and inadequate for older animals” due to Kaplan-Meier survival failing to reach median lifespan. We counter that premise, for median survival is an arbitrary threshold with no special bearing on when the Gompertzian hazard increase (onset of actuarial senescence) should become apparent. This point is well illustrated in Figure 5 of our original manuscript (Ruby et al., 2018). For demographic data from lab mice, humans, and horses (panels B, C, and D, respectively), the Gompertzian hazard increase is readily apparent by the time median survival (indicated by vertical dotted lines) is reached.

      Another concern raised in the reviews is uncertainty about the true increase in power for these updated data since our 2018 report. The Reviewers correctly point out that the distribution of those data, and not just their scale, are relevant to power. The distribution of all data, old and new, are clearly illustrated as a function of age in Figure 2A. The ~doubling of available observation data is consistent across age groups, with one exception: at ~8,000-10,000 days of age. However, we do not agree that is a shortcoming of the new data’s power for hazard calculation among older animals, given that the animals that formerly occupied that age bin have continued to age, without greater hazard, across the next five years. In other words, the lack of N increase in that particular age bin is balanced by the massive increase in available data at ~10,000-12,000 days of age - an advanced age bin that was previously almost empty.

      More surprising is the insinuation that for an approximately 40 gram rodent species, median survival on the order of 30+ years, with no sign of an increase in age-related mortality hazard, is considered a reasonable expectation. Both here and in our 2018 manuscript, we have conservatively used Tsex (180 days) as our benchmark for allometric scaling. Alternatively, one could scale this to the predicted lifespan based on average body weight for the species. According to the equation of de Magalhaes et al. (2007), the maximum lifespan of H.glaber is expected to be merely six years. Here, the Reviewers suggest that we are under-powered to make any statements about demographic aging because we have not reached median lifespan - despite the fact that our observations extend out to seven times the expected maximum lifespan. This is the precise nature of our argument that Gompertzian demographic aging is defied: that the onset of actuarial senescence is not apparent even at ages many-fold beyond when one would expect Gompertzian trends to have wiped out the entire population.

      Ironically, the Reviewers seem to have focused on the most striking manifestations of Gompertzian defiance - not reaching median lifespan after decades of population observation, or having few death events after tens of thousands of days of individual lifespan observation - as reasons to doubt the conclusions. Even if we quadrupled the number of sample points and included data for another 35 years, if we still did not detect the onset of actuarial senescence, the same critiques would still apply - and would be similarly illogical.

      The appropriateness of Kaplan-Meier, with left & right censorship

      Objections were raised about the appropriateness of Kaplan-Meier survival analysis for our data. Reviewer #3 asserts that “a Kaplan-Meier estimator can only take right-censored and uncensored records”, which is incorrect. This perhaps reflects a wider misunderstanding of Kaplan-Meier statistics that warrants further explanation.

      Reviewer #3 asserts that “left-censoring occurs when your event can be repeated and some events occur before the start of the study”. This is an oversimplified and far too-limited description of when left-censoring should be applied. We will further explain how left-censorship is applied in various analyses of our data, but for further reading on how this practice can produce unbiased estimates, we recommend the Reviewers consult (Cain et al 2011). We will discuss left and right truncation and censorship in terms of the diagram from Figure 2 of that manuscript, which illustrates a study in which the timing of event Y after event X in an individual’s life is being analyzed, given enrollment in a study at age A and exit from the study at age B. We also remind the Reviewers that methods used previously by us are in the papers (Ruby et al, 2018 & 2019) which were referenced and cited in our manuscript and should also be consulted for a full description.

      For our study, ages A and B from (Cain et al 2011) are akin to the edges of our hazard estimation windows: appropriate application of censorship and truncation allows us to accurately, unbiasedly estimate hazard within each age bin, allowing fair evaluation of changes (or lack thereof) as a function of age. For full Kaplan-Meier survival, age A is uniformly defined as Tsex (day 187), and B is not globally defined - rather, it is defined for each animal if observation ended due to exit from the collection (i.e., used in research studies (KFR), donated to another researcher, or continuing to be alive at the time of the study). Since none of the Reviewers seemed confused or concerned about our use of right-censorship in these cases, we will focus this discussion on left-censorship.
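
      The per-bin hazard estimation described above can be sketched as follows. This is a minimal illustration, assuming hazard in a bin is computed as the number of deaths divided by the animal-days of observation falling inside that bin; it is not necessarily the exact procedure used in Ruby et al. (2018). The function name `hazard_by_age_bin` and the record format `(entry_age, exit_age, died)` are hypothetical and chosen only for this example.

```python
def hazard_by_age_bin(records, bin_edges):
    """Estimate hazard (deaths per animal-day) within each age bin.

    records   : list of (entry_age, exit_age, died) tuples, ages in days.
                entry_age is when observation of the animal began,
                exit_age is when it ended (death or censoring).
    bin_edges : increasing list of ages defining bins (lo, hi].
    Returns one value per bin; None if no animal was observed in that bin.
    """
    hazards = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        exposure_days = 0
        deaths = 0
        for entry, exit_age, died in records:
            # Only the part of the observation window inside the bin counts.
            exposure_days += max(0, min(exit_age, hi) - max(entry, lo))
            # A death is attributed to the bin containing the age at death.
            if died and lo < exit_age <= hi:
                deaths += 1
        hazards.append(deaths / exposure_days if exposure_days else None)
    return hazards


# Hypothetical toy data: three animals, the third entering at day 50.
example_records = [(0, 100, True), (0, 250, False), (50, 150, True)]
example_hazards = hazard_by_age_bin(example_records, [0, 100, 200, 300])
```

      Because each animal contributes only the days on which it was actually under observation within a bin, late entry and early exit change the denominator without distorting the estimate for that bin.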

      In our original analysis (Ruby et al., 2018), we did not apply left-censorship because Dr. Buffenstein had maintained the animals since they were born, therefore no events occurred (i.e. observations of an animal being alive or dead on a day) prior to the beginning of the study. In the parlance of (Cain et al, 2011): we knew when the initiating event X had occurred (Tsex), and the animals had been continuously observed thereafter, up until either their death or right-censorship point. Animals were right-censored if they were removed from the study, e.g. due to sacrifice for research or donation to other researchers. Doing so reduced the population size moving forward (to the right) without modifying the survival value, allowing the impact of individual death events to be appropriately amplified (i.e. Kaplan-Meier analysis).

      For left-censored data, the same operation occurs but in reverse order: for example, if an animal is left-censored at 457 days of age, then the population size is increased by one on that day, without modifying the survival value. In Kaplan-Meier survival estimation, for each observation period, the current survival value is multiplied by the number of animals surviving that interval divided by the number of animals in the population in that interval. Since the animal in question was not observed prior to 457 days of age, it would not be counted in the population size prior to that day: had it died before then, it would not have been in the study population at all. However, once it has entered the population, each day-of-age on which it is observed to be alive is included in the population size tally, since each day it could also perish and thereby impact the survival curve. Should any of the Reviewers who have received animals from Dr. Buffenstein wish to extend this data set in the future using those animals, left-censoring them at the age at which they were received (or after some acclimation period) would be the proper method to do so.
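
      The bookkeeping described in this paragraph can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the manuscript: the function name `kaplan_meier_delayed_entry` and the record format `(entry_age, exit_age, died)` are hypothetical, and what is called left-censorship here is handled in the way often described elsewhere as delayed entry, i.e. an animal joins the at-risk population only from the age at which its observation began.

```python
def kaplan_meier_delayed_entry(records):
    """Kaplan-Meier survival with late entry and right-censoring.

    records : list of (entry_age, exit_age, died) tuples, ages in days,
              with entry_age < exit_age. died is True if observation
              ended with death, False if the animal was right-censored.
    Returns a list of (age, survival) pairs, one per age with a death.
    """
    death_ages = sorted({exit_age for _, exit_age, died in records if died})
    survival = 1.0
    curve = []
    for t in death_ages:
        # At risk at age t: already under observation before t and not
        # yet removed. Late entrants are simply absent from earlier tallies.
        at_risk = sum(1 for entry, exit_age, _ in records if entry < t <= exit_age)
        deaths = sum(1 for _, exit_age, died in records if died and exit_age == t)
        # Multiply by (survivors of this step) / (population at this step).
        survival *= (at_risk - deaths) / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical toy data: two animals followed from day 187, two entering
# at day 457, and one entering at day 1200.
example_records = [
    (187, 1000, True),
    (187, 2000, False),
    (457, 1500, True),
    (457, 3000, False),
    (1200, 2500, True),
]
example_curve = kaplan_meier_delayed_entry(example_records)
```

      In this toy example the animal entering at day 1200 is not part of the denominator for the death at day 1000, but it is counted for the deaths at days 1500 and 2500, which mirrors the entry rule described above.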

      As stated above: in our original analysis (Ruby et al., 2018), we did not generally apply left-censorship because Dr. Buffenstein had maintained the animals since they were born (although beginning the analysis at Tsex qualifies as population-wide left-censorship). In their commentary, Dammann et al. (2019) pointed out that loss of records could modify the hazard distribution through bias towards longer-term survivors: in other words, counting long-lived animals as part of the population in early life is unfair because the death events from the truly larger population at that time had been lost (in that case: perhaps back in the 1980’s). In the parlance of (Cain et al, 2011): loss of records would have been the equivalent of left truncation, which if unchecked could produce bias. For our reply (Ruby et al., 2019), we addressed this problem by applying a drastic left-censoring of all animal data on a date where we could be highly confident that all records had been securely maintained, thus removing any potential bias introduced by old, lost records - as illustrated by (Cain et al, 2011). That re-analysis did not change our results, negating loss of decades-old records as a confounder of our conclusions. In this new manuscript, we used this technique again, only analyzing data collected since those data reported in our prior publications. Again, our original conclusions were confirmed: quoting Reviewer #3, “the main figures are virtually the same, with some minor changes due to the extended dataset”.

      Independence between studies

      In this new manuscript, with substantially more data, we applied left-censorship again in order to conduct an analysis of just the newly-provided data. Importantly, no datum - i.e. no day of observation of an animal being either alive or dead - overlapped between that analysis and those from our original reports (Ruby et al., 2018 & 2019), and data were collected across non-overlapping periods of time. Reviewer #2 questions the independence of this analysis from the original, correctly citing that it is still our own collection whose demographic data we are surveying. We reply that it is as independent of a dataset as we could possibly provide: greater independence would require the publication of substantial demographic data from other members of the H.glaber research community, which we would be happy to see. We also want to remind the Reviewers that Sherman and Jarvis (2002) also reported negligible demographic senescence for animals >15 years of age under their care: a fully-independent observation that concurs with our conclusions, albeit with substantially fewer animals and less statistical power.

      “Glossing over” reports of aging phenotypes

      Reviewer #1 suggests that our review of our own prior publications in this manuscript has “glossed over data that don’t support our main interpretations”, specifically mentioning the papers by Edrey et al., (2011) and Andziak et al., (2006). However, this is not an accurate reflection of the content of those published papers. The reviewer highlights data pertaining to case studies of two animals, aged 29 and 30 years, exhibiting pathologies that are commonly associated with aging in the Edrey et al., (2011) paper that was entitled “Successful aging and sustained good health in the naked mole-rat…”. But, as per the title of that paper, those were atypical cases. Indeed, we reported that the majority of animals maintained good health and activity well into their third decade. The Andziak et al., (2006) paper revealed that young (2y), healthy naked mole-rats have higher levels of oxidative damage to lipids, proteins and DNA than observed in young mice; but the follow-up paper, Andziak and Buffenstein (2006), reported that unlike that observed in mice, in naked mole-rats the levels of such damage do not further increase with advancing age, supporting the premise of sustained tissue homeostasis. Routine pathological assessments undertaken by our group and from zoological specimens in the 12 years since Edrey et al., (2011) have revealed many more instances of “aging phenotype pathologies” - but again, with similar frequency across all age groups (Delaney et al., 2021). We have not “glossed over data that don’t support our main interpretations”: in fact, the data brought up by the Reviewer support our conclusions. Like natural death, “age-associated disease phenotypes” occur stochastically across all age groups of H.glaber, rather than being exponentially enriched in elderly animals as in other species.

      Breeding status

      Reviewer #1 also states that “this study fails to fully represent the literature with respect to the divergence in aging rates between breeders and non-breeders”. This section of our discussion (lines 326-367) addresses the survival advantage in many cooperative breeding mammals in the wild and in captivity including other mole-rats and meerkats (Sharp and Clutton-Brock, 2010; Dammann et al., 2011; Cram et al., 2018). The lower survival of subordinates in captivity may be due to chronic stress associated with bullying by the dominant animals and their inability to disperse and avoid such unpleasant activities; often being injured and dying after losing fights for a more dominant position in the social hierarchy. Braude et al., (2021) similarly report that compared to subordinates who undertake the more precarious activities of burrow extension, foraging or dispersal, the breeding females remain in their study site for far longer periods.

      In captivity, subordinates have two paths to becoming a breeder: If the breeding female dies, some subordinate females within the colony will fight to the death to establish breeding status and inherit the dominant role in the colony. This could imply that they are “higher-quality” individuals as suggested by Reviewer #1 with molecular and physiological mechanisms in place to outlive their “poorer-quality” conspecifics. However, the majority of breeding females in our colony arise through random pairing of a female and a male that has been isolated for a few days from their colony. As such there is no selection for “higher-quality” individuals with concomitant inheritance of better somatic maintenance mechanisms. Rather, breeding status appears to be accompanied by a phenoplastic switch, as suggested by the lower levels of DNA methylation in tissues of breeding females (Horvath et al., 2021) and altered growth patterns when a female changes her status to that of a breeder (O’Riain et al., 2000). This is possibly linked to moving up the dominance hierarchy with concomitant changes in stress, somatotropic, and reproductive hormones as well as augmented tissue repair pathways for the maintenance of homeostasis.

      We have not undertaken in-depth studies on behavior and social habits and the effect of age, but agree these would be of interest in future studies.

      Analysis initiation at 6 months

      Mortality rates are highest in the first three months of life, in keeping with increased mortality during the developmental period. While it is true that in captivity most animals continue to grow for the first eighteen months to two years of life and some individuals may continue to gain weight well into their third decade, we and others have shown that animals can successfully breed at 6 months of age, if given the opportunity to do so. Other demographic studies similarly use the age at which animals can reproduce as the starting point for their analyses. Nevertheless, even if we were to use 2 years as the starting point, the same trends will be evident for there was no increase in mortality risk even at ages beyond 30 years.

      Colony size effects

      It is intriguing that smaller colonies had higher mortality risk than larger colonies. In many cases smaller colonies represent younger colonies with possibly less well-established breeders and a higher degree of social instability. In other cases, the breeding female may not be very successful in raising her young, and possibly is not producing “high-quality” offspring. We agree with the Reviewer that behavioral assessments are needed to evaluate if there is more fighting and competition for dominance or if other social dynamics or ‘poorer-quality’ offspring are at play. Nevertheless, these findings are intriguing and we have speculated as to why this is the case. Further work is needed to definitively tease out why this is indeed the case.

      References cited here

      Andziak et al., (2006) doi: 10.1111/j.1474-9726.2006.00237

      Andziak and Buffenstein (2006) doi: 10.1111/j.1474-9726.2006.00246

      Braude et al., (2021) doi: 10.1111/brv.12660

      Cain et al (2011) doi: 10.1093/aje/kwq481

      Cram et al., (2018) doi: 10.1016/j.cub.2018.07.021

      Dammann et al., (2011) doi: 10.1371/journal.pone.0018757

      Dammann et al., (2019) doi:10.7554/eLife.45415

      Delaney et al., (2021) doi: 10.1007/978-3-030-65943-1_15

      De Magalhaes et al., (2007) doi: 10.1093/gerona/62.6.583

      Edrey et al., (2011) doi: 10.1093/ilar.52.1.41

      Horvath et al., (2022) doi:10.1038/s43587-021-00152-1

      O’Riain et al., (2000) doi: 10.1073/pnas.97.24.13194

      Ruby et al., (2018) doi: 10.7554/eLife.31157

      Ruby et al., (2019) doi: 10.7554/eLife.47047.

      Sharp and Clutton-Brock,(2010) doi: 10.1111/j.1365-2656.2009.01616.

      Sherman and Jarvis (2002) doi: 10.1017/S0952836902001437

    1. Author Response

      Reviewer #2 (Public Review):

      The manuscript by Ramesh et al builds upon prior studies from the Sigrist group to examine synergistic interactions between the Spinophilin (Spn) and Syd-1 synaptic proteins and their role in regulating presynaptic homeostatic plasticity at Drosophila larval NMJs and adult olfactory memory in the Mushroom Body (MB). The authors show synergistic interactions between the two proteins in these processes, where late PHP and long-term memory are abolished in Spn mutants, but restored upon reduction of Syd-1 function in the mutants. The authors go on to show that Spn appears to act in PHP by regulating a late stage in AZ remodeling and longer-term increases in the readily releasable SV pool by controlling actin polymerization/dynamics through the Mical protein. Although key aspects of the overall bigger picture have been published before (Mical’s role in PHP, antagonism between Spn and Syd-1 in AZ development, AZ remodeling in MB-dependent memory), the current paper ties together many of these observations into a bigger picture of how PHP plasticity at the NMJ is established and provides support for a role for PHP-required proteins in promoting long-term memory in the adult MB through effects on AZ structure and AZ protein content/amount. The study also provides new links to the role of Spn in regulating local synaptic actin dynamics and how this alters the readily releasable pool and SV release. Some points of note are provided below.

      1) I’m a bit confused about the time course experiments the authors describe that seem to be contradictory in Figures 1 and 2. The authors indicate control animals transiently increase BRP AZ levels during PHP at 10 mins, but by 30 minutes this increase is gone, even though PHP remains. As such, the data in these early figures suggests increases in BRP AZ levels may support an early aspect of the PHP effect (though I note this appears controversial, as other data indicate blocking the rapid AZ remodeling by several manipulations such as Arl8 transport disruption, permits early PHP, but disrupts late PHP). In contrast, the authors show that Spn mutants do not display AZ BRP increase at 10 mins, and still show early PHP, but lack late PHP. I assume the early PHP does not require AZ remodeling or an increase in the RRP at this early time point?

      We thank the reviewer for this insightful question, which to a degree is also reflected in Reviewer 1's question concerning the variability of Spn mutants when tested for PHP at 10 min PhTx treatment and thus the temporal and likely functional entanglement of induction and maintenance mechanisms.

      Let us start by once again describing our findings: BRP increase is clear at 10 min PhTx treatment but is no longer measurable at 30 min PhTx treatment. Genetic elimination of BRP does not restrict PHP at 10 min PhTx (Bohme et al. 2019). However, BRP mutants are neither able to maintain PHP when PhTx treatment is extended to 30 minutes as described in Turrel et al (Turrel et al. 2022), nor in a chronic PHP paradigm of BRP, GluRIIA double mutant (Bohme et al. 2019). We suggest that the transient increase of BRP, also previously described specifically in the MB γ-neurons (Zhang et al. 2018), triggers other, longer lasting AZ changes. Indeed, we found that the increase of the critical release factor Unc13A is still present at 30 min PhTx treatment and is dependent on the “transient” BRP increase (Fig. S3B) (Turrel et al. 2022). Turrel et al also uncovered a more transient upregulation of BRP when compared to Unc13A in the MB. Here, specifically upon paired olfactory conditioning, 1 h after training, animals displayed BRP and Unc13A level increases. At 3 h post training, however, BRP levels had already plateaued, whereas Unc13A levels had increased further (Figure 1B, (Turrel et al. 2022)).

      We have now added to the discussion section: “We suggest that the transient increase of BRP, also previously described specifically in the MB γ-neurons (Zhang et al. 2018), triggers other, longer lasting AZ changes. Indeed, we found that the increase of the critical release factor Unc13A is still present at 30 min PhTx treatment and is dependent on the “transient” BRP increase (Fig. S3B) (Turrel et al. 2022). Turrel et al also uncovered a more transient upregulation of BRP when compared to Unc13A in the MB. Here, specifically upon paired olfactory conditioning, 1 h after training, animals displayed BRP and Unc13A level increases. At 3 h post training, however, BRP levels had already plateaued, whereas Unc13A levels had increased further (Fig. 1B, Turrel et al).” (Line 363)

      An RRP increase has been shown at 10 min of PhTx treatment (Weyhersmuller et al. 2011) and remains high after 30 min of PhTx treatment (this study).

      2) In relation to point 1 above, the time course seems different in MB neurons, where the AZ remodeling (noted by increases in AZ BRP) seems to take 2-3 hours. Do the authors have any ideas into why the time course of PHP AZ remodeling at larval NMJs can occur in 10 minutes, but MB neuron remodeling seems to take hours?

      We thank the reviewer for this question. We specifically probed the time intervals of 10 and 30 min at the NMJ because of established protocols and technical constraints, and 1 h and 3 h in the brain because of our interest in MTM. Zhang et al. (Zhang et al. 2018) previously showed that BRP levels in the γ-lobe were indeed already significantly increased 20 min after conditioning. At the moment, we can only speculate that the following differences might be relevant here: differences between the peripheral and central nervous systems, in terms of glutamatergic motoneuron presynapses (NMJ) versus cholinergic KC presynapses, might change the temporal dynamics of AZ remodeling. Furthermore, the plasticity induction protocol using PhTx is potentially a somewhat more “heavy-handed” approach than the more subtle conditioning involving the activation of dopaminergic neurons. The more complex circuitry of the central brain might also be involved in maintaining this increase in BRP levels over longer timescales than at the NMJ, which may serve some as yet unknown physiological purpose in maintaining memories.

      We use the NMJ PhTx assay to identify proteins involved in AZ remodeling that could also be involved in memory formation in adult flies. As of now, we have no experimental evidence as to whether the AZ remodeling observed in the MB actually leads to synaptic depression or is instead a reaction to the initial short-term synaptic depression. This study and Turrel et al. (2022) provide evidence for an overlap of the executory machinery involved in both mechanisms, NMJ PHP plasticity and MTM formation, as BRP, Spn, Arl8, IMAC and Aplip1 are specifically involved both in mid-term NMJ PHP (at 30 min after PhTx treatment) and in MTM.

      3) Could the lack of rapid BRP accumulation during early PHP in Spn mutants be secondary to the larger # of AZs in those mutants and a known rate-limiting amount of BRP available that might not be enough to go to the extra AZs?

      This per se might be a relevant concern. Notably, however, acute application of Latrunculin-B in Spn mutants allowed for an increase in BRP (Figure 5g-h). Thus, a limitation in the total pool of available BRP should not be responsible for Spn mutants’ inability to accumulate BRP under PhTx treatment.

      4) There isn't any validation of the Spn co-IP results shown in Figure 3 through other assays, and a lot of proteins are being pulled down. I can't see some of these being real (mitochondrial translation proteins? - how could Spn gain access to the inside of the mitochondria since it's a cytosolic protein?). As such, I don't know how to value that huge group of pull-down interactions without further validation, making it difficult to sort out how relevant these really are. The genetic validation of similar phenotypes in the Mical mutant, together with rescues, supports that interaction. Not sure about the rest of that list.

      We appreciate the opportunity to discuss our primary data and how we used them to generate testable hypotheses for our study. Firstly, the mitochondrial translation proteins that were identified in our Spn IPs are all nuclear encoded, meaning they are transcribed in the nucleus and translated in the cytoplasm. Interestingly, recent work indeed suggests that mitochondrial biogenesis in the synapse is supported by local translation (e.g. see (Kuzniewska et al. 2020)). As Spn IPs are also highly significantly enriched for cytosolic translation machinery, it is an appealing idea that Spn might be involved in coupling local translation, mitochondria and memory stabilization. As this clearly goes beyond the scope of this paper, we did not further discuss this point, and are prepared to remove these data if considered misleading.

      Concerning unspecific proteins being pulled down in our IPs, we would like to emphasize that these IPs are the result of an established protocol, which entails laborious synaptosome preparations that our lab worked out previously (Depner et al. 2014). For each condition, 4 biological replicates were performed, and mitochondrial ribosomal proteins were enriched with p < 10⁻³⁰ significance; they were never observed in our extensive systematic work on active zone biochemistry for any other active zone protein.

      In this study, we used the Spn IPs to identify putative interaction partners, with the intention to validate the physiological relevance of any positive hits through experiments, like we did in the case of Mical. We were also able to identify previously known interaction partners like Syd-1 and Nrx-1 (Muhammad et al. 2015). Obviously, we did not independently validate these findings for the large number of identified proteins, e.g. by using in vitro purified proteins (we do not consider Western probing of IPs to be independent proof of any complementary value to mass-spectrometry based quantification).

      We have now added this sentence to our manuscript:

      “As a validation of the list of proteins that were returned as interaction partners of Spn in this work, we were able to reconfirm previously known interactions (Muhammad et al. 2015), e.g., Syd-1 (Figure 3b) and Nrx-1 (not shown).” (Line 148)

      5) Are the authors worried about the fact that the Actin-GFP line they use to look at synaptic actin dynamics is driven by a GAL4, and the 2nd top hit of their Spn IP pull downs are translation regulators? Could the changes in actin-GFP they see between control and Spn mutants have anything to do with a different translation of the exogenous UAS-actin-GFP? Would have been helpful to do an endogenous stain for actin levels with an anti-actin antibody so no transcription/translation issues of a transgene would be at play. This would be easy to do for the quantification of total actin levels at the synapse.

      This is per se a fully justified concern, which is hard to exclude fully. Indeed, when preparing this manuscript, we attempted to visualize and quantify the endogenous presynaptic actin through immunostaining. However, these attempts were unsuccessful, as the very bright muscle actin staining obscures the relatively low levels of actin present close to the presynaptic AZs, even when using super-resolution light microscopy. Still, we would like to emphasize that Spn and Syd-1 antagonized each other's function concerning apparent F-actin levels (using Gal4 expression of actin-GFP). Given the known role of Spn as a compartment-specific F-actin breaker (Chia, Patel, and Shen 2012; Ryan et al. 2005; Nakanishi et al. 1997), we are still rather confident about our finding and its interpretation.

      Concerning the FRAP analyses, we are fully confident of our findings, as the intensity of actin-GFP is internally normalized within each NMJ. Therefore, the differences in FRAP experiments should be independent of the starting amounts of actin in control and mutant animals. As we can show that the Spn/Syd-1 antagonism acts on actin dynamics as well (Figure 4j), we are confident of the physiological relevance of our observations.
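
      To illustrate why internal normalization makes FRAP recovery independent of the starting amount of actin-GFP, here is a minimal sketch. It assumes a common double-point normalization (pre-bleach mean set to 1, first post-bleach frame set to 0); the exact normalization used in the study is not stated in this response, and the function name and intensity values are hypothetical.

```python
def normalize_frap(trace, n_prebleach):
    """Rescale a raw FRAP intensity trace to recovery fractions.

    Assumed scheme (not taken from the manuscript): the mean of the
    pre-bleach frames maps to 1.0 and the first post-bleach frame maps
    to 0.0. Returns one value per post-bleach frame.
    """
    if n_prebleach < 1 or n_prebleach >= len(trace):
        raise ValueError("need at least one pre-bleach and one post-bleach frame")
    pre = sum(trace[:n_prebleach]) / n_prebleach
    bleached = trace[n_prebleach]
    span = pre - bleached
    if span <= 0:
        raise ValueError("bleaching did not reduce the intensity")
    return [(f - bleached) / span for f in trace[n_prebleach:]]


# Hypothetical traces: the second NMJ has half the actin-GFP signal of the
# first but the same recovery kinetics.
control = [100.0, 100.0, 20.0, 60.0, 80.0]
low_expression = [50.0, 50.0, 10.0, 30.0, 40.0]

control_curve = normalize_frap(control, n_prebleach=2)
low_curve = normalize_frap(low_expression, n_prebleach=2)
```

      Under this assumption both normalized curves are identical, so a difference in total actin-GFP expression alone would not produce a difference in the recovery curves.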

      6) Are Mical levels normalized in the Spn, Syd1 double mutants, given PHP is recovered?

      We thank the reviewer for the comment and agree that Mical levels should be expected to normalize upon Syd-1 heterozygosity in Spn mutants. We have now immunostained for Mical in wildtype, Spn mutants and Spn mutants with Syd-1 heterozygosity to address this question. We found that Mical levels in Spn mutants were indeed normalized upon Syd-1 heterozygosity (Figure 5 - Figure supplement 1 c-d).

    1. Author response

      Reviewer #1 (Public Review):

      The usual strategy to combat antimicrobial drug resistance is to administer a combination of two drugs with distinct mechanisms. An alternative, however, would be to use two drugs that attack the same target, if resistance to one is incompatible with resistance to the other. The authors previously studied parasites resistant to the dihydroorotate dehydrogenase (DHODH) inhibitor DSM265 through an E182D mutation and found that resistance to another inhibitor, IDI-6273, resulted in a reversion to wild-type. Here, they screened various other inhibitors and found that TCMDC-125334 is more active on DSM265-resistant parasites than the wild-type. In this case, however, it was possible for the parasites to become resistant to both inhibitors, either by increasing the copy number of DSM-265-resistant DHODH genes (with a C276Y mutation) or by the emergence of a different mutation. The selection of wild-type parasites with both compounds resulted in resistance but this took considerably longer than for either compound alone. (The actual frequency of double resistance emergence was not measured.)

      Overall the results suggest that for DHODH, when pre-existing resistant parasites are selected with another inhibitor, the results will depend on both the initial mutation and the new inhibitor. The data are solid and convincing and suggest that DHODH has considerable scope for resistance development. The observations do have relevance for other inhibitors and/or enzyme drug targets. However from the data so far, the sweeping statements that the authors make concerning double resistance, in general, are not supported.

      The formatting of the Figures requires some improvement and in some cases, more details of the statistical analyses are needed.

      We thank Reviewer 1 for their kind and helpful comments. We have answered their specific concerns below. In particular, we have improved the formatting of the figures based on their recommendations. We have also edited the discussion based on reviewer 1’s comments.

      Reviewer #2 (Public Review):

      This article focuses on drug resistance acquired by Plasmodium falciparum malaria parasites that have been pressured with different inhibitors of the essential enzyme DHODH (dihydroorotate dehydrogenase). The study focuses on collateral sensitivity between DSM265, which has been evaluated in a human clinical trial and found to select for resistance via the point mutation C276Y (C276F and G181S were also implicated; PMID 29909069), and the GSK compound TCMDC-125334, against which a panel of DHODH mutant parasites (including C276Y) were found to have increased sensitivity. The authors herein explore this case of "collateral sensitivity" by examining whether these two inhibitors, when used simultaneously, might preclude the selection of resistant parasites. The answer, in this case, is no; collateral sensitivity did not prevent parasites from acquiring a novel mutation (V532A) that mediated resistance to both. Culture competition assays provide evidence that this mutant retains normal fitness. The authors conclude that for this target the idea of combining these inhibitors is not a viable therapeutic strategy. The authors also illustrate how TCMDC-125334 can select for resistance via a separate mutation (I263S) or amplification of a chromosomal segment containing dhodh. They also present modeling data to examine binding poses and how mutations could impact drug binding, which is allosteric to the enzyme's substrates (orotate and FMN). The data are thorough and provide convincing evidence that in this case collateral sensitization by distinct chemotypes does not translate into a viable strategy to inhibit DHODH in a way that can preclude mutations that confer cross-resistance.

      We thank the reviewer for their kind comments and helpful recommendations.

      Reviewer #3 (Public Review):

      'Collateral sensitivity' occurs when drug-resistance mutations render a drug target more sensitive to inhibition by another drug, which has been previously described in some detail for malaria parasite dihydroorotate dehydrogenase (DHODH - see refs 36, 46, and 47, for example). Although it has been suggested that combinations of such drugs could potentially suppress the emergence of resistance, cross-resistance-associated mutation (or copy-number variation, CNV) could render such combination strategies ineffective. In the current study, the authors assess a new pairing of DHODH-targeting drugs. Cross-resistant parasites with DHODH mutation or CNV arise following either sequential or combined drug selection, suggesting that the drug combination described would likely fail to effectively suppress the emergence of resistance.

      The strength of the study is that it describes, for a particular drug combination, different mutations associated either with collateral sensitivity or with cross-resistance, and the authors conclude that "combination treatment with DSM265 and TCMDC-125334 failed to suppress resistance". They go on to say that this "brings into question the usefulness of pursuing further DHODH inhibitors." More specific interpretations and implications of the study are as follows:

      a. Other combinations may also fail but there may be combinations that can effectively suppress resistance. A more exhaustive analysis of mutational space will likely be required to determine which combinations if any, would be predicted to succeed in a clinical setting.

      b. It was previously reported that "a combination of [DHODH] wild-type and mutant-type selective inhibitors led to resistance far less often than either drug alone. ... Comparative growth assays demonstrated that two mutant parasites grew less robustly than their wild-type parent, and the purified protein of those mutants showed a decrease in catalytic efficiency, thereby suggesting a reason for the diminished growth rate" (Ref 46). Also, "selection with a combination of Genz-669178, a wild-type PfDHODH inhibitor, and IDI-6273, a mutant-selective PfDHODH inhibitor, did not yield resistant parasites" (Ref 36). It is possible that these previously tested combinations would also yield cross-resistant mutants if selected further.

      c. Although increased DHODH copy number "confers only moderately reduced susceptibility" to the drug used for selection and although these clones were not assessed here for cross-resistance, it seems likely that CNV may represent a general mechanism that could undermine other collateral resistance strategies.

      We thank the reviewer for their kind and helpful comments.

    1. Author Response

      Reviewer #1 (Public Review):

      This study applies state-of-the-art single-cell transcriptome analysis to investigate the nature of drug tolerance, a phenomenon distinct from drug resistance, and a problem of considerable importance in the treatment of C. albicans infections. The authors first show that their transcriptomics platform can reveal sub-populations of untreated cells that display distinct transcription profiles related to metabolic and stress responses that are coupled with cell cycle regulation. They note the consistency of these findings with previous work indicating connections between cell cycle phase and expression of genes related to stress responses and metabolism and argue that this validates their experimental approach, which relies on a complex statistical analysis of sparse data from a relatively small number of single cells. They then proceed to analyze drug-treated cells, mostly focusing on fluconazole (FCZ; which targets ERG11, thus disrupting ergosterol biosynthesis and membrane integrity) and examining individual cells at 2-, 3-, and 6-days following treatment. Their primary finding is the identification of two major classes of cells, one of which they call the α response, characterized by high ribosomal protein (RP) gene expression and the absence of either heat shock or hyperosmotic stress gene expression as well as low expression of glycolytic, carbohydrate reserve pathway, and histone genes. The second survival state on day 2 (called the β response) instead displays low RP gene expression and high heat-shock stress response. Interestingly, the proportion of β cells clearly increases on day 3. In addition, responses to caspofungin (CSP) and rapamycin (RAPA) are examined and compared to FCZ or untreated cells. The main conclusion that the authors draw from their data is that the initial α response transitions to the β response, which is similar to a recently characterized ribosome assembly stress response (RASTR) in the budding yeast S. cerevisiae. They argue that the transcriptional state in α cells provokes the transition to the β state.

      This manuscript presents an enormous amount of complex data whose significance will be difficult to evaluate for those (e.g., this reviewer) not immersed in the specialized analytical techniques used here. Taken at face value, however, the experimental findings are consistent with the authors' main conclusions. Nevertheless, and consistent with the complexity of the responses observed, there are many findings that remain to be explored in mechanistic detail and for which conclusions are less precise.

      We thank Reviewer #1 for their excellent questions. The manuscript does have a large amount of complex data, so this version of the manuscript has a tighter focus on the major findings (i.e. α/Rd versus β/Sd subpopulations in response to FCZ). We have tried to explore these subpopulations in greater depth with supporting data from complementary technologies and additional bioinformatic analyses. We agree that there still remain several observations in the manuscript that are not explored in mechanistic detail. We have tried our best to clearly delineate the evidence that we have for these findings in addition to their potential significance.

      To simplify the manuscript, we have moved the discussion regarding “comets” to Appendix 2 [Changes L837-897] along with the detailed analysis of the response of cells to rapamycin and caspofungin [Changes L899-963]. We have also removed from the Discussion a paragraph (and the associated Figure 2 - figure supplement 5 in the original manuscript) that described our inability to assign DNA-level chromosomal aberrations to either the Rd or Sd subpopulations using whole genome sequencing. Figures 5 and 6 of the original manuscript depicted GO analysis that compared changes in the molecular processes between α/Rd and β/Sd subpopulations at days 3 and 6, respectively. Although interesting, the figures do not advance the main findings of the manuscript and have been removed from this version.

      Reviewer #2 (Public Review):

      In this manuscript, Dumeaux et al. assess the heterogeneous cellular response of the fungal pathogen Candida albicans to antifungal agents, using single-cell RNA sequencing. The researchers develop an optimized single-cell transcriptomics platform for C. albicans, and exploit this technique to monitor the cellular response to treatment with three distinct antifungal agents. Through this analysis, they identify two distinct subpopulations of cells that undergo differential transcriptomic responses to antifungal treatment: one involving upregulation of translation and respiration, and the other involving stress responses. This work monitors how different and prolonged antifungal exposure alters and shifts fungal cell populations between these responses. This is an innovative study that exploits novel single-cell transcriptomic techniques to address a very interesting question regarding the heterogeneous nature of the fungal response to antifungal drug treatment. This work optimizes a protocol for single-cell RNA sequencing, which is a significant contribution to the fungal research community and will bolster future research efforts in this area. The identification of two distinct subpopulations of fungal cells with differential responses to antifungal treatment is an exciting and novel finding. While there are aspects of this manuscript that are of significant interest, there are also limitations to this work.

      The research is framed as a method to study antifungal drug tolerance, but it is not clear how it does so, based on the methods. This work also compares very different populations of cells (rapidly growing untreated cells compared with cells grown in antifungal for several days), making it difficult to assess the role of antifungal treatment specifically in this analysis. This manuscript is also written with a great deal of highly technical language that makes it difficult to dissect the major findings and outcomes from the study.

      We sincerely thank the reviewer for these comments and for making the effort to evaluate the manuscript. We have tried to address these criticisms by improving the introduction to better explain fungal drug tolerance [Changes L53-61] and to explain how our experimental design allows us to investigate this phenomenon (for example for UT cells L184-187, L142-149). We have also re-written subsections of the results to more intuitively explain technical concepts (especially surrounding single cell technologies and analyses) [L250-257, L368-373, L699-707]. Some subsections of the results have been moved to the appendices in order to better emphasize the major findings and outcomes (e.g. comets L837-897 and in depth analysis of RAPA and CSP treatment L899-963). We address each of the specific concerns below. We have also removed some complicated analyses that did not directly advance the major findings of the manuscript including the GO analysis in Figures 5 and 6 of the original manuscript.

      Before proceeding, we would like to take this opportunity to underscore that these experiments were not primarily designed to investigate the differences between untreated (UT) and treated cells. The major findings (of the α/Rd and β/Sd subpopulations) are not dependent on the UT profiles. That is, the α/Rd and β/Sd subpopulations would be evident even if the UT profiles were removed from the manuscript entirely. Rather, the UT profiles/analyses are intended to contribute to the manuscript by helping establish the technical efficacy of the sc-profiling technique. For example, we might expect - a priori - that a large component of cell-to-cell heterogeneity in isogenic UT cells should correspond to differences in cell cycle, and, indeed, this is what we found.

      Indeed, we did embed (via UMAP) and cluster (via Leiden clustering) the UT data alongside data for the drug-treated cells (Figure 3), which reveals that UT cells largely cluster separately from drug-treated cells. The reviewer is absolutely correct to question the sources underlying this separation; in addition to differential cellular responses to the drug itself, some of the separation may be due to differences in the amount of growth media, for example. The fact that different drugs (FCZ, RAPA and CSP) largely separate from UT cells and from each other may suggest that at least some of this separation could be due to differences in the mode of action of each drug rather than to issues related to, for example, media depletion. However, this difference is not a major finding of the manuscript. Rather, we agree with the reviewer that “The identification of two distinct subpopulations of fungal cells with differential responses to antifungal treatment is an exciting and novel finding”. As such, the major results begin with data in panels 3D and E that reveal the two distinct cell types within the FCZ-treated sample (a distinction that is not dependent on the status of the UT cells).

      Reviewer #3 (Public Review):

      The authors described their extensive single-cell analysis of Candida undergoing (sub-inhibitory) antibiotic treatment versus no treatment. To do so, the authors used a microfluidics platform they had previously developed, and they optimized, characterized, and validated it for this particular application. Their findings included: (a) the transcription of untreated cells is driven mostly by cell cycle phase, (b) treated cells can be clustered into several major groups and a few outlier groups that the authors termed comets, (c) cells undergoing FCZ treatment can adopt one of two different states (possibly bistability). I found the results interesting and the approach to be sound, and much of the results confirmed my prior expectations. The authors provide a detailed depiction of what is going on in the transcriptome during sub-inhibitory treatment, although this did not always lead to a mechanistic explanation. The clinical relevance was unclear to me beyond a proof of concept application for single-cell transcriptomics. In my opinion, an interesting follow-up would be to follow the transcriptional trajectory of lineages undergoing antimicrobial switching (on and off). The main issues I identified were the authors' use of the term tolerance versus resistance, interpretation of "comets", clustering approach, description of fitness, and comparison between time points.

      We thank the reviewer for their time and effort with this manuscript. In the revised manuscript, we expanded the introduction to better delineate between resistance and tolerance, moved the “comets” section to the appendices, as it distracted from the major results and we provided more interpretive analysis of the findings. We also better defined the bioinformatic approaches. (Changes e.g. comets L837-897 and in depth analysis of RAPA and CSP treatment L899-963). With respect to comparisons between time points, we now address these concerns throughout the Response to Reviewer document. We have also moved a comparison of UT versus FCZ cells to Appendix 2 L828-836 as it was perhaps misleading readers of our intention. We only performed this comparison as a sort of “sanity” check to see if the single cell (sc)-profiling would detect differences between UT and drug treated cells.

    1. Author response

      Reviewer #1 (Public Review):

      This careful study reports the importance of Rab12 for Parkinson's disease associated LRRK2 kinase activity in cells. The authors carried out a targeted siRNA screen of Rab substrates and found lower pRab10 levels in cells depleted of Rab12. It has previously been reported that LLOMe treatment of cells breaks lysosomes and with time, leads to major activation of LRRK2 kinase. Here they show that LLOMe-induced kinase activation requires Rab12 and does not require Rab12 phosphorylation to show the effect.

      We thank the reviewer for their comments regarding the carefulness and importance of our work and for their specific feedback which has substantially improved our revised manuscript.

      1) Throughout the text, the authors claim that "Rab12 is required for LRRK2 dependent phosphorylation" (Page 4 line 78; Page 9 line 153; Page 22 line 421). This is not correct according to Figure 1 Figure Supp 1B - there is still pRab10. It is correct only in relation to the LLOMe activation. Please correct this error.

      We appreciate the reviewer’s comment on the requirement of Rab12 for LRRK2-dependent phosphorylation of Rab10 and the question of whether this is relevant under baseline conditions or only in relation to LLOMe activation. Using our MSD-based assay to quantify pT73 Rab10 levels under basal conditions, we observed a reduction in Rab10 phosphorylation when we knock down Rab12 similar to that observed with LRRK2 knockdown (Figure 1A). Further, we see a reduction in Rab10 phosphorylation in RAB12 KO cells comparable to that observed in LRRK2 KO cells using our MSD-based assay (Figure 2A and B). Based on these data, we believe Rab12 is a key regulator of LRRK2 activation under basal conditions without additional lysosomal damage. However, as the reviewer noted, we do observe some residual Rab10 phosphorylation upon Rab12 knockdown when assessed by western blot analysis (Figure 1D and Figure 1 - figure supplement 1). A similar signal is observed upon LRRK2 knockdown, which may suggest that some small amount of Rab10 phosphorylation may be mediated by another kinase in this cell model. Nevertheless, we appreciate this reviewer’s point and have therefore modified the text to remove any reference to Rab12 being required for LRRK2-dependent Rab phosphorylation and now instead refer to Rab12 as a regulator of LRRK2 activity.

      As noted by the reviewer, our data do suggest that Rab12 is required for the increase in Rab10 phosphorylation observed when LLOMe treatment is used to elicit lysosomal damage, and we now refer to this appropriately throughout the text.

      2) The authors conclude that Rab12 recruitment precedes that of LRRK2 but the rate of recruitment (slopes of curves in 3F and G) is actually faster for LRRK2 than for Rab12 with no proof that Rab12 is faster-please modify the text-it looks more like coordinated recruitment.

      The reviewer raises an excellent point regarding our ability to delineate whether Rab12 recruitment precedes that of LRRK2 on lysosomes following LLOMe treatment. As noted by the reviewer, we do see both the recruitment of Rab12 and LRRK2 to lysosomes increase on a similar timescale, so we cannot truly resolve whether Rab12 recruitment precedes LRRK2 recruitment in our studies. Based on this, we have modified the text to emphasize that this data supports coordinated recruitment, as suggested, and we have further removed any mention of Rab12 preceding LRRK2. The specific change is as follows “Rab12 colocalization with LRRK2 increased over time following LLOMe treatment, supporting potential coordinated recruitment of these proteins to lysosomes upon damage (Figure 3I). Together, these data demonstrate that Rab12 and LRRK2 both associate with lysosomes following membrane rupture.” and can be found on lines 460-463 of the updated manuscript.

      3) The title is misleading because the authors do not show that Rab12 promotes LRRK2 membrane association. This would require Rab12 to be sufficient to localize LRRK2 to a mislocalized Rab12. The authors DO show that Rab12 is needed for the massive LLOME activation at lysosomes. Please re-word the title.

      To address the reviewer’s concern regarding the title of our manuscript, we have modified the title from “Rab12 regulates LRRK2 activity by promoting its localization to lysosomes” to “Rab12 regulates LRRK2 activity by facilitating its localization to lysosomes” to soften the language around the sufficiency of Rab12 in regulating the localization of LRRK2 to lysosomes. We show that Rab12 deletion significantly reduces LRRK2 activity (as assessed by Rab10 phosphorylation on lysosomes) and significantly reduces the localization of LRRK2 to lysosomes upon lysosomal damage. The updated title better reflects the regulatory role of Rab12 in modulating LRRK2 activity, and we thank the reviewer for their suggestion to modify this accordingly.

      Reviewer #2 (Public Review):

      This study shows that rab12 has a role in the phosphorylation of rab10 by LRRK2. Many publications have previously focused on the phosphorylation targets of LRRK2 and the significance of many remains unclear, but the study of LRRK2 activation has mostly focused on the role of disease-associated mutations (in LRRK2 and VPS35) and rab29. The work is performed entirely in an alveolar lung cell line, limiting relevance for the nervous system. Nonetheless, the authors take advantage of this simplified system to explore the mechanism by which rab12 activates LRRK2. In general, the work is performed very carefully with appropriate controls, excluding trivial explanations for the results, but there are several serious problems with the experiments and in particular the interpretation.

      We appreciate the reviewer’s comments regarding the rigor of our work and the potential impact of our studies to address a key unanswered question in the field regarding the mechanisms by which LRRK2 activation is mediated. Our studies focused on the A549 cell model given its high endogenous expression of LRRK2 and Rab10, and this cell line provided a simple system to investigate the mechanism and impact of Rab12-dependent regulation of LRRK2 activity. We agree with the reviewer that future studies are warranted to understand whether similar Rab12-dependent regulation of LRRK2 occurs in relevant CNS cell types.

      First, the authors note that rab29 appears to have a smaller or no effect when knocked down in these cells. However, the quantitation (Fig1-S1A) shows a much less significant knockdown of rab29 than rab12, so it would be important to repeat this with better knockdown or preferably a KO (by CRISPR) before making this conclusion. And the relationship to rab29 is important, so if a better KD or KO shows an effect, it would be important to assess by knocking down rab12 in the rab29 KO background.

      The reviewer raises a good point regarding the importance of confirming that loss of Rab29 has no effect on Rab10 phosphorylation. To address potential concerns about insufficient Rab29 knockdown, we measured the levels of pT73 Rab10 in RAB29 KO A549 cells by MSD-based analysis. RAB29 deletion had no effect on Rab10 phosphorylation, confirming findings from our RAB siRNA screen and the observations of Dario Alessi’s group reported previously (Kalogeropulou et al Biochem J 2020; PMID: 33135724). We have included these new data in our updated manuscript in Figure 1-figure supplement 1 and comment on them on page 6 in the updated Results section.

      Secondly, the knockdown of rab12 generally has a strong effect on the phosphorylation of the LRRK2 substrate rab10 but I could not find an experiment that shows whether rab12 has any effect on the residual phosphorylation of rab10 in the LRRK2 KO. There is not much phosphorylation left in the absence of LRRK2 but maybe this depends on rab12 just as much as in cells with LRRK2 and rab12 is operating independently of LRRK2, either through a different kinase or simply by making rab10 more available for phosphorylation. The epistasis experiment is crucial to address this possibility. To establish the connection to LRRK2, it would also help to compare the effect of rab12 KD on the phosphorylation of selected rabs that do or do not depend on LRRK2.

      The reviewer raises an interesting question regarding whether Rab12 can further reduce Rab10 phosphorylation independently of LRRK2. Using our quantitative MSD-based assay, we observe that pRab10 levels are at the lower limits of detection of the assay in LRRK2 KO A549 cells. Unfortunately, this means that we are unable to detect whether there might be any additional minor reduction in Rab10 phosphorylation with Rab12 knockdown in LRRK2 KO cells. We cannot rule out that Rab12 may play a LRRK2-independent role in regulating Rab10 phosphorylation in other cell lines, and future studies are warranted to explore whether Rab12 knockdown can further reduce Rab10 phosphorylation in other systems, including in CNS cells.

      Regarding exploring the effects of RAB12 knockdown on the phosphorylation of other Rabs, we also assessed the impact of RAB12 KO on phosphorylation of another LRRK2-Rab substrate, Rab8a. We observed a strong reduction in pT72 Rab8a levels in RAB12 KO cells compared to wildtype cells, suggesting the impact of RAB12 deletion extends beyond Rab10 (see representative western blot in Author response image 1). Due to potential concerns with the selectivity of the pT72 Rab8a antibody (potentially detecting the phosphorylation of other LRRK2-Rabs), we cannot definitively demonstrate that Rab12 mediates the phosphorylation of other Rabs. This question should be revisited when additional phospho-Rab antibodies become available that enable us to selectively detect LRRK2-dependent phosphorylation of additional Rab substrates under endogenous expression conditions.

      Author response image 1.

      A strength of the work is the demonstration of p-rab10 recruitment to lysosomes by biochemistry and imaging. The demonstration that LRRK2 is required for this by biochemistry (Fig 4A) is very important but it would also be good to determine whether the requirement for LRRK2 extends to imaging. In support of a causal relationship, the authors also state that lysosomal accumulation of rab12 precedes LRRK2 but the data do not show this. Imaging with and without LRRK2 would provide more compelling evidence for a causative role.

      We thank the reviewer for their suggestion to assess Rab12 recruitment to damaged lysosomes with and without LRRK2 using imaging-based analyses to add confidence to our findings from biochemical approaches. To address this comment, we imaged the recruitment of mCherry-tagged Rab12 to lysosomes (as assessed using an antibody against endogenous LAMP1) and observed a significant increase in Rab12 levels on lysosomes following LLOMe treatment. This occurs to a similar extent in LRRK2 KO A549 cells, suggesting that Rab12 is an upstream regulator of LRRK2 activity. These new data have been incorporated into the revised manuscript (Figure 3E) and are presented on page 20 of the updated manuscript.

      Our conclusions on this are further strengthened by new data assessing Rab12 recruitment to lysosomes using orthogonal analysis of isolated lysosomes biochemically. Using the Lyso-IP method, we observed a strong increase in the levels of Rab12 on lysosomes following LLOMe treatment that was maintained in LRRK2 KO cells. These data have been added to the updated manuscript (new data added to Figure 3- figure supplement 1).

      Together, these data support our hypothesis that Rab12 recruitment to damaged lysosomes is upstream, and independent, of LRRK2.

      The authors also touch base with PD mutations, showing that loss of rab12 reduces the phosphorylation of rab10. However, it is interesting that loss of rab12 has the same effect with R1441G LRRK2 and D620N VPS35 as it does in controls. This suggests that the effect of rab12 does not depend on the extent of LRRK2 activation. It is also surprising that R1441G LRRK2 does not increase p-rab10 phosphorylation (Fig 2G) as suggested in the literature and stated in the text.

      We agree with the reviewer that it is quite interesting that RAB12 knockdown significantly attenuates Rab10 phosphorylation in the context of PD-linked variants in addition to that observed in wildtype cells basally and after LLOMe treatment. As noted by the reviewer, we did not observe increased levels of phospho-Rab10 in LRRK2 R1441G KI A549 cells at the whole cell level (Figure 2G). However, we observed a significant increase in Rab10 phosphorylation on isolated lysosomes from LRRK2 R1441G KI cells compared to WT cells (Figure 4B). This may suggest that the LRRK2 R1441G variant leads to a more modest increase in LRRK2 activity in this cell model. Previous studies in MEFs from LRRK2 R1441G KI mice or neutrophils from human subjects that carry the LRRK2 R1441G variant showed a 3- to 4-fold increase in Rab10 phosphorylation (Fan et al Acta Neuropathol 2021 PMID: 34125248 and Karayel et al Mol Cell Proteomics 2020 PMID: 32601174), supporting that this variant does lead to increased Rab10 phosphorylation and that the extent of LRRK2 activation may vary across different cell types.

      Most important, the final figure suggests that PD-associated mutations in LRRK2 and VPS35 occlude the effect of lysosomal disruption on lysosomal recruitment of LRRK2 (Fig 4D) but do not impair the phosphorylation of rab10 also triggered by lysosomal disruption (4A-C). Phosphorylation of this target thus appears to be regulated independently of LRRK2 recruitment to the lysosome, suggesting another level of control (perhaps of kinase activity rather than localization) that has not been considered.

      The reviewer suggests an interesting hypothesis around the existence of additional levels of control, beyond the lysosomal levels of LRRK2, that lead to increased Rab10 phosphorylation on lysosomes. Given the variability we have observed in measuring endogenous LRRK2 levels on lysosomes, we performed two additional replicates to assess lysosomal LRRK2 levels in LRRK2 R1441G KI and VPS35 D620N KI cells at baseline and after treatment with LLOMe. We observed a significant increase in LRRK2 levels on lysosomes in cells expressing either PD-linked variant and a trend toward a further increase in the levels of LRRK2 on lysosomes after LLOMe treatment in these cells (Figure 4D in the updated manuscript). We have updated the text on page 24 to reflect this change, suggesting that the PD-linked variants do not fully occlude the effect of lysosomal disruption on the lysosomal recruitment of LRRK2.

      LLOMe treatment leads to a stronger increase in Rab10 phosphorylation on lysosomes from LRRK2 R1441G and VPS35 D620N cells compared to the modest increase in LRRK2 levels observed. This could suggest that, as the reviewer noted, additional mechanisms beyond increased lysosomal localization of LRRK2 may be driving the robust increase in Rab10 phosphorylation observed. We have modified the results section on lines 548-551 to highlight this possibility: “Rab10 phosphorylation showed a more significant increase in response to LLOMe treatment than LRRK2 on lysosomes from LRRK2 R1441G and VPS35 D620N KI cells, suggesting that there may be more regulation beyond the enhanced proximity between LRRK2 and Rab that contribute to LRRK2 activation in response to lysosomal damage.”

      Reviewer #3 (Public Review):

      Increased LRRK2 kinase activity is known to confer Parkinson's disease risk. While much is known about disease-causing LRRK2 mutations that increase LRRK2 kinase activity, the normal cellular mechanisms of LRRK2 activation are less well understood. Rab GTPases are known to play a role in LRRK2 activation and to be substrates for the kinase activity of LRRK2. However, much of the data on Rabs in LRRK2 activation comes from over-expression studies and the contributions of endogenously expressed Rabs to LRRK2 activation are less clear. To address this problem, Bondar and colleagues tested the impact of systematically depleting candidate Rab GTPases on LRRK2 activity as measured by its ability to phosphorylate Rab10 in the human A549 type 2 pneumocyte cell line. This resulted in the identification of a major role for Rab12 in controlling LRRK2 activity towards Rab10 in this model system. Follow-up studies show that this role for Rab12 is of particular importance for the phosphorylation of Rab10 by LRRK2 at damaged lysosomes. Increases in LRRK2 activity in cells harboring disease-causing mutants of LRRK2 and VPS35 also depend (at least partially) on Rab12. Confidence in the role of Rab12 in supporting LRRK2 activity is strengthened by parallel experiments showing that either siRNA-mediated depletion of Rab12 or CRISPR-mediated Rab12 KO both have similar effects on LRRK2 activity. Collectively, these results demonstrate a novel role for Rab12 in supporting LRRK2 activation in A549 cells. It is likely that this effect is generalizable to other cell types. However, this remains to be established. It is also likely that lysosomes are the subcellular site where Rab12-dependent activation of LRRK2 occurs. Independent validation of these conclusions with additional experiments would strengthen this conclusion and help to address some concerns that much of the data supporting a lysosome localization for Rab12-dependent activation of LRRK2 comes from a single method (LysoIP). Furthermore, there is a discrepancy between panel 4A versus 4D in the effect of LLOMe-induced lysosome damage on LRRK2 recruitment to lysosomes that will need to be addressed to strengthen confidence in conclusions about lysosomes as sites of LRRK2 activation by Rab12.

      We thank the reviewer for their comments regarding our work that identifies Rab12 as a novel regulator of LRRK2 activation and the appreciation of the parallel approaches we employed to add confidence in this effect.

      As suggested by the reviewer, we have updated our manuscript to now include independent validation of our conclusions using imaging-based analyses to complement our data from biochemical analyses using the Lyso-IP method. Specifically, we have included new imaging data that confirms that Rab12 levels are increased on lysosomes following membrane permeabilization with LLOMe treatment and demonstrates that this occurs independent of LRRK2, providing additional support that Rab12 is an upstream regulator of LRRK2 activity (Figure 3E in the updated manuscript).

      Regarding the reviewer’s comment on a discrepancy between our findings in Figure 4A and Figure 4D, we have performed additional independent replicates in Figure 4D to assess the impact of lysosomal damage on the lysosomal levels of LRRK2 at baseline or upon the expression of genetic variants. We observed a significant increase in LRRK2 levels on lysosomes following LLOMe treatment in our set of experiments included in Figure 4A and a non-significant trend toward an increase in LRRK2 levels on isolated lysosomes in Figure 4D. As described in more detail below (in response to the second point raised by this reviewer), we think this variability arises from a combination of low levels of LRRK2 on lysosomes with endogenous expression and variability across experiments in the efficiency of lysosomal isolation. Our observations of increased recruitment of LRRK2 to lysosomes upon damage are further supported by parallel imaging-based studies (Figure 3F-I) and are consistent with previous studies using overexpression systems.

      We thank the reviewer for all of the suggestions, which have added further confidence to our conclusions and substantially improved the manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript provides novel and intriguing experiments that aim to elucidate the mechanical properties of the Reissner fiber (RF) and to probe its interactions with the motile cilia in the central canal of the spinal cord. Using in vivo imaging in larval zebrafish, the authors show that the RF is under tension and oscillates dorsoventrally. Importantly, ablation of the RF triggered retraction and relaxation of the fiber cut ends. The retraction speed depends on where the fiber was ablated, with fastest retraction in the rostral side, indicating that tension in the RF builds up rostrally. The authors, based on observations from live imaging of intact and ablated RF and central canal, conjecture that numerous ependymal motile monocilia, that are tilted caudally and interact frequently with the RF, contribute to RF heterogenous tension via weak interactions.

      The work is important. The experiments are thorough and intricate. The findings are fascinating and open up the prospect for future investigations and models. I'm particularly curious as to what future experiments can be used to test the hypothesis put forward by the authors about the role of cilia-fiber interactions in the RF mechanical properties and function.

      We thank Reviewer#1 for showing enthusiasm and support.

      Reviewer #2 (Public Review):

      The present manuscript by the Claire Wyart group analyses the behaviour of Reissner's fibre (RF) when it is cut, as well as the behaviour of cells touching RF when contact is interrupted. They show that RF is under tension that is higher in the rostral than in the caudal spinal cord. One of the proposed mechanisms is a caudally oriented movement of the cilia of ependymal radial glial cells (ERG) that is inherent rather than caused by the contact with RF. Kolmer-Agduhr neurons, which are also CSF-contacting (CSF-cN), alter their activity when contact is lost through laser ablation of RF.

      This is an interesting paper - RF has long been proposed to be a source of signalling molecules in the development and physiological function of neural cells in the spinal cord. Cilia are the main centre of signalling activity in ciliated cells (e.g. for sonic hedgehog signalling) and the fact that ERG cilia are in direct contact with RF is intriguing. Presumably, signalling molecules could be directly transferred from RF to ERG at the contact points.

      Functionally, CSF-cN are augmenting spinal cord intrinsic sensory feedback on body curvature. This had been shown in vitro/ex vivo, but not clearly evaluated in the living animal. The data shown here demonstrate a possible mechanism for how the feedback can be mediated through contact with RF. This is of fundamental interest to understand the functioning of a locomotor network that is under evolutionary pressure to function early, since fish hatch at 3 days post fertilisation.

      We thank Reviewer#2 for the interest in our work.

      Interestingly, the authors propose (and discuss against the relevant literature) that the presence of RF in the central canal can influence the flow of the CSF, which should be investigated in further work.

      To bring readers back in the context of the existing literature:

      When using beads to track particles in the flow in the presence or in the absence of RF, we have not seen a major difference in the bidirectional dorsoventral profile of the embryonic CSF flow (Cantaut-Belarif et al., CB 2018; Sternberg et al., Nature Comm 2019; Thouvenin et al., eLife 2020).

      However, we cannot exclude a very local impact of the RF on CSF flow, because the flow has to be null at the surface of the fiber (of 200 nm diameter). With our methods for tracking fluorescent particles in a single plane at a time (Cantaut-Belarif et al., CB 2018; Sternberg et al., Nature Comm 2019; Thouvenin et al., eLife 2020), we are likely missing the fiber in the plane, and the domain immediately surrounding the fiber is not resolved. Nonetheless, a null flow at the surface of the RF would impose a sharp velocity gradient around the fiber.

      Note that our results estimating the effect of cutting the fiber on the beating frequency of motile cilia were not consistent across fish: half of the cilia showed an increase while the rest showed a decrease, making it difficult to draw a conclusion. A finer analysis with higher temporal and spatial resolution in 3D will be necessary to decipher the role of the fiber in the beating of cilia and local CSF flow.

      Overall, the results are clearly presented, and methods are thoroughly given, including some indication on the reduction of bias (by blinding movies before analysis). The authors also clearly state the limitations of their work, mostly derived from optical limitation (size of the RF in the larval fish, and speed of the recording in the laser-equipped microscope). This doesn't affect the fundamental statements.

      Thank you again for your appreciation of our work.

      Reviewer #3 (Public Review):

      This manuscript by Bellegarda et al. examined the in vivo dynamic behavior of the Reissner fiber and its interactions with cilia and sensory neurons in the central canal of zebrafish larvae. The authors accomplished this by performing live imaging with a transgenic reporter zebrafish line in which the fiber is GFP-tagged and by finely tracking the movement of the fiber. Interestingly, they discovered that the fiber undergoes a dynamic vibratory-like movement along the dorsoventral axis. The authors then utilized a pulsed laser to precisely cut the fiber, which frequently resulted in a fast retraction behavior and a loss of calcium activity in sensory neurons in the central canal called CSF-CNs. Mechanical modeling of the elastic properties of the fiber indicated that the fiber is a soft elastic rod with graded tension along the rostrocaudal axis. Finally, by performing live imaging of motile cilia and the fiber in the central canal, they found that the two interact in close proximity and that cilia motility is affected when the fiber was cut. The authors concluded that the Reissner fiber is a dynamic structure under tension that interacts with sensory neurons and cilia in the central canal.

      Strengths:

      1) The study utilizes state-of-the-art microscopy techniques and beautiful transgenic zebrafish tools to characterize the in vivo behavior of the Reissner fiber and found that it exhibits surprising dynamic movements along the dorsal-ventral axis. This observation has important implications for the physiology and function of the Reissner fiber.

      2) By performing a series of clever laser cutting experiments, the authors reveal that the Reissner fiber is under tension in the central canal of zebrafish. This finding provides direct experimental evidence to support the hypothesis that the Reissner fiber functions in a biomechanical manner during spinal cord development and body axis straightening.

      3) By developing a mechanical model of the Reissner fiber and its retraction behavior, the authors estimate the elastic properties of the fiber and find that it is more akin to an elastic polymer than to a stiff rod. This is a useful finding that illuminates the biophysical properties of the fiber.

      4) Through calcium and cilia imaging studies, the authors demonstrate that the Reissner fiber likely interacts with motile cilia and regulates the activity of ciliated sensory neurons (CSF-CNs). The authors propose a model in which fiber-cilia interactions may occur via weak interactions or frictional forces. This model is plausible and opens several new doors for additional investigation.

      We thank Reviewer#3 for the support.

      Weaknesses:

      1) All the live imaging experiments appear to be performed with animals paralyzed via the injection of a chemical agent (bungarotoxin). Does paralysis and/or bungarotoxin negatively impact the behavior of the Reissner fiber? Some data from non-paralyzed animals would ameliorate this concern.

      We performed very few experiments on non-paralyzed fish, as the position of the Reissner fiber was difficult, if not impossible, to analyze in 3D. In a movie added to our revision as Movie 3, it is obvious that skeletal muscle contractions result in very large jumps of the fiber that cannot be corrected for using single-plane imaging. Without being able to monitor and correct for muscle contractions, any estimate of the fiber motion in this context would be dominated by artifacts.

      2) Although the authors convincingly demonstrate that the Reissner fiber is under graded tension, it remains unclear what is the relevance and function of tension on this structure. The photoablation data presented do not delineate between the relevance of the fiber being intact or tension on the fiber as cutting the fiber impacts both. Is fiber tension required for body straightening? At the site of fiber photoablation, does a spinal curvature develop? If cultured, do the ablated animals exhibit a scoliotic phenotype?

      We thank Reviewer#3 for asking these important questions. We asked ourselves the same questions but had to limit the scope of our study for technical reasons: the ablation experiments, performed on an inverted microscope, required mounting the fish close to the bottom coverslip, and it was then extremely difficult to remove the animal safely from the imaging cuvette without affecting the alignment of its body axis.

      3) One of the most potentially impactful conclusions of the paper is that the Reissner fiber interacts with cilia, but the evidence is insufficient to support this. Although some motile cilia are near the fiber (Figure 3A), many cilia are not near the fiber. The provided images and videos do not clearly demonstrate that cilia physically contact or influence the behavior of the Reissner fiber. Further, the data is lacking to conclude that the Reissner fiber directly impacts cilia motility as they did not observe an overall statistically significant difference before and after ablation (Supplemental Figure 1A). Higher magnification, higher resolution, higher acquisition rate and/or colocalization analyses of fiber-cilia interactions could alleviate this concern.

      We agree with the reviewer but could not yet, for technical reasons, perform experiments with higher spatial and temporal resolution. Further analysis of cilia and RF translational motion is displayed in Figure 4 - Supplemental Figure 2 and presented in the Results section. We observed that for 7 out of 15 dorsal cilia and 4 out of 9 ventral cilia, the preferred position of the cilium was correlated with a position of the fiber, suggesting that they could interact. However, our current 2D dataset is too incomplete to draw strong conclusions on the nature of the interactions between fiber and cilia. A future study relying on 3D analysis of the fiber and cilia should resolve how collective interactions of cilia may determine the position of the fiber.
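
      To make the counts above concrete, the following is a minimal sketch, not the analysis used in the manuscript, that converts the two reported counts (7 of 15 dorsal cilia, 4 of 9 ventral cilia) into fractions with an approximate 95% Wilson score interval. The choice of the Wilson interval, the helper name `wilson_interval`, and the `summary` dictionary are assumptions introduced here purely for illustration.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


# Counts as reported in the response; the labels are illustrative.
counts = {"dorsal": (7, 15), "ventral": (4, 9)}

summary = {}
for name, (k, n) in counts.items():
    low, high = wilson_interval(k, n)
    summary[name] = (k / n, low, high)
    print(f"{name}: {k}/{n} = {k / n:.2f}, approx. 95% interval {low:.2f} to {high:.2f}")
```

      With samples this small the intervals are wide, which is in line with the statement that the current dataset does not support strong conclusions.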

      4) Similarly, how does the Reissner fiber interact with CSF-CN sensory neurons? The authors suggest that the fiber interacts with CSF-CN sensory neurons by modulating their spontaneous calcium activity via weak interactions or frictional forces from motile ciliated ependymal radial glial cells. While the calcium imaging data of the CSF-CNs is convincing and sound, the exact nature of the fiber-neuron interaction is unclear. Do cilia or apical extensions on CSF-CN sensory neurons sense the fiber or forces through a mechanosensing or chemosensing mechanism?

      This question is of great interest to us and will be the topic of a future investigation, as it is very difficult to image the motile cilium of CSF-cNs (see Bohm et al., Nature Comm 2016), and even more so together with the Reissner fiber.

      There is some additional confusion as the authors appear to focus their cilia experiments on ependymal radial glial cells in section 4, rather than CSF-CNs. The addition of an illustrative cartoon would add clarity.

      We agree and we added a schematic in the last figure (Figure 4A).

      Overall, the conclusions of the study are well supported by the data presented. However, the strength of the conclusions could be enhanced by additional controls, alternative experimental approaches and clarifications.

      This manuscript is an important contribution to the fields of spinal cord development and body axis development, which are fundamental questions in neurobiology, developmental biology, and musculoskeletal biology. In recent years, the Reissner fiber and motile cilia function have been linked to cerebrospinal fluid flow signaling and body straightening, but the precise form and function of the fiber remain unclear. This study provides new insight into the dynamic and biophysical properties of the Reissner fiber in vivo in zebrafish and proposes a model in which the fiber interacts with cilia and sensory neurons. This study provides novel insight into the cellular mechanisms that underlie the pathogenesis of disorders such as idiopathic scoliosis.

      We thank Reviewer #3 and have added further analysis of cilia and RF motion, displayed in the figures below and also included as extended data figures in the main manuscript.